From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Cc: Hermann Fuchs <hermann.fuchs@xxxxxxxxxxxxxxxx>; Bob Briscoe <paw_deer@xxxxxxxxxxx>
Sent: Thursday, 22 March 2012, 17:47
Subject: Re: [Condor-users] Monitoring the load of a job
On 3/22/2012 7:28 AM, Hermann Fuchs wrote:
> Hi
>
> We usually use a combination of
> condor_q -run to see which jobs belong to which slot
> and condor_status to see the load on this slot.
> However, I do not know how to combine this information.
>
> Best regards,
> Hermann
A couple of quick thoughts on this:
Be aware that simply entering
condor_status -run
will display all busy slots along with the load average and the submitting user name (and the machine they submitted from).
It is always useful to take a peek at all the machine attributes (condor_status -l) and/or job attributes (condor_q -l) to see what is there. Doing so, you will notice that for any slot that is claimed/busy, the JobId of the job running on that slot appears in the machine ad (along with other info like the GlobalJobId, RemoteUser, etc.).
So to see the load average generated by a particular job, for instance job "92.0", you could do this:
condor_status -cons 'JobId=="92.0"'
The above will get the load info from the central manager, which is updated only periodically, so the load average may be a couple of minutes stale (which likely doesn't matter much, since the load average is averaged over a period of time anyhow).
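For Bob's case of watching one job (or all of one user's jobs) without looking up slot names by hand, the query above can be wrapped in a couple of small shell functions. This is a minimal sketch, not something shipped with Condor: the names job_load and user_load are made up, and it assumes the slot ad carries a LoadAvg attribute alongside JobId and RemoteUser. Confirm the attribute names with condor_status -l on your own pool.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Condor). They query the central
# manager's cached slot ads, so the numbers may be a few minutes old.
# Assumes slot ads carry JobId, RemoteUser and LoadAvg; check with
# "condor_status -l".

# Print "<slot name> <load average>" for the slot running job $1 (e.g. 92.0).
job_load() {
    # ClassAd string literals need double quotes, hence the escaping
    condor_status -constraint "JobId==\"$1\"" \
        -format "%s " Name \
        -format "%f\n" LoadAvg
}

# Same idea keyed on the submitting user, e.g. user_load "bob@example.org"
user_load() {
    condor_status -constraint "RemoteUser==\"$1\"" \
        -format "%s " Name \
        -format "%f\n" LoadAvg
}
```

For example, job_load 92.0 prints one line with the slot name and its load average; a job that is blocked waiting for input from another machine would typically show a load near zero there.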
If you want up-to-the-second load info, you could use the "-direct" argument in condor_status to go directly to the execute node instead of using the cached info in the central manager. To do this we can use the back-tick geekness of the shell to state which node to directly query via another invocation of condor_status like so:
condor_status -direct `condor_status -con 'JobId=="92.0"' -format "%s\n" Name`
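The back-tick version can also be packaged as a function that first checks whether any slot is running the job at all (for instance, it may have already finished) before querying the execute node. Again this is a hedged sketch: job_load_direct is a hypothetical name, it passes the slot Name to -direct exactly as the one-liner above does, and it uses $(...) in place of back-ticks, which a POSIX shell treats the same way here.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor) wrapping the back-tick example:
# find the slot running job $1 via the central manager, then ask that
# execute node directly for up-to-the-second information.
job_load_direct() {
    # Name of the slot currently running job $1 (e.g. 92.0)
    slot=$(condor_status -constraint "JobId==\"$1\"" -format "%s\n" Name)
    if [ -z "$slot" ]; then
        echo "no slot is running job $1" >&2
        return 1
    fi
    # Bypass the collector's cached ad and query the execute node itself
    condor_status -direct "$slot"
}
```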
Hope the above is helpful and not overly geeky,
Todd
> On Thu, 2012-03-22 at 11:35 +0000, Bob Briscoe wrote:
>> Hi,
>> Can one monitor the load generated by a particular job as it's running? I ask because occasionally a job may claim a slot, be in running state, but actually be sitting idle as it's expecting some input to be sent to it from some other machine (e.g. could be a case of deadlock). In such a case it would be useful to see that slot's load. I know that condor_status publishes the loads of slots, but it often gets its mappings wrong, so unclaimed slots are reported under load whereas working slots are shown to be unloaded. Also, we'd like to do this via condor_q or some similar command which would specify the job id or user id.
>> TIA,
>> Bob
-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx>    University of Wisconsin-Madison
Center for High Throughput Computing       Department of Computer Sciences
Condor Project Technical Lead              1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                      Madison, WI 53706-1685