Re: [Condor-users] Monitoring the load of a job
- Date: Thu, 22 Mar 2012 12:47:28 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Monitoring the load of a job
On 3/22/2012 7:28 AM, Hermann Fuchs wrote:
Hi
We usually use a combination of
condor_q -run to see which jobs belong to which slot
and condor_status to see the load on this slot.
However, I do not know how to combine this information.
Best regards,
Hermann
A couple of quick thoughts on this:
Be aware that simply entering
condor_status -run
will display all busy slots along with the load average and the
submitting user name (and the machine they submitted from).
It is always useful to take a peek at all the machine attributes
(condor_status -l) and/or job attributes (condor_q -l) and see what is
there. Doing this, note that for any slot that is claimed/busy, the JobId
of the job running on that slot appears in the machine ad (along with
other info like the GlobalJobId, RemoteUser, etc.).
So to see the load average generated by a particular job, for instance
job "92.0", you could do this:
condor_status -cons 'JobId=="92.0"'
The above will get the load info from the central manager, which is
updated only periodically, so the load average may be a couple of minutes
stale (which likely doesn't matter much, since load avg is averaged
over a period of time anyhow).
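As a rough sketch, the query above could be wrapped in a small shell function that prints only the slot name and its load average. The function name job_load is made up for illustration and is not part of Condor; the sketch assumes the machine ad carries the LoadAvg attribute next to JobId and that -format accepts a printf-style %f for it.

```shell
# Hypothetical helper (not part of Condor): print the slot name and load
# average for the slot running the given job, using the information cached
# by the central manager.
job_load() {
    # $1 is a job id such as 92.0; JobId is a string in the machine ad,
    # so the value must be quoted inside the constraint expression.
    condor_status -constraint "JobId==\"$1\"" \
        -format '%s ' Name \
        -format '%f\n' LoadAvg
}
```

Calling job_load 92.0 should then print one line per matching slot, or nothing if the job is not currently running.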
If you want up-to-the-second load info, you could use the "-direct"
argument in condor_status to go directly to the execute node instead of
using the cached info in the central manager. To do this we can use the
back-tick geekness of the shell to state which node to directly query
via another invocation of condor_status like so:
condor_status -direct `condor_status -con 'JobId=="92.0"' -format "%s\n" Name`
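The same two-step lookup can be sketched as a shell function that also handles the case where the job is not running on any slot, in which case the inner query prints nothing and -direct would be left without an argument. The name job_load_direct is hypothetical, and taking only the first matching slot is an assumption made to keep the sketch simple.

```shell
# Hypothetical helper (not part of Condor): query the execute node directly
# for the slot that is running the given job.
job_load_direct() {
    # Step 1: ask the central manager which slot runs job $1.
    node=$(condor_status -constraint "JobId==\"$1\"" -format '%s\n' Name | head -n 1)
    if [ -z "$node" ]; then
        echo "job $1 is not running on any slot" >&2
        return 1
    fi
    # Step 2: go straight to that node instead of using the cached ad.
    condor_status -direct "$node"
}
```

For example, job_load_direct 92.0 prints the up-to-date slot information, and returns a non-zero status if no slot reports that JobId.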
Hope the above is helpful and not overly geeky,
Todd
On Thu, 2012-03-22 at 11:35 +0000, Bob Briscoe wrote:
Hi,
Can one monitor the load generated by a particular job as it's running? I ask because occasionally a job may claim a slot, be in running state, but actually be sitting idle as it's expecting some input to be sent to it from some other machine (e.g. could be a case of deadlock). In such a case it would be useful to see that slot's load. I know that condor_status publishes the loads of slots, but it often gets its mappings wrong, so unclaimed slots are reported under load whereas working slots are shown to be un-loaded. Also, we'd like to do this via condor_q or some similar command which would specify the job id or user id.
TIA,
Bob
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing Department of Computer Sciences
Condor Project Technical Lead 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132 Madison, WI 53706-1685