Subject: Re: [HTCondor-users] wall clock time in condor_q
> From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
>
> Hi Michael,
>
> Unfortunately, I don't think it's possible to do a
> "RemoteCpuUtilizationPercent" attribute; at least, I failed miserably at
> doing this last time I tried (I suppose there could be new attributes?).
>
> Brian
Here's a version I've been using in
one of my pools:
This is an earlier, slightly crummy version. When I went to bring it into
another pool, I realized that to get historical averages, rather than
merely identifying errored-out MATLAB workers waiting for input from
/dev/null (their utilization percentage dwindling toward zero because
users weren't sufficiently educated about the "-r" option), I needed the
value to be defined after the job finished running.
You'll note that it's quite quick-and-dirty: it doesn't take suspension
time into account (this pool doesn't suspend jobs), and it doesn't work
for standard universe jobs that have been checkpointed and restarted
either. The way I read the manual, RemoteSysCpu and RemoteUserCpu count
goodput run time and go to zero after an uncheckpointed eviction, which in
the vanilla universe means every eviction. So the CPU-time and wall-clock
time scales match in vanilla, but they wouldn't match in standard. This
pool doesn't run standard universe jobs anyway.
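A minimal sketch of a calculation along these lines follows, written in
Python against a job ad represented as a plain dict. It is a hypothetical
illustration, not the expression used in the pool above: it assumes the ad
carries JobCurrentStartDate and RequestCpus (attributes not mentioned in
this message) alongside RemoteSysCpu and RemoteUserCpu, and computes
(RemoteSysCpu + RemoteUserCpu) / ((CurrentTime - JobCurrentStartDate) *
RequestCpus) * 100. Like the version described, it ignores suspension time
and only makes sense for running vanilla universe jobs.

```python
from typing import Optional


def cpu_utilization_percent(ad: dict, current_time: int) -> Optional[float]:
    """Hypothetical quick-and-dirty CPU utilization for a *running* job.

    Assumes a vanilla universe job whose CPU counters and
    JobCurrentStartDate both reset on each (re)start; suspension time is
    not subtracted.
    """
    start = ad.get("JobCurrentStartDate")
    if start is None:
        return None  # job has never started, nothing to measure

    # Wall-clock seconds since the current run began, scaled by the number
    # of cores requested (assumed attribute: RequestCpus, default 1).
    wall_seconds = (current_time - start) * ad.get("RequestCpus", 1)
    if wall_seconds <= 0:
        return None  # avoid dividing by zero right at job start

    # Total CPU seconds consumed in this run: system plus user time.
    cpu_seconds = ad.get("RemoteSysCpu", 0) + ad.get("RemoteUserCpu", 0)
    return 100.0 * cpu_seconds / wall_seconds
```

For example, a 2-core job that started 600 seconds ago and has used 600
CPU-seconds comes out at 50.0.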
I don't have the newer version, which produces historical figures for the
other pool, on hand at the moment; I'll check in with someone who has
access to it and send another message. It basically just switches from
using CurrentTime to using CompletionDate, depending on the JobStatus.
That version really gave us what we needed: based on the data, they put
in a funding proposal to triple the number of GPUs in each compute node.
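The JobStatus switch described above could be sketched the same way. Again
this is a hypothetical Python illustration over a dict-shaped job ad, not
the actual newer version; it assumes HTCondor's convention that JobStatus 4
means Completed, and that JobCurrentStartDate and RequestCpus are present
in the ad.

```python
from typing import Optional

COMPLETED = 4  # HTCondor JobStatus value for a completed job


def cpu_utilization_percent(ad: dict, current_time: int) -> Optional[float]:
    """Hypothetical utilization that stays defined after the job finishes."""
    # Finished jobs are measured up to CompletionDate; everything else is
    # measured up to "now" (the CurrentTime equivalent passed in).
    if ad.get("JobStatus") == COMPLETED:
        end = ad.get("CompletionDate")
    else:
        end = current_time

    start = ad.get("JobCurrentStartDate")
    if start is None or end is None:
        return None  # not enough information to compute a ratio

    # Wall-clock seconds for the run, scaled by cores requested.
    wall_seconds = (end - start) * ad.get("RequestCpus", 1)
    if wall_seconds <= 0:
        return None

    cpu_seconds = ad.get("RemoteSysCpu", 0) + ad.get("RemoteUserCpu", 0)
    return 100.0 * cpu_seconds / wall_seconds
```

Because finished jobs are measured up to CompletionDate, the result no
longer decays as the current time advances, which is what makes averaging
over job history meaningful.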
Michael V. Pelletier
IT Program Execution
Principal Engineer
978.858.9681 (5-9681) NOTE NEW NUMBER
339.293.9149 cell
339.645.8614 fax
michael.v.pelletier@xxxxxxxxxxxx