Re: [HTCondor-users] Job CPU usage updates



On Tue, Sep 24, 2013 at 1:26 AM, Wilkins, David
<David.Wilkins@xxxxxxxxxxxxxx> wrote:
> Is there anything in the job ClassAds that would tell us that the update has
> occurred?

You could use the job's CommittedTime as a check. Once CommittedTime
exceeds STARTER_UPDATE_INTERVAL (plus a little extra padding to be
safe), at least one update should have arrived, so a job that still
shows no CPU usage at that point may be in a hung state like you
describe. The caveat is that if you ever increase
STARTER_UPDATE_INTERVAL, you'll need to adjust the threshold in your
submit file to match.
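
As a rough sketch of that comparison in Python: the dict below just
stands in for a job ClassAd, and the function name, the 300-second
interval, and the 60-second padding are illustrative assumptions, not
values taken from your pool. Substitute whatever
STARTER_UPDATE_INTERVAL is actually set to.

```python
def update_should_have_occurred(job_ad, starter_update_interval=300, padding=60):
    """Return True once CommittedTime exceeds the update interval plus padding.

    job_ad is a plain dict standing in for a job ClassAd (an assumption
    made for this sketch). A missing CommittedTime is treated as 0.
    """
    committed = int(job_ad.get("CommittedTime", 0))
    return committed > starter_update_interval + padding


# Hypothetical job ad: 400 seconds committed, so an update is expected.
job = {"CommittedTime": 400}
print(update_should_have_occurred(job))
```

If this returns True and the job still reports no CPU usage, that is
the case worth a closer look.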

Here's the description of CommittedTime from Appendix A of the manual.

CommittedTime:
The number of seconds of wall clock time that the job has been
allocated a machine, excluding the time spent on run attempts that
were evicted without a checkpoint. Like RemoteWallClockTime, this
includes time the job spent in a suspended state, so the total
committed wall time spent running is

CommittedTime - CommittedSuspensionTime
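
If it helps, here is the same subtraction as a small Python sketch.
Again the dict is only a stand-in for a job ClassAd and the function
name is made up for illustration; treating a missing attribute as 0
is my assumption, not something the manual states.

```python
def committed_running_seconds(job_ad):
    """Committed wall time spent running: CommittedTime minus CommittedSuspensionTime.

    job_ad is a plain dict standing in for a job ClassAd. Missing
    attributes default to 0 (an assumption made for this sketch).
    """
    committed = int(job_ad.get("CommittedTime", 0))
    suspended = int(job_ad.get("CommittedSuspensionTime", 0))
    return committed - suspended


# Hypothetical values: 1000 s committed, 200 s of that suspended.
print(committed_running_seconds({"CommittedTime": 1000, "CommittedSuspensionTime": 200}))
```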




-- 
Ben Cotton
Senior Support Engineer
Cycle Computing, LLC
The Leader in Utility Supercomputing Software