John,
We have noticed a problem when collecting accounting data from HTCondor
job classads: in some cases, CPU time exceeds wall time.
We use the RemoteWallClockTime classad attribute as the basis for wall time;
according to the documentation, this appears to be the correct one to use. The
accounting system also captures CommittedTime, and we are seeing cases where
CommittedTime exceeds RemoteWallClockTime. Here is one of many:
    CommittedTime       = 3944
    RemoteWallClockTime = 1
    Total CPU           = 1935
If I am interpreting the documentation correctly, CommittedTime should never
exceed RemoteWallClockTime, because CommittedTime can be reset to zero when a
job is evicted without a checkpoint, whereas RemoteWallClockTime is not.
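The two sanity checks described above can be written as a small sketch. It
assumes each job ad has already been loaded into a plain Python dict keyed by
attribute name, and "TotalCpu" is a hypothetical key standing in for however
the accounting system computes total CPU. The CPU check also assumes a
single-core job, since a multi-core job can legitimately use more CPU time
than wall time.

```python
def find_inconsistencies(ad: dict) -> list:
    """Return descriptions of the accounting invariants this job ad violates."""
    problems = []
    wall = ad["RemoteWallClockTime"]
    # CommittedTime can be reset to zero on eviction without a checkpoint,
    # while RemoteWallClockTime is not, so CommittedTime should never be larger.
    if ad["CommittedTime"] > wall:
        problems.append("CommittedTime exceeds RemoteWallClockTime")
    # Assumes a single-core job: CPU time should then not exceed wall time.
    # "TotalCpu" is a hypothetical key, not an HTCondor attribute name.
    if ad["TotalCpu"] > wall:
        problems.append("CPU time exceeds RemoteWallClockTime")
    return problems


# The case quoted above.
sample = {"CommittedTime": 3944, "RemoteWallClockTime": 1, "TotalCpu": 1935}
print(find_inconsistencies(sample))
# prints ['CommittedTime exceeds RemoteWallClockTime', 'CPU time exceeds RemoteWallClockTime']
```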
I am trying to understand under what conditions this can occur; it makes no
sense to us.
Is this happening while the jobs are actively running? I ask because the
RemoteWallClockTime reported by condor_q is only accurate when the job is
not running.
I have jobs running right now with several hours of CommittedTime but a
RemoteWallClockTime that is still zero. If one of them is evicted, its
RemoteWallClockTime will be updated.
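One way an accounting script might account for this is sketched below. It is
an assumption, not a documented HTCondor recipe: it treats RemoteWallClockTime
as covering only completed run segments, and for a running job (JobStatus 2)
adds the time elapsed since JobCurrentStartDate, the epoch time at which the
current run began. Ads are again assumed to be plain Python dicts.

```python
import time
from typing import Optional

RUNNING = 2  # HTCondor JobStatus value for a running job


def wall_clock_seconds(ad: dict, now: Optional[float] = None) -> float:
    """Estimate cumulative wall-clock time for a job ad.

    Assumes RemoteWallClockTime from condor_q reflects only finished run
    segments, so for a running job the current segment is added on top.
    """
    wall = ad.get("RemoteWallClockTime", 0)
    if ad.get("JobStatus") == RUNNING and "JobCurrentStartDate" in ad:
        if now is None:
            now = time.time()
        # Time elapsed in the current (not yet recorded) run segment.
        wall += now - ad["JobCurrentStartDate"]
    return wall


# A running job whose RemoteWallClockTime is still zero, as described above.
running = {"JobStatus": 2, "RemoteWallClockTime": 0, "JobCurrentStartDate": 1000}
print(wall_clock_seconds(running, now=5000))  # prints 4000
```

A simpler alternative is to run the consistency checks only on jobs that have
left the queue (for example, ads taken from condor_history), where
RemoteWallClockTime is no longer stale.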