[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[HTCondor-users] CPUsUsage versus computed values



Hi all,

We are currently putting an extra close eye on CPU usage and Iâm a bit confused by the options available (letâs not delve into what is âtheâ CPU usage). Iâm using both the inbuilt CPUsUsage (via ProcFamily CgroupV1) and computed expressions for running [1] and completed [2] jobs. Of course they donât quite agree so Iâm interested if Iâm doing it right and if anyone has better suggestions.

For running jobs the CPUsUsage is consistently higher than the computed value but rather close (e.g. 9.02 vs 8.7).
- The docs for CPUsUsage say itâs the one-minute CPU usage. However, if Iâm reading the code [3] right only the total cpu metrics for the entire job cgroup are collected and used. So is CPUsUsage over a specific time range or the entire lifetime of the job?
- Is my expression basically replicating what CPUsUsage is doing and just limited by timing resolution?

For completed jobs the CPUsUsage is sometimes sensible (e.g. 7.86 vs 7.78) but oftentimes completely bogus (e.g. 0.11 vs 7.22).
- Is the CPUsUsage actually meaningful in the history?
- Can we somehow record the peak or average CPUsUsage in history?

Cheers,
Max

[1] expression for condor_q -run
'(RemoteSysCPU + RemoteUserCpu) / (ServerTime - JobCurrentStartDate)'

[2] expression for condor_history
'(CumulativeRemoteSysCpu + CumulativeRemoteUserCpu) / (RemoteWallClockTime - CumulativeSuspensionTime)â

[3] ProcFamilyDirectCgroupV1::get_usage
https://github.com/htcondor/htcondor/blob/66aadf0278a07ee219eaa184068403c7dee1db4d/src/condor_utils/proc_family_direct_cgroup_v1.cpp#L292-L338

Attachment: smime.p7s
Description: S/MIME cryptographic signature