Re: [HTCondor-users] Problems with cpu accounting
- Date: Sat, 10 Jan 2026 11:59:45 +0100
- From: Jeff Templon <templon@xxxxxxxxx>
- Subject: Re: [HTCondor-users] Problems with cpu accounting
Hi Cole,
It's a bug, I just can't figure out which kind :-)
Bug 1) it is literally doing what the doc says, meaning that if an 8-core job uses two cores effectively and leaves the other 6 idle, 100% is reported, since the actual answer is 200% (of the desired 800%) and the value is capped at 100.
Bug 2) is that you left out a cpu-normalisation in your explanation, meaning that if an 8-core job uses two cores effectively and leaves the other 6 idle, 25% will be reported; in my case the bug is then that 95.7% should be reported, but 100% is reported.
Which of the two bugs is it, and what is the perspective for a fix?
JT
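
The two readings above can be checked against the numbers from the job in the quoted output below (RemUsCpu 2:12:51, CommTime 34:43, 4 CPUs). This is a minimal sketch, not HTCondor's code: the function names are hypothetical, the clamped formula follows Cole's description, and the per-core normalisation is only the assumption behind "Bug 2".

```python
def interval_to_seconds(text):
    """Convert an interval such as '2:12:51' or '34:43' to seconds.

    Assumes only H:MM:SS or MM:SS forms; intervals with a day part
    are not handled in this sketch.
    """
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def cpu_util_clamped(user_cpu_s, committed_s):
    """Reading 1: (RemoteUserCpu / CommittedTime) * 100, clamped to [0, 100]."""
    pct = user_cpu_s / committed_s * 100.0
    return max(0.0, min(100.0, pct))


def cpu_util_normalised(user_cpu_s, committed_s, cpus):
    """Reading 2 (assumed): same ratio, additionally divided by the core count."""
    pct = user_cpu_s / (committed_s * cpus) * 100.0
    return max(0.0, min(100.0, pct))


user_cpu = interval_to_seconds("2:12:51")   # RemoteUserCpu in seconds
committed = interval_to_seconds("34:43")    # CommittedTime in seconds

print(round(user_cpu / committed, 3))                          # 3.827 cores busy on average
print(cpu_util_clamped(user_cpu, committed))                   # 100.0
print(round(cpu_util_normalised(user_cpu, committed, 4), 1))   # 95.7

# The 8-core job that keeps only two cores busy for one hour:
print(cpu_util_clamped(2 * 3600, 3600))                        # 100.0
print(cpu_util_normalised(2 * 3600, 3600, 8))                  # 25.0
```

With the clamp and no normalisation, any multi-core job that keeps at least one core busy on average is reported as 100%, which matches the observed CPutil value.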
> On 9 Jan 2026, at 17:34, Cole Bollig via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>
> Hi Jeff,
>
> It appears that our documentation for the CPU_UTIL print format is misleading. Digging into the code, CPU_UTIL is a percentage, so the code is doing (RemoteUserCpu / CommittedTime) * 100 with a max of 100 and a min of 0.
>
> Cheers,
> Cole Bollig
>
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Jeff Templon <templon@xxxxxxxxx>
> Sent: Friday, January 9, 2026 7:52 AM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] Problems with cpu accounting
> Hi,
>
> We have three measures of cpu usage, all three giving a different result:
>
> CPU_UTIL from the print format -> 100%
> CpusUsage as a job attribute -> 3.169
> RemoteUserCpu / CommittedTime -> 3.827
>
> See the command output under the quoted message excerpt below.
>
> RemoteUserCpu / CommittedTime is what the documentation claims will be printed by CPU_UTIL. It's not. 3.827 is what I'd qualitatively expect from this workload.
>
> What is going on????
>
> JT
>
>
>> On 9 Jan 2026, at 13:47, Emily Kooistra <a66@xxxxxxxxx> wrote:
>>
>> So that CPutil is,
>>
>> RemoteUserCpu / CommittedTime
>>
>> So I guess print both and see?
>>
>
> Here:
>
> $ condor_history -completedsince $(date -d "64 minutes ago" +"%s") -print-format /user/templon/yafu_htcondor/cputests.cpf -wide:164 -constraint 'Owner=="templon"'
> JOB_ID     Username  Class  CMD                 Finished   Started    CPUS  CPuse  RemUsCpu  CommTime  CPutil  MEMREQ    MEM       ST  WALL_TIME  NStrt  WorkerNode
> 3823143.0  templon   long   yafu-b637.condor b  1/9 14:27  1/9 13:52  4     3.169  2:12:51   34:43     100.0   128.0 GB  366.2 MB  C   34:43      1      wn-pijl-005
>
> So CPutil lies, because RemUsCpu / CommTime = 3.827 while CPutil is 100% and CPuse (CpusUsage) is 3.169
>
> JT
>
> Note: the cpf file is
>
> $ cat cputests.cpf
> SELECT NOSUMMARY
> ClusterId AS JOB_ID PRINTAS JOB_ID WIDTH -11
> Owner AS 'Username'
> JobCategory AS Class WIDTH 5
> join(" ",split(Cmd,"/")[size(split(Cmd,"/"))-1], Args) AS ' CMD' WIDTH -18
> CompletionDate AS ' Finished ' PRINTAS DATE
> JobCurrentStartDate AS ' Started' PRINTAS QDATE
> CpusProvisioned AS 'CPUS' WIDTH 3
> CpusUsage AS 'CPuse' WIDTH 5
> interval(RemoteUserCpu) AS " RemUsCpu" WIDTH 10
> interval(CommittedTime) AS " CommTime" WIDTH 10
> Dummy AS 'CPutil' PRINTAS CPU_UTIL
> MemoryProvisioned AS ' MEMREQ' PRINTAS READABLE_MB
> # ResidentSetSize AS ' RAM' PRINTAS READABLE_KB WIDTH 8
> ImageSize AS ' MEM' PRINTAS READABLE_KB WIDTH 10
> JobStatus AS "ST" PRINTAS JOB_STATUS WIDTH 3
> interval(RemoteWallClockTime) AS " WALL_TIME" WIDTH 10
> JobRunCount AS 'NStrt' WIDTH 5
> split(splitSlotName(LastRemoteHost)[1], ".")[0] AS "WorkerNode"
>
>