
Re: [HTCondor-users] Print formats and job cpu statistics : the deeper I look, the worse it gets



Hi Jeff,

Two things:


Cheers,
Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Jeff Templon <templon@xxxxxxxxx>
Sent: Tuesday, June 17, 2025 6:18 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Print formats and job cpu statistics : the deeper I look, the worse it gets
 
Hi,

Thanks for the clarification!

This is a good one for me to start contributing to the documentation, I think … remind me which GitHub repo I should clone so I can contribute back on this one?

JT


On 16 Jun 2025, at 16:48, John M Knoeller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:

PRINTAS CPU_TIME is specific to condor_q and is tied to the command-line argument

     -cputime  Display CPU_TIME instead of RUN_TIME

So yes, it is not purely a formatting option; it is a data generation option AND a formatting option.

What you want is 

   PRINTF %T

which is the formatting step that PRINTAS CPU_TIME applies after it has generated the time value from several different
attributes and the -cputime command-line option.
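
As a rough sketch of that suggestion, the two CPU_TIME columns in the print format file quoted below could be rewritten like this. It assumes PRINTF takes %T unquoted, exactly as written above, and that %T accepts these attributes directly. It is trimmed to three columns and has not been run against a live schedd:

```
SELECT NOSUMMARY
   ClusterId                     AS  JOB_ID  PRINTAS JOB_ID
   RemoteUserCpu                 AS "    funCPU"   PRINTF %T
   RemoteWallClockTime           AS "      fWALL"   PRINTF %T
```

With this change each column should be formatted from the attribute named on its own line, rather than from the value that PRINTAS CPU_TIME generates itself.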

-tj



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Jeff Templon <templon@xxxxxxxxx>
Sent: Saturday, June 14, 2025 3:58 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Print formats and job cpu statistics : the deeper I look, the worse it gets

Hi,

What’s going on here??

$ condor_q -nobatch -pr $HOME/nan3.cpf  -allusers | grep -E "CPU|fgi"
JOB_ID      User CPUS     MEMREQ        MEMUSE     ST     funCPU       intCPU        fWALL      intWALL      SlotTime Num_Starts   WorkerNode
2115385.0   fgit 64        16.0 GB      7.2 GB      R        0+19:52:05 40+15:55:39   0+19:52:05   2+17:16:13 19:52:05        2 wn-pijl-005.

Print format file:

$ cat nan3.cpf
SELECT NOSUMMARY
   ClusterId                     AS  JOB_ID  PRINTAS JOB_ID
   Owner                         AS User  WIDTH 4
   CpusProvisioned               AS CPUS
   MemoryProvisioned             AS "    MEMREQ  "   PRINTAS READABLE_MB
   MemoryUsage                   AS "     MEMUSE"    PRINTAS READABLE_MB
   JobStatus                     AS "    ST"         PRINTAS JOB_STATUS
   RemoteUserCpu                 AS "    funCPU"   PRINTAS CPU_TIME
   interval(RemoteUserCpu)       AS "    intCPU"
   RemoteWallClockTime           AS "      fWALL"   PRINTAS CPU_TIME
   interval(RemoteWallClockTime) AS "    intWALL"   WIDTH 12
   interval(ServerTime-JobCurrentStartDate) AS "    SlotTime"
   JobRunCount                    AS 'Num_Starts' WIDTH 4
   splitSlotName(RemoteHost)[1] OR "-" AS "  WorkerNode"  WIDTH -12

1. It looks like PRINTAS CPU_TIME is not PRINTAS, it’s PRINT … it doesn’t matter what time attribute I specify, I get the same output value.  Not clear what kind of bug it is - maybe this is the intent, but then it does not belong among the PRINTAS statements. See READABLE_MB above: that one is used twice, and the output value clearly depends on the given attribute value.

2. The thing I am calling SlotTime is correct: I know when this job most recently started, so 19+ hours is right. RemoteWallClockTime may be correct, but the recommended recipe for getting the most recent run’s WallClockTime is failing:  CommittedTime is zero (as are CommittedSlotTime and CumulativeSuspensionTime) for this job.  LastRemoteWallClockTime is identical to RemoteWallClockTime.

3. I recommend a cleanup of the attributes at some point. RemoteUserCpu is only for the current run, but RemoteWallClockTime is cumulative.  Maybe this confusion is responsible for the errors described above, either in my understanding or in the code itself.

Have a good weekend,

JT
