Hi Thomas,

On 13 Jun 2024, at 14:46, Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:

> Since Condor uses CPU weights but not pinning by default, jobs get a
> relative share of the overall core-weighted CPU time (but not *a* core
> per se). So, if there are CPU cycles free, a process with a relative
> CPU-time share can get more cycle time than nominally assigned. So my
> guess would be that either not all of your cores are configured as
> slots, or that some jobs idle for a while and other jobs can jump onto
> their cycles.

Yes, we have cores unallocated. However, my job is extremely CPU bound: it is a Go executable that finds prime factors of numbers, so there is really nothing else to use any CPU.

I looked further into it. Here is some example information from a recent job.

This is what /bin/time says:

./factorize -M 193  74671.89 user 144.01 system 100% cpu 20:43:27 total

And this is the process tree:

└─condor_starter -f -local-name slot_type_1 -a slot1_27 taai-007.nikhef.nl
   ├─starter
   └─appinit
      ├─ferm.condor /user/templon/gofact_er_ng/ferm.condor 842
      │  └─time -f %C %U user %S system %P cpu %E total ./factorize -F 842
      │     └─factorize -F 842
      │        └─2*[{factorize}]
      └─6*[{appinit}]

In other words, when the CPU utilisation of the job (for me, "the job" means the thing specified in Executable, i.e. the ferm.condor above) is close to 100%, the reported figure can no longer really be trusted. For a job with a CPU utilisation of, say, 80%, the difference doesn't matter so much.

I'm used to Torque, which has a much more unitary approach to CPU- and wall-time accounting. I'm realising as well, just now as I write this, that with Torque our jobs run on bare metal, so the executable is the executable. On our HTCondor system everything runs in a container; possibly the difference lies in whatever the container is doing during the job, e.g. handling file I/O that crosses the container boundary.

JT
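
For concreteness, the 100% figure above is self-consistent with the format string visible in the tree: %P cpu is user plus system time divided by elapsed time, and (74671.89 + 144.01) / 74607 s (20:43:27 of wall time) works out to about 1.003, i.e. the reported 100%. Note that the user time alone (74671.89 s) already exceeds the wall time, which is only possible when more than one thread is running; that is consistent with the 2*[{factorize}] threads in the tree.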
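Since the job binary is written in Go, one cheap cross-check on the starter's accounting is a small Go wrapper that measures user, system, and wall time itself from the kernel's rusage, the same source /bin/time draws on. A minimal sketch, not anyone's production tooling; the factorize invocation is just the example from this mail, substitute your own command:

package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// Run the child and time it; the command is the example from the mail.
	cmd := exec.Command("./factorize", "-F", "842")
	start := time.Now()
	err := cmd.Run() // blocks until the child exits
	wall := time.Since(start)
	if err != nil {
		fmt.Println("run:", err)
	}
	st := cmd.ProcessState // populated once the child has been waited for
	if st == nil {
		return // the command never started
	}
	user, sys := st.UserTime().Seconds(), st.SystemTime().Seconds()
	// Same quantities as "%U user %S system %P cpu %E total" in /bin/time.
	fmt.Printf("%.2f user %.2f system %.0f%% cpu %s total\n",
		user, sys, 100*(user+sys)/wall.Seconds(), wall.Round(time.Second))
}

UserTime and SystemTime come from the rusage returned when the child is waited for, so for a multi-threaded child they should cover all of its threads, which is exactly why user time can exceed wall time.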