Hi all,

I am currently wondering about a few nodes that are utilizing all (HT) cores, although they should only be using 50%, i.e., just the physical core count.
The nodes have AMD EPYCs with HT/SMT active, but since we have set COUNT_HYPERTHREAD_CPUS = false, Condor should be using only 50% of the (virtual) core count [1], right? What worries me a bit is that the CPU time shares of the jobs look fine [2], i.e., currently just <48 single-core jobs with a relative '100' weight. However, I am no longer sure how the kernel distributes the CPU time slices here if the parent's relative share is 100%(?) of the overall(??) time share.
Could the CPU-time weighting be misleading here if one tries to 'match' only the physical core count?
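To make the weighting question concrete, here is a minimal Python sketch that treats cgroup v1 `cpu.shares` as plain proportional weights among sibling cgroups, assuming every cgroup involved is CPU-bound at the same time. The inputs are assumptions based on the numbers in this mail: `htcondor` is assumed to be the only busy cgroup at its level, and the 45 slot cgroups are assumed to carry a weight of 100 each; the slot names are hypothetical. It models only the proportional entitlement and ignores per-CPU load balancing and SMT effects.

```python
from fractions import Fraction


def share_fractions(siblings):
    """Fraction of the parent's CPU time each sibling cgroup receives when all
    siblings are busy, treating cpu.shares as plain proportional weights."""
    total = sum(siblings.values())
    return {name: Fraction(weight, total) for name, weight in siblings.items()}


def cpus_under_contention(n_cpus, path):
    """Walk down a cgroup path and return the CPU capacity reaching the leaf.

    `path` is a list of (child_name, {sibling_name: cpu.shares}) pairs, one per
    level, starting below the root cgroup.
    """
    capacity = Fraction(n_cpus)
    for child, siblings in path:
        capacity *= share_fractions(siblings)[child]
    return capacity


# Assumption: 45 busy slot cgroups with weight 100 each (slot names hypothetical).
slots = {f"slot1_{i}": 100 for i in range(1, 46)}
path = [
    # Assumption: htcondor is the only busy cgroup at this level.
    ("htcondor", {"htcondor": 1024}),
    ("slot1_1", slots),
]

for n_cpus in (96, 48):  # logical (HT) CPUs vs. physical cores
    per_slot = cpus_under_contention(n_cpus, path)
    print(n_cpus, per_slot, round(float(per_slot), 2))
```

Under these assumptions each single-core slot is entitled to 32/15 ≈ 2.13 logical CPUs (16/15 ≈ 1.07 if only the 48 physical cores are counted), i.e. more than one CPU in either case, so in this simplified model the weights alone would not hold the jobs to 48 cores. Shares are also work-conserving: CPU time one cgroup leaves idle goes to its busy siblings. A hard ceiling is a separate mechanism (the CFS bandwidth quota, `cpu.cfs_quota_us`), which this sketch does not model.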
Cheers and thanks for any ideas,
Thomas

[1]
COUNT_HYPERTHREAD_CPUS = false
...
DETECTED_CORES = 96
DETECTED_CPUS = 48
DETECTED_MEMORY = 257656
DETECTED_PHYSICAL_CPUS = 48
..
NUM_CPUS = $(DETECTED_CPUS)

[2]
[root@batch1071 htcondor]# cat /sys/fs/cgroup/cpu,cpuacct/cpu.shares
1024
[root@batch1071 htcondor]# cat /sys/fs/cgroup/cpu,cpuacct/htcondor/cpu.shares
1024
[root@batch1071 htcondor]# cat /sys/fs/cgroup/cpu,cpuacct/htcondor/condor_var_lib_condor_execute_slot*/cpu.shares | sort | wc -l
45
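The last command in [2] only counts the slot cgroups. A minimal Python sketch that also reports which weights those cgroups carry and their total might look like the following; the base path and glob pattern are copied from that command, while the function names are hypothetical and a cgroup v1 hierarchy with one `cpu.shares` file per slot directory is assumed.

```python
import glob
import os
from collections import Counter


def slot_shares(base="/sys/fs/cgroup/cpu,cpuacct/htcondor"):
    """Read cpu.shares of every HTCondor slot cgroup below `base`.

    Assumption: cgroup v1 layout; the glob mirrors the command in [2].
    """
    pattern = os.path.join(base, "condor_var_lib_condor_execute_slot*", "cpu.shares")
    shares = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            # Key each value by its slot cgroup directory name.
            shares[os.path.basename(os.path.dirname(path))] = int(fh.read().strip())
    return shares


def summarize(shares):
    """Count slots, tally how many slots carry each weight, and sum the weights."""
    return {
        "slots": len(shares),
        "distinct_weights": dict(Counter(shares.values())),
        "total_weight": sum(shares.values()),
    }


if __name__ == "__main__":
    print(summarize(slot_shares()))
```

Comparing each slot's weight with `total_weight` gives the same proportional fractions the `wc -l` count only hints at.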
Attachment: batch1071_load_6h_20210115.png (PNG image)