Hi all,
a short question regarding job core-time scaling via cgroup cpu.shares:
am I right that the relative share of a job's cgroup is only a limit
with respect to the total core-scaled CPU time?
For context: we run our nodes with hyperthreading (2x) enabled for
simplicity, since we use the same machines for production jobs as well
as for user-job sub-clusters.
Since users occasionally have odd jobs (that tend to work better
without overbooking), we broker only half of the HT core count for
jobs on the user nodes.
Now, the condor parent cgroup is assigned
  htcondor/cpu.shares = 1024
against the total system share of
  cpu.shares = 1024
so all condor child processes (without further sub-groups) could in
principle use up to 100% of the total HT-core-scaled CPU time.
A single-core job gets a relative share like
  htcondor/condor_var_lib_condor_execute_slot2_15@xxxxxxxxxxxxxxx/cpu.shares = 100
where, as far as I can see, we broker only 50% of the total
HT-core-scaled time.
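For illustration, here is a small Python sketch of how I understand the
arithmetic (paths assume cgroup v1 with the cpu controller mounted at
/sys/fs/cgroup/cpu - adjust for the local layout): cpu.shares is a
weight, so under full contention a slot is entitled to its shares
divided by the sum of its siblings' shares:

  import os

  # Assumed cgroup v1 layout; adjust CPU_ROOT to the local mount point.
  CPU_ROOT = "/sys/fs/cgroup/cpu"
  PARENT = os.path.join(CPU_ROOT, "htcondor")

  def read_shares(path):
      with open(os.path.join(path, "cpu.shares")) as f:
          return int(f.read())

  # All slot sub-groups currently below the htcondor parent.
  slots = [os.path.join(PARENT, d) for d in os.listdir(PARENT)
           if os.path.isdir(os.path.join(PARENT, d))]

  total = sum(read_shares(s) for s in slots)
  for s in slots:
      # cpu.shares is a weight, not a cap: under contention a slot is
      # entitled to shares / sum(sibling shares) of the parent's time.
      frac = read_shares(s) / total if total else 0.0
      print(f"{os.path.basename(s)}: {frac:.1%} of the htcondor group")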
However, user jobs can utilize more than their nominally assigned CPU
share. My understanding is that the kernel notices that the total CPU
time is not fully utilized and therefore allows processes to exceed
their nominal share as long as CPU time is still available, i.e.,
cpu.shares is work-conserving. Is this correct? :)
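To check this, one could sample a slot's cumulative CPU time from the
cpuacct controller over a short interval (a rough sketch; the
cpuacct.usage path is site-specific, so it is taken as an argument):

  import sys
  import time

  # Pass the slot's cpuacct.usage file (cgroup v1, cpuacct controller),
  # e.g. /sys/fs/cgroup/cpuacct/htcondor/<slot-cgroup>/cpuacct.usage
  usage_file = sys.argv[1]

  def usage_ns():
      # cpuacct.usage holds cumulative CPU time in nanoseconds.
      with open(usage_file) as f:
          return int(f.read())

  t0, u0 = time.time(), usage_ns()
  time.sleep(10)
  t1, u1 = time.time(), usage_ns()
  # A sustained average above the slot's nominal entitlement would
  # confirm that shares are work-conserving rather than hard caps.
  print(f"average usage: {(u1 - u0) / 1e9 / (t1 - t0):.2f} cores")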
When we scale the condor parent cgroup to a reasonable fraction of the
system cpu.shares (taking HT efficiency into account), we should be
able to scale CPU time per job to (roughly) core-equivalents, without
the need to bind jobs to specific cores, right?
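As a back-of-the-envelope sketch of what I have in mind (all numbers
made up - in particular the HT throughput gain would have to be
measured for our workloads):

  # Assumed node: 32 physical cores, HT 2x, and half the logical
  # cores brokered as single-core slots.
  PHYS_CORES = 32
  LOGICAL = 2 * PHYS_CORES
  HT_GAIN = 1.3          # assumed whole-node throughput vs. HT off
  SLOTS = LOGICAL // 2   # brokered single-core slots

  # Total node throughput in single-core equivalents:
  node_core_equiv = PHYS_CORES * HT_GAIN

  # To entitle each slot to ~1 core-equivalent under contention, the
  # htcondor parent needs this fraction of the node:
  condor_fraction = SLOTS / node_core_equiv
  print(f"htcondor should get ~{condor_fraction:.0%} of the node")

  # Assuming the rest of the system competes at the same level with
  # the default weight of 1024, the parent share giving that fraction
  # follows from S / (S + 1024) = condor_fraction:
  parent_shares = round(1024 * condor_fraction / (1 - condor_fraction))
  print(f"htcondor/cpu.shares ~= {parent_shares}")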
Cheers,
  Thomas