Subject: Re: [HTCondor-users] HTCondor/cgroups: limiting CPUs/pinning processes to CPUs with hyperthreaded CPUs
From: Thomas Hartmann <thomas.hartmann@xxxxxxx> Date: 02/04/2016 10:08 AM
> If I understand the cgroup documentation correctly, a cgroup cannot
> be limited to a "general number of cores" but can only be pinned to
> certain cores. I.e., limiting the number of cores for a cgroup means
> pinning the cgroup to that many dedicated cores on the system, or?
> So, I guess the startd correspondingly pins a job with a core limit
> in a cgroup to cores, or?
>
> Is this actually a drawback, in that processes cannot be switched
> between cores (wouldn't the CPU move processes between cores anyway)?
> And how would a hyperthreaded system actually look - if a process is
> pinned to "a hyperthreaded core", I guess the process would still be
> moved across the physical cores by the CPU, or?
With the default cgroup setup, the startd does not pin jobs to
specific processors, but instead uses the cpu.shares functionality.
The share assigned to a job is the number of requested CPUs times 100,
so a single-core job gets 100, a two-core job gets 200, and so on.
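For reference, here's roughly what that looks like on the exec node -
a sketch assuming cgroup v1; the mount point and slot cgroup names
below are illustrative and depend on your distro and the BASE_CGROUP
setting:

    # Paths are illustrative - check your cgroup mount and BASE_CGROUP.
    $ cat /sys/fs/cgroup/cpu/htcondor/condor_slot1@exec-node/cpu.shares
    100    # single-core job: RequestCpus (1) * 100
    $ cat /sys/fs/cgroup/cpu/htcondor/condor_slot2@exec-node/cpu.shares
    200    # two-core job: RequestCpus (2) * 100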
This limit is only applied when there is contention for CPU time,
however: if a job requested one core but wants to use eight, it can
use all eight as long as there's idle capacity on other cores, but
once the machine fills up it will be dialed back to its cpu.shares
value of 100, i.e., roughly one core's worth of CPU time.
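You can watch the proportional behavior outside of HTCondor with two
hand-made cgroups - a sketch assuming cgroup v1 and the cgroup-tools
(cgexec), util-linux (taskset), and stress packages; the cgroup names
are made up for the demo:

    # Two cgroups with a 2:1 share ratio.
    sudo mkdir /sys/fs/cgroup/cpu/big /sys/fs/cgroup/cpu/small
    echo 200 | sudo tee /sys/fs/cgroup/cpu/big/cpu.shares
    echo 100 | sudo tee /sys/fs/cgroup/cpu/small/cpu.shares
    # Pin both busy loops to CPU 0 so they actually contend.
    sudo cgexec -g cpu:big   taskset -c 0 stress -c 1 &
    sudo cgexec -g cpu:small taskset -c 0 stress -c 1 &
    # top now shows roughly 67% vs. 33% CPU - the 2:1 share ratio.
    # Put them on different CPUs and both run at 100%: shares only
    # matter under contention.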
This has been important for compiled MATLAB jobs - unless the MATLAB
code sets a maximum compute thread count or is launched with the
-singleCompThread command-line option, MATLAB will use all available
cores, which is a bummer if your machine has a lot of cores and is
also trying to run such a MATLAB job on each of them. The cpu.shares
approach doesn't require a specific constraint in the MATLAB code,
which means the job will run full bore on the user's desktop, full
bore on an underutilized exec node, but won't step on everything else
on a busy exec node.
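Should you want to cap MATLAB itself anyway, the invocation is along
these lines (-singleCompThread is a genuine MATLAB startup flag; the
script name here is made up):

    # Force MATLAB to one compute thread regardless of core count.
    matlab -nodisplay -singleCompThread -r "my_analysis; exit"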
If you do want affinity, for processor cache locality considerations
or the like, you can do that too. There's a knob called
"ENFORCE_CPU_AFFINITY" which causes each job and all its children to
stay on a specific core, and "ASSIGN_CPU_AFFINITY" which enables
affinity to work with dynamic slots and overrides the ENFORCE setting.
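In condor_config terms that's just a one-liner - a sketch; check the
manual for your version for the exact semantics:

    # Affinity that works with dynamic slots; takes precedence
    # over ENFORCE_CPU_AFFINITY.
    ASSIGN_CPU_AFFINITY = True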