Hi Greg,
First, I didn't realize that when one uses pslots, an empty machine
has just 1 slot even if it's a multi-socket box. Learned something
new there.
Second, I did not realize that cgroup cpu.shares was a minimum
resource constraint (on Linux, startd as root, etc.) and that a
process could go beyond that if the machine was not busy, for some
definition of "busy". Just double-checked the cgroups Linux manpage.
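For my own notes, here's a tiny sketch I used to check what a job's
cgroup actually gets. The cgroup path you pass it is just a
placeholder, since I'm not sure exactly how HTCondor names its
cgroups:

    /* cgshare.c - print the cpu.weight (cgroup v2) or cpu.shares
     * (v1) of a cgroup directory given on the command line.
     * The directory argument is a placeholder; HTCondor's cgroup
     * naming may differ on your system.
     * Build: cc -o cgshare cgshare.c
     * Run:   ./cgshare /sys/fs/cgroup/<path-to-the-job-cgroup>   */
    #include <stdio.h>

    static int print_file(const char *dir, const char *name)
    {
        char path[4096];
        char line[256];
        FILE *f;

        snprintf(path, sizeof(path), "%s/%s", dir, name);
        f = fopen(path, "r");
        if (!f)
            return -1;      /* not present in this cgroup version */
        if (fgets(line, sizeof(line), f))
            printf("%s: %s", name, line);
        fclose(f);
        return 0;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <cgroup-directory>\n", argv[0]);
            return 1;
        }
        /* cgroup v2 exposes cpu.weight, v1 exposes cpu.shares. */
        if (print_file(argv[1], "cpu.weight") &&
            print_file(argv[1], "cpu.shares"))
            fprintf(stderr, "no cpu.weight or cpu.shares under %s\n",
                    argv[1]);
        return 0;
    }

Either way the number it prints is a proportional weight, not a hard
cap, which matches what the manpage says.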
And yes, I can see how a particular bit-mask could solve the
problem in this specific instance but fails when one has a highly
varied set of workloads running on a system. In that case, it makes
the most sense to either leave it to the kernel OR have the
application itself explicitly ask to be pinned. It also seems like
some improvements in kernel scheduling are coming:
https://www.phoronix.com/news/Linux-Completing-EEVDF-Sched
So yeah, I think I follow everything now; thank you. Admittedly, my
education on this topic has been a bit like drinking from a fire
hose.
Cheers,
Matt
Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom
On 05/04/2024 01:07, Greg Thain via HTCondor-users wrote:
On 4/4/24 5:42 PM, Matthew T West via HTCondor-users wrote:
Hi Greg,
It seems I do not understand core allocation & usage.
When I am allocated 8 cores for a single job on a dual-socket,
64-core (32+32) machine, for example, what exactly happens next?
Well ... it depends. Assuming a condor startd with one
partitionable slot with all 64 cores in it:
At the very least, the allocation for 8 of these cores means that
there are 56 left for condor to allocate to subsequent jobs. This
running job is free to spawn as many processes or threads as it
likes. BUT, if the startd has rootly privileges (and we're running
on Linux), by default we will put the job in a cgroup, with cgroup
cpu.shares proportional to the number of cores allocated to it. In
this case, if the job spawns 64 processes, and there is nothing
else running on the system, the kernel will schedule them on all 64
of the cores. If there is contention for all the cpus, though, the
cgroup cpu.shares mechanism means that this job will be limited to
using only 8 of the 64 cores on the system, no matter how many
runnable processes or threads it spawns. However, *which* of those
8 cores the job gets to use is up to the kernel. Unfortunately, I'm
led to believe that the kernel may not do as good a job of picking
which of these cores to run on as we might like.
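If you want to see that from inside a job, a quick sketch like this
(just an illustration, not anything HTCondor ships) will show that
the default affinity mask still spans every core on the box, even
though cpu.shares limits the job's share under contention:

    /* showmask.c - print how many cpus this process may run on.
     * Illustrative sketch only.
     * Build: cc -o showmask showmask.c                          */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;

        /* Ask the kernel for this process's affinity mask. */
        if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_getaffinity");
            return 1;
        }

        /* With only cgroup cpu.shares in effect, expect 64 on the
         * box above; with an assigned affinity mask, expect 8.   */
        printf("allowed cpus: %d\n", CPU_COUNT(&mask));
        return 0;
    }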
Optionally, an administrator can set ASSIGN_CPU_AFFINITY in a
pslot, and condor will assign a cpu affinity bit-mask to the
processes in the job. In this case, even if the machine is
otherwise idle, our job with an 8-core allocation can only use the
specific 8 cores that condor assigns to it. However, HTCondor
assigns these in strictly sequential numerical order, which might
not be optimal for a given multicore geometry.
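For reference, that is roughly this much configuration on the EP
(a sketch only; check the manual for the exact semantics on your
version):

    # condor_config on the EP (sketch)
    ASSIGN_CPU_AFFINITY = True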
Does this make sense? I might use this as an excuse to enhance the
online manual a bit.
-greg
- Does this mean I am just allocated 1/8th of the possible CPU
compute cycles on the node, regardless of which core is doing the
computation? That job then just has a maximum allowed average load
of 8, with which core handles which task presumably being left to
the OS.
~ Matt
P.S. - References that cover these topics are greatly appreciated,
so I need not pester the list with what feel like very basic
questions.
Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom
On 04/04/2024 17:28, Greg Thain via HTCondor-users wrote:
On 4/4/24 10:54, Matthew T West via HTCondor-users wrote:
Let me try again, at least regarding CPUs & chiplets.
When requesting multiple CPU cores for a single job, can I specify
that all the cores come from a single NUMA node or single socket,
when the EP is set up to dynamically allocate slots? Maybe that is
the default, but I cannot find any info in the docs.
Hi Matt:
There's no good or easy way to do this today in HTCondor. The
startd can affinity-lock a job to a cpu core or a number of cores,
but there is no automatic way in condor, when requesting more than
one core, to affinity-lock into some specific geometry. Part of the
problem is one of naming. The affinity APIs in the kernel speak in
terms of numeric core-ids, but there is no standard for how the
number assigned to a core-id relates to its NUMA (or hyperthread)
geometry.
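To make that concrete: on Linux the only way to recover that
mapping is to ask at runtime, e.g. via sysfs. A rough sketch (the
paths are the standard kernel sysfs ones, nothing HTCondor-specific):

    /* topo.c - print which socket and core each cpu id maps to.
     * The numbering the kernel hands out is exactly the problem
     * described above.
     * Build: cc -o topo topo.c                                   */
    #include <stdio.h>

    static int read_int(const char *path)
    {
        FILE *f = fopen(path, "r");
        int v = -1;

        if (f) {
            if (fscanf(f, "%d", &v) != 1)
                v = -1;
            fclose(f);
        }
        return v;
    }

    int main(void)
    {
        char path[256];
        int cpu;

        for (cpu = 0; ; cpu++) {
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%d/topology/physical_package_id",
                     cpu);
            int pkg = read_int(path);
            if (pkg < 0)
                break;              /* no such cpu id: we're done */

            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%d/topology/core_id",
                     cpu);
            printf("cpu %d: socket %d, core %d\n",
                   cpu, pkg, read_int(path));
        }
        return 0;
    }

Two cpu ids that report the same socket and core are hyperthread
siblings; nothing about the cpu number itself tells you that.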
Now, there are hack-arounds (there always are!), wherein if you
are willing to forbid jobs from ever running across zones, you can
configure a startd, or a p-slot in a startd, to be dedicated to a
particular subset of the machine, and use ASSIGN_CPU_AFFINITY to
lock that startd or p-slot to that subsection of the cpus on the
system.
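Very roughly, and untested, that hack-around would be shaped
something like the following; whether the sequential cpu numbering
actually lines up with socket boundaries depends on the particular
box, so treat it purely as a sketch:

    # Sketch only: split the machine into two 32-core partitionable
    # slots and turn on affinity locking.  See the Administrators'
    # Manual for the exact knobs and their semantics.
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1 = cpus=32
    SLOT_TYPE_1_PARTITIONABLE = True

    NUM_SLOTS_TYPE_2 = 1
    SLOT_TYPE_2 = cpus=32
    SLOT_TYPE_2_PARTITIONABLE = True

    ASSIGN_CPU_AFFINITY = True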
Personally, my strongly-held but unfulfilled opinion is that this
is all the responsibility of the OS kernel, and *it* should figure
out which processes belong together and schedule them
appropriately. But perhaps that is naive.
-greg
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/