Hi Greg,
It seems I do not understand core allocation & usage.
When I am allocated 8 cores for a single job on, say, a dual-socket,
64-core (32+32) machine, what exactly happens next?
- Does this mean I am simply allocated 1/8th of the possible CPU
compute cycles on the node, regardless of which core is doing the
computation? That is, the job is just allowed a maximum average load
of 8, with the OS deciding which core handles which task?
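(For reference, the request I have in mind is nothing exotic, just
the usual submit-file knob; a minimal sketch, with a placeholder
executable name:)

    # placeholder executable; the point is the 8-core request
    executable   = my_job.sh
    request_cpus = 8
    queue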
~ Matt
P.S. - References that cover these topics would be greatly
appreciated, so I need not pester the list with what I feel are
very basic questions.
Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom
On 04/04/2024 17:28, Greg Thain via HTCondor-users wrote:
On 4/4/24 10:54, Matthew T West via HTCondor-users wrote:
Let me try again, at least regarding CPUs & chiplets.
When requesting multiple CPU cores for a single job, can I specify
that all the cores come from a single NUMA node or a single socket,
when the EP is set up to dynamically allocate slots? Maybe that is
the default, but I cannot find any info in the docs.
Hi Matt:
There's no good or easy way to do this today in HTCondor. The startd
can affinity-lock a job to a CPU core or a number of cores, but there
is no automatic way in Condor, when requesting more than one core, to
affinity-lock into some specific geometry. Part of the problem is one
of naming: the affinity APIs in the kernel speak in terms of numeric
core-ids, but there is no standard for how the number assigned to a
core-id relates to its NUMA (or hyperthread) geometry.
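To make the naming problem concrete, here is a minimal C sketch of
those kernel affinity APIs: it pins the calling process to cores 0-7
by numeric id, but nothing in the API says whether those eight ids
share a socket, a NUMA node, or a chiplet; that depends entirely on
how the firmware and kernel happened to number the cores.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        /* Request cores by bare numeric id, 0 through 7. Whether
           these ids all sit on one socket or are scattered across
           NUMA nodes is not defined by the API. */
        for (int i = 0; i < 8; i++)
            CPU_SET(i, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pid %d pinned to core-ids 0-7\n", (int)getpid());
        return 0;
    }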
Now, there are hack-arounds (there always are!): if you are willing
to forbid jobs from ever running across zones, you can configure a
startd, or a p-slot in a startd, to be dedicated to a particular
subset of the jobs, and use ASSIGN_CPU_AFFINITY to lock that startd
or p-slot to the corresponding subsection of the CPUs on the system.
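As a sketch of that hack-around (the one-p-slot-per-socket split and
the 32-core sizes below are assumptions for a 32+32 machine, not a
recommendation):

    # Two partitionable slots, intended as one per 32-core socket.
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1 = cpus=32
    SLOT_TYPE_1_PARTITIONABLE = True

    NUM_SLOTS_TYPE_2 = 1
    SLOT_TYPE_2 = cpus=32
    SLOT_TYPE_2_PARTITIONABLE = True

    # Pin each job to the cores the startd assigned to its slot.
    ASSIGN_CPU_AFFINITY = True

Note the caveat above still applies: whether SLOT_TYPE_1's cores all
land on one physical socket depends on the machine's core-id
numbering, which is exactly the naming problem described earlier.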
Personally, my strongly held but unfulfilled opinion is that this is
all the responsibility of the OS kernel, and *it* should figure out
which processes belong together and schedule them appropriately. But
perhaps that is naive.
-greg