
Re: [HTCondor-users] Resource Matching



Hi Greg,

It seems I do not understand core allocation & usage.

When I am allocated 8 cores for a single job on, for example, a dual-socket, 64-core (32+32) machine, what exactly happens next?

  • Does this mean I am simply allocated 1/8th of the possible CPU compute cycles on the node, regardless of which core does the computation? That is, the job is just held to a maximum average load of 8, with which core runs which task presumably left to the OS.

~ Matt

P.S. - References that cover these topics are greatly appreciated, so I need not pester the list with what feel like very basic questions.

Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom
On 04/04/2024 17:28, Greg Thain via HTCondor-users wrote:


On 4/4/24 10:54, Matthew T West via HTCondor-users wrote:
Let me try again, at least regarding CPUs & chiplets.

When requesting multiple CPU cores for a single job, can I specify
that all the cores come from a single NUMA node or single socket, when
the EP is set-up to dynamically allocate slots? Maybe that is the
default but I cannot find any info in the docs.


Hi Matt:

There's no good or easy way to do this today in HTCondor.  The startd
can affinity-lock a job to a cpu-core or a number of cores, but there is
no automatic way in condor, when requesting more than one core, to
affinity-lock into some specific geometry.  Part of the problem is one
of naming.  The affinity APIs in the kernel speak in terms of numeric
core-ids, but there is no standard for how the number assigned to a
core-id relates to its NUMA (or hyperthread) geometry.
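
To make that concrete, here is a rough (untested) sketch of what the
kernel-level API looks like.  The core-ids 0-7 below are purely
illustrative; nothing in the call itself says whether those cores share
a socket, a NUMA node, or a cache:

    /* Minimal sketch: the kernel affinity API only understands numeric
     * core-ids.  Whether ids 0-7 sit on the same socket or NUMA node is
     * machine-specific and invisible at this level. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        for (int c = 0; c < 8; c++)   /* pin this process to core-ids 0..7 */
            CPU_SET(c, &mask);

        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pinned to core-ids 0-7; NUMA placement unknown to this API\n");
        return 0;
    }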

Now, there are hack-arounds (there always are!), wherein, if you are
willing to forbid jobs from ever running across zones, you can configure
a startd, or a p-slot in a startd, to be dedicated to a particular
subset of the machine, and use ASSIGN_CPU_AFFINITY to lock that startd
or p-slot to that subsection of the cpus on the system.
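
As a rough, untested sketch of that kind of setup on a 64-core (32+32)
box (the slot split and the percentages below are illustrative, and
whether the core-ids condor ends up using actually line up with the two
NUMA nodes depends on the numbering problem above):

    # Two partitionable slots, each intended to cover one 32-core zone
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1 = cpus=32, memory=50%, disk=50%
    SLOT_TYPE_1_PARTITIONABLE = TRUE

    NUM_SLOTS_TYPE_2 = 1
    SLOT_TYPE_2 = cpus=32, memory=50%, disk=50%
    SLOT_TYPE_2_PARTITIONABLE = TRUE

    # Have the startd bind each job to the cores of the slot it matched
    ASSIGN_CPU_AFFINITY = True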

Personally, my strongly-held but unfulfilled opinion is that this is
all the responsibility of the OS kernel, and *it* should figure out
which processes belong together and schedule them appropriately.  But
perhaps that is naive.

-greg