Hi Greg,
It seems I do not understand core allocation & usage.
When I am allocated 8 cores for a single job on, say, a dual-socket,
64-core (32+32) machine, what exactly happens next?
- Does this mean I am simply allocated 1/8th of the possible CPU
compute cycles on the node, regardless of which core is doing the
computation? That is, the job is just allowed a maximum average load
of 8, with the OS deciding which core handles which task?
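(For reference, the request I have in mind is nothing exotic, just
the usual submit-file knob; a minimal sketch, with a placeholder
executable name:)

    # placeholder executable; the point is the 8-core request
    executable   = my_job.sh
    request_cpus = 8
    queue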
~ Matt
P.S. - References that cover these topics would be greatly
appreciated, so I need not pester the list with what I feel are
very basic questions.
Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom
On 04/04/2024 17:28, Greg Thain via HTCondor-users wrote:
On 4/4/24 10:54, Matthew T West via HTCondor-users wrote:
Let me try again, at least regarding CPUs & chiplets.
When requesting multiple CPU cores for a single job, can I specify
that all the cores come from a single NUMA node or a single socket,
when the EP is set up to dynamically allocate slots? Maybe that is
the default, but I cannot find any info in the docs.
Hi Matt:
There's no good or easy way to do this today in HTCondor. The startd
can affinity-lock a job to a CPU core or a number of cores, but there
is no automatic way in Condor, when requesting more than one core, to
affinity-lock into some specific geometry. Part of the problem is one
of naming: the affinity APIs in the kernel speak in terms of numeric
core-ids, but there is no standard for how the number assigned to a
core-id relates to its NUMA (or hyperthread) geometry.
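To make the naming problem concrete, here is a minimal C sketch of
those kernel affinity APIs: it pins the calling process to cores 0-7
by numeric id, but nothing in the API says whether those eight ids
share a socket, a NUMA node, or a chiplet; that depends entirely on
how the firmware and kernel happened to number the cores.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        /* Request cores by bare numeric id, 0 through 7. Whether
           these ids all sit on one socket or are scattered across
           NUMA nodes is not defined by the API. */
        for (int i = 0; i < 8; i++)
            CPU_SET(i, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pid %d pinned to core-ids 0-7\n", (int)getpid());
        return 0;
    }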
Now, there are hack-arounds (there always are!): if you are willing
to forbid jobs from ever running across zones, you can configure a
startd, or a p-slot in a startd, to be dedicated to a particular
subset of the jobs, and use ASSIGN_CPU_AFFINITY to lock that startd
or p-slot to the corresponding subsection of the CPUs on the system.
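As a sketch of that hack-around (the one-p-slot-per-socket split and
the 32-core sizes below are assumptions for a 32+32 machine, not a
recommendation):

    # Two partitionable slots, intended as one per 32-core socket.
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1 = cpus=32
    SLOT_TYPE_1_PARTITIONABLE = True

    NUM_SLOTS_TYPE_2 = 1
    SLOT_TYPE_2 = cpus=32
    SLOT_TYPE_2_PARTITIONABLE = True

    # Pin each job to the cores the startd assigned to its slot.
    ASSIGN_CPU_AFFINITY = True

Note the caveat above still applies: whether SLOT_TYPE_1's cores all
land on one physical socket depends on the machine's core-id
numbering, which is exactly the naming problem described earlier.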
Personally, my strongly held but unfulfilled opinion is that this is
all the responsibility of the OS kernel, and *it* should figure out
which processes belong together and schedule them appropriately. But
perhaps that is naive.
-greg