
Re: [HTCondor-users] Resource Matching



Hi Greg,

First, I didn't realize that when one uses pslots, an empty machine has just one slot even if it's a multi-socket box. Learned something new there.

Second, I did not realize that cgroup cpu.shares is a minimum resource constraint (on Linux, startd as root, etc.) and that a process could go beyond it if the machine was not otherwise busy, for some definition of "busy". Just double-checked the Linux cgroups manpage.

And yes, I can see how a particular bit-mask could solve the problem in this specific instance but fall short when one has a highly varied set of workloads running on a system. In that case, it makes the most sense to either leave it to the kernel OR have the application itself explicitly ask to be pinned. It also seems like some improvements to kernel scheduling are coming: https://www.phoronix.com/news/Linux-Completing-EEVDF-Sched
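
For the "pin itself" route, here is a minimal sketch of what that can look like on Linux using only the Python standard library; the choice of cores below is just a placeholder, not a recommendation:

    import os

    # Start from whatever the kernel currently allows this process to use
    # (inside a job, this already reflects any mask the startd applied).
    allowed = sorted(os.sched_getaffinity(0))

    # Pin this process (and any threads it spawns later) to a subset of
    # those CPUs.  Taking the first two is just a placeholder; a real
    # application would pick cores that share a socket or NUMA node.
    os.sched_setaffinity(0, set(allowed[:2]))
    print("pinned to CPUs:", sorted(os.sched_getaffinity(0)))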

So yeah, I think I follow everything, so thank you. Admittedly my education on this topic has been a bit like drinking from a fire hose.

Cheers,
Matt


Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom
On 05/04/2024 01:07, Greg Thain via HTCondor-users wrote:


On 4/4/24 5:42 PM, Matthew T West via HTCondor-users wrote:

Hi Greg,

It seems I do not understand core allocation & usage.

When I am allocated 8 cores for a single job on a dual-socket, 64-core (32+32) machine, for example, what exactly happens next?


Well ... it depends.  Assuming a condor startd with one partitionable slot with all 64 cores in it...

At the very least, the allocation of 8 of these cores means that there are 48 left for condor to allocate to subsequent jobs.  The running job is free to spawn as many processes or threads as it likes.  BUT, if the startd has rootly privileges (and we're running on Linux), by default we will put the job in a cgroup, with cgroup cpu.shares proportional to the number of cores allocated to it.  In this case, if the job spawns 64 processes and there is nothing else running on the system, the kernel will schedule them on all 64 of the cores.  If there is contention for all the CPU cores, though, the cgroup cpu.shares mechanism means that this job will be limited to using only 8 of the 64 cores on the system, no matter how many runnable processes or threads it spawns.  However, *which* of those 8 cores the job gets to use is up to the kernel.  Unfortunately, I'm led to believe that the kernel may not do as good a job of picking which of these cores to run on as we might like.
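
If you want to see that share from inside a running job, a minimal sketch for a cgroup v2 host is below; the mount point and file names are the usual defaults rather than anything condor guarantees, and on a cgroup v1 host the analogous knob is cpu.shares:

    from pathlib import Path

    # Which cgroup is this process in?  On a cgroup v2 host the single
    # line looks like "0::/some/path"; a v1 host lists one line per
    # controller and uses cpu.shares rather than cpu.weight.
    proc_cgroup = Path("/proc/self/cgroup").read_text().strip()
    print(proc_cgroup)

    if proc_cgroup.startswith("0::"):
        # cpu.weight is the v2 analogue of cpu.shares, i.e. the job's
        # relative share of the CPU under contention.  Assumes the
        # unified hierarchy is mounted at the usual /sys/fs/cgroup.
        rel = proc_cgroup.split("::", 1)[1].lstrip("/")
        weight = Path("/sys/fs/cgroup", rel, "cpu.weight")
        if weight.exists():
            print("cpu.weight =", weight.read_text().strip())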

Optionally, an administrator can set ASSIGN_CPU_AFFINITY in a pslot, and condor will assign a cpu affinity bit-mask to the processes in the job.  In this case, even if the machine is otherwise idle, our job with an 8-core allocation can only use the specific 8 cores that condor assigns to it.  However, HTCondor assigns these in strictly sequential numerical order, which might not be optimal for a given multicore geometry.
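
A quick way to see which cores the job actually ended up with is to ask the kernel from inside the job, e.g. with this little sketch:

    import os

    # Ask the kernel which CPU ids this process may run on; with
    # ASSIGN_CPU_AFFINITY in play this should be exactly the 8 cores the
    # startd handed out, numbered consecutively as described above.
    print("allowed CPUs:", sorted(os.sched_getaffinity(0)))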

Does this make sense?  I might use this as an excuse to enhance the online manual a bit.


-greg


  • Does this mean I am just allocated 1/8th of the possible CPU compute cycles on the node, regardless of which core is doing the computation? In other words, the job just has a maximum average load allowed of 8, and which core handles which task is presumably up to the OS.

~ Matt

P.S. - References that cover these topics are greatly appreciated, so I need not pester the list with what I feel are very basic questions.

Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom
On 04/04/2024 17:28, Greg Thain via HTCondor-users wrote:


On 4/4/24 10:54, Matthew T West via HTCondor-users wrote:
Let me try again, at least regarding CPUs & chiplets.

When requesting multiple CPU cores for a single job, can I specify
that all the cores come from a single NUMA node or single socket, when
the EP is set up to dynamically allocate slots? Maybe that is the
default but I cannot find any info in the docs.


Hi Matt:

There's no good or easy way to do this today in HTCondor.  The startd
can affinity-lock a job to a cpu-core or a number of cores, but there is
no automatic way in condor, when requesting more than one core, to
affinity-lock into some specific geometry.  Part of the problem is one
of naming.  The affinity APIs in the kernel speak in terms of numeric
core-ids, but there is no standard for how the number assigned to a
core-id relates to its NUMA (or hyperthread) geometry.
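
To be fair, on Linux the mapping is at least discoverable after the
fact through sysfs, so a job or a wrapper script can work out which
core-ids share a NUMA node.  A minimal sketch, assuming the usual
/sys/devices/system/node layout:

    from pathlib import Path

    def parse_cpulist(text):
        """Expand a sysfs cpulist such as '0-15,32-47' into a set of core ids."""
        cpus = set()
        for part in text.strip().split(","):
            if "-" in part:
                lo, hi = part.split("-")
                cpus.update(range(int(lo), int(hi) + 1))
            elif part:
                cpus.add(int(part))
        return cpus

    # One directory per NUMA node; its cpulist file names the core ids
    # that live on that node.
    for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        print(node.name, "->", sorted(parse_cpulist((node / "cpulist").read_text())))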

Now, there are hack-arounds (there always are!), wherein if you are
willing to forbid jobs from ever running across zones, you can configure
a startd, or a p-slot in a startd, to be dedicated to a particular
subset of the machine, and use ASSIGN_CPU_AFFINITY to lock that startd
or p-slot to that subset of the cpus on the system.

Personally, my strongly held but unfulfilled opinion is that this is
all the responsibility of the OS kernel, and *it* should figure out
which processes belong together and schedule them appropriately.  But
perhaps that is naive.

-greg



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
