On 4/4/24 10:54, Matthew T West via HTCondor-users wrote:
Let me try again, at least regarding CPUs & chiplets. When requesting multiple CPU cores for a single job, can I specify that all of the cores come from a single NUMA node or a single socket when the EP is set up to dynamically allocate slots? Maybe that is the default, but I cannot find any info in the docs.
Hi Matt:

There's no good or easy way to do this today in HTCondor. The startd can affinity-lock a job to a cpu-core or a number of cores, but there is no automatic way in condor, when requesting more than one core, to affinity-lock them into some specific geometry. Part of the problem is one of naming: the affinity APIs in the kernel speak in terms of numeric core-ids, but there is no standard for how the number assigned to a core-id relates to its NUMA (or hyperthread) geometry.
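To make the naming problem concrete, here's a small sketch (my own illustration, not anything condor does) showing that the kernel's affinity API only hands back bare numeric core-ids, and that recovering NUMA/socket meaning requires separately poking at sysfs. The sysfs path used here is the conventional Linux location, but whether it exists depends on the platform:

```python
import os

def current_affinity():
    # os.sched_getaffinity reports the numeric core-ids this process may
    # run on. The ids themselves carry no NUMA or socket meaning.
    # (Linux-only API, hence the hasattr guard.)
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None

def package_of(core_id):
    # To learn which socket a core-id belongs to, you have to consult
    # a *separate* source, e.g. sysfs. Returns None if unavailable.
    path = f"/sys/devices/system/cpu/cpu{core_id}/topology/physical_package_id"
    try:
        with open(path) as f:
            return int(f.read())
    except OSError:
        return None

cores = current_affinity()
```

The point is that nothing ties the two views together: a scheduler that wants "all cores on one socket" has to build and maintain that mapping itself.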
Now, there are hack-arounds (there always are!), wherein if you are willing to forbid jobs from ever running across zones, you can configure a startd, or a p-slot within a startd, to be dedicated to a particular subset of the machine, and use ASSIGN_CPU_AFFINITY to lock that startd or p-slot to that subsection of the cpus on the system.
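As a rough sketch of that hack-around, something along these lines in the EP config carves the machine into one p-slot per NUMA node and turns on affinity locking. The core counts and slot split here are made-up for a hypothetical 2-socket box; check your actual topology and the HTCondor manual for the exact knobs before copying this:

```
# Hypothetical: split a 32-core, 2-socket EP into two p-slots,
# one per socket, so no dynamic slot ever spans sockets.
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=16
SLOT_TYPE_1_PARTITIONABLE = TRUE

NUM_SLOTS_TYPE_2 = 1
SLOT_TYPE_2 = cpus=16
SLOT_TYPE_2_PARTITIONABLE = TRUE

# Have the startd affinity-lock jobs to the cores assigned to their slot.
ASSIGN_CPU_AFFINITY = TRUE
```

The trade-off, as noted above, is that a job can then never be scheduled across the zone boundary even when the machine is otherwise idle.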
Personally, my strongly held but unfulfilled opinion is that this is all the responsibility of the OS kernel, and *it* should figure out which processes belong together and schedule them appropriately. But perhaps that is naive.
-greg