
Re: [HTCondor-users] CPU affinity per dynamic slot



Was actually wondering about this for GPUs. On a machine with a GPU on each NUMA node, I imagine there is more of a case for having separate slots. How would Condor assign GPUs with a 50% split across 2 slots? Just in the order that condor_gpu_discovery reports them?

> On 8 Aug 2024, at 21:20, Jaime Frey via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
> 
> We've generally found that the kernel is much better at efficiently scheduling job threads onto cores than HTCondor could be. So, the default configuration is to not set CPU affinity and to rely on cgroups to ensure that a job doesn't exceed its share of cores when there is contention.
> 
> Your situation may qualify as an "unusual setup" as mentioned in the manual. If many of your jobs have high core counts, such that the kernel may have to allocate some jobs a set of cores split across the CPUs, then an explicit divide may be warranted.
> 
> Given the resource split you provided, I believe this configuration will give you the affinity assignments you want:
> 
> # Note: must list all cpu ids explicitly
> ENFORCE_CPU_AFFINITY = True
> SLOT1_CPU_AFFINITY = 0,1,2,…,192
> SLOT2_CPU_AFFINITY = 193,194,…,385
> 
> Note that each job will have an affinity for all cores assigned to its p-slot. Thus, you're still relying on cgroups to ensure that each job doesn't use more cores over time than it requests.
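Since the documented caveat is that every CPU id must be listed explicitly, a short script can generate the lists rather than typing hundreds of ids by hand. A minimal sketch, assuming the 386-thread host and the 193/193 split from the example above (the ranges are illustrative; adjust them to your hardware):

```python
# Sketch: generate explicit SLOT<N>_CPU_AFFINITY lines for an HTCondor
# config. Assumes a 386-thread host split 0..192 / 193..385 as in the
# example above; change the ranges to match your machine.

def affinity_line(slot, cpu_ids):
    """Render one SLOT<N>_CPU_AFFINITY line with every cpu id listed."""
    return f"SLOT{slot}_CPU_AFFINITY = " + ",".join(str(c) for c in cpu_ids)

lines = [
    "ENFORCE_CPU_AFFINITY = True",
    affinity_line(1, range(0, 193)),    # cpus 0..192  -> slot 1
    affinity_line(2, range(193, 386)),  # cpus 193..385 -> slot 2
]
print("\n".join(lines))
```

The output can be pasted into (or included from) the startd configuration.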
> 
>  - Jaime
> 
>> On Aug 6, 2024, at 10:03 AM, Thomas Birkett - STFC UKRI via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>> 
>> Dear Condor community,
>> We have recently acquired some worker nodes that will act as startds with many threads per host. Due to this large increase in threads per host, we'd like to create 2 partitionable slots and, if possible, map the threads on CPU0 to slot 1 and the threads on CPU1 to slot 2. This will also have the benefit that the x16 GMI links between the CPUs aren't bottlenecked. I currently have the following startd config:
>> NUM_SLOTS = 2
>> NUM_SLOTS_TYPE_1 = 1
>> NUM_SLOTS_TYPE_2 = 1
>> SLOT1_EXECUTE = /pool_1
>> SLOT2_EXECUTE = /pool_2
>> SLOT_TYPE_1 = cpus=193,mem=50%,auto
>> SLOT_TYPE_1_PARTITIONABLE = TRUE
>> SLOT_TYPE_2 = cpus=193,mem=50%,auto
>> SLOT_TYPE_2_PARTITIONABLE = TRUE
>> 
>> With this config, I believe the host resources will be split between the two partitionable slots; however, it would be preferable if there were some CPU affinity so that SLOT1 uses the threads on CPU0 and similarly for SLOT2. Looking through the documentation I can see the configuration variable `SLOT<N>_CPU_AFFINITY`, which looks to be exactly what I want, but I notice the line "This configuration variable is replaced by ASSIGN_CPU_AFFINITY. Do not enable this configuration variable unless using glidein or another unusual setup." This makes me think it is not the right setting for this use case, and `ASSIGN_CPU_AFFINITY` appears to be a Boolean with no option to define thread mappings per slot.
>> I imagine this is possible using some external script to do the mapping, using a command such as `lscpu -p=NODE,CPU`; however, I'm struggling to put all the pieces together. If anyone has any pointers or advice they would be gratefully received.
>> Many thanks in advance,
>> 
>> Thomas Birkett
>> Senior Systems Administrator
>> Scientific Computing Department  
>> Science and Technology Facilities Council (STFC)
>> Rutherford Appleton Laboratory, Chilton, Didcot 
>> OX11 0QX
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>> 
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
> 