
Re: [HTCondor-users] Fractional CPU resources possible?

Date: Mon, 22 Jun 2026 09:09:01 +0200
From: Carsten Aulbert <carsten.aulbert@xxxxxxxxxx>
Subject: Re: [HTCondor-users] Fractional CPU resources possible?

Hi Cole, Thomas, all,

just to close this thread from my side with what we tried and what seems to work:


On 6/15/26 16:29, Cole Bollig wrote:

HTCondor does not currently support fractional CPUs. One potential solution to this is you could lie about the number of CPUs available to the EP so that the CPU cores are actually over committed. I have attached some sample configuration I put together to assist another administrator using this concept.

After some not very successful trials with "virtual CPU cores", i.e. trying to lie to condor like


NUM_CPUS = $(DETECTED_CPUS_LIMIT) * 10

and using job transforms on the submit hosts like

JOB_TRANSFORM_CpuFiddle @=end
cpu_weight_factor = 9
IF defined MY.LittleCpu
  cpu_weight_factor = 1
ENDIF
EVALSET RequestCpus RequestCpus * $(cpu_weight_factor)
@end

(while obviously falling prey to one of the two hardest CS problems ;-))

We might have been able to tweak this approach enough to make it workable, but rather than hunting for the right multipliers and weight factors, which would have to match the layout of the EP, we opted for Cole's suggested way and simply created 4 slots for each node[1]:


06/18/26 08:00:08 slot1: New pSlot of type 1 allocated

06/18/26 08:00:08 slot1: Cpus: 8.000000, Memory: 51577, Swap:0.00%, Disk: 25.00%, GPUs: 8

06/18/26 08:00:08 slot2: New pSlot of type 2 allocated

06/18/26 08:00:08 slot2: Cpus: 16.000000, Memory: 180519, Swap:0.00%, Disk: 25.00%, GPUs: 0

06/18/26 08:00:08 slot3: New pSlot of type 3 allocated

06/18/26 08:00:08 slot3: Cpus: 8.000000, Memory: 51577, Swap:0.00%, Disk: 25.00%, GPUs: 8

06/18/26 08:00:08 slot4: New pSlot of type 4 allocated

06/18/26 08:00:08 slot4: Cpus: 16.000000, Memory: 180519, Swap:0.00%, Disk: 25.00%, GPUs: 0


This along with something like

SLOT_TYPE_1_START = (TARGET.RequestGpus isnt Undefined) && (TARGET.RequestGpus > 0)

for slots 1 and 3 seems to work nicely. The only caveat is that some GPU jobs have vastly different memory needs, but I don't see how to shift memory dynamically between "GPU" and "CPU" slots.
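
A layout like the one in the log lines above could be sketched roughly as follows. The numbers are simply read off those log lines; the use of "use FEATURE : GPUs" for GPU discovery and the reuse of the same START expression for slot type 3 are assumptions, and the real configuration may well differ (in particular, nothing here pins GPUs to their local CPU socket):

# Sketch only, not the actual configuration; values taken from the log lines above
# Assumption: GPUs are discovered via the standard GPU feature metaknob
use FEATURE : GPUs

# Two "GPU" partitionable slots (types 1 and 3)
SLOT_TYPE_1 = cpus=8, memory=51577, disk=25%, GPUs=8
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_3 = cpus=8, memory=51577, disk=25%, GPUs=8
SLOT_TYPE_3_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_3 = 1

# Two "CPU" partitionable slots (types 2 and 4)
SLOT_TYPE_2 = cpus=16, memory=180519, disk=25%, GPUs=0
SLOT_TYPE_2_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_2 = 1
SLOT_TYPE_4 = cpus=16, memory=180519, disk=25%, GPUs=0
SLOT_TYPE_4_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_4 = 1

# Only jobs that request at least one GPU may start on the GPU slots
SLOT_TYPE_1_START = (TARGET.RequestGpus isnt Undefined) && (TARGET.RequestGpus > 0)
# Assumption: slot type 3 reuses the same expression
SLOT_TYPE_3_START = $(SLOT_TYPE_1_START)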

Anyway, yet another time condor has proved to have more than enough knobs for the job ;-)


Thanks!

Carsten

[1] As we expect quite a bit of GPU to CPU bandwidth needs, we logically divide each server into two halves to minimize traffic between the CPUs, i.e. CPU0 will only talk to GPUs local to it; well, that plus NUMA ;-)


--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany, Phone +49 511 762 17185

