
Re: [HTCondor-users] Reserve Cpus support in HTCondor



Hi Raman,


Yes, you can do this by adjusting your slot configuration so that HTCondor advertises one fewer CPU core than the machine physically has.


Then HTCondor will never allocate that last core to jobs, and you can keep it for housekeeping/interrupts (and things like Weka client threads).


In our setup we use a single partitionable slot, like this:

NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
SLOT_TYPE_1 = cpus=$(DETECTED_CORES), ram=auto, disk=auto

To reserve one core, just reduce the advertised CPU count by 1, for example:

SLOT_TYPE_1 = cpus=$(DETECTED_CORES)-1, ram=auto, disk=auto


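After reloading/restarting the startd, the execute node will offer one CPU fewer to HTCondor, leaving one core's worth of capacity unused by Condor jobs (which you can then use for OS interrupts / Weka client threads). Note that this limits how many cores jobs are given, not which ones: keeping jobs off a specific core such as CPU 0 additionally requires CPU affinity or cpuset settings.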
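
As a rough illustration of the arithmetic behind the reduced CPU count, here is a minimal Python sketch. The helper names `advertised_cpus` and `slot_type_line` are made up for this example and are not part of HTCondor; the sketch also assumes you would rather write a literal number into the config than rely on the subtraction being evaluated. Note that `os.cpu_count()` reports logical CPUs, which may not match what HTCondor detects.

```python
import os


def advertised_cpus(detected: int, reserved: int = 1) -> int:
    """Return how many cores to advertise after holding back `reserved` cores."""
    if detected < 1:
        raise ValueError("detected must be at least 1")
    if reserved < 0:
        raise ValueError("reserved must not be negative")
    # Never advertise fewer than one core, otherwise the slot is useless.
    return max(1, detected - reserved)


def slot_type_line(detected: int, reserved: int = 1) -> str:
    """Render a SLOT_TYPE_1 line with a literal CPU count (hypothetical helper)."""
    cpus = advertised_cpus(detected, reserved)
    return f"SLOT_TYPE_1 = cpus={cpus}, ram=auto, disk=auto"


if __name__ == "__main__":
    # Assumption: the logical CPU count is close enough to what the startd detects.
    print(slot_type_line(os.cpu_count() or 1))
```

Running it on an execute node prints a line you could compare against, or paste into, the local configuration.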


One additional point to consider:

Running worker nodes (WNs) on storage nodes is something that should be evaluated carefully.


There is no strong guarantee that user jobs will not destabilize the node — for example, excessive memory usage by user workloads can lead to the server going down, which in turn may cause a cascading failure where storage services are interrupted as well.


Even if you fix the amount of RAM advertised to HTCondor, you should verify that cgroups are correctly and strictly enforcing resource limits. 


In practice, short-lived memory spikes from user jobs can still cause problems, and there are cases where the node becomes unresponsive or crashes before the OOM killer can act, even with cgroups enabled.
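
One way to spot-check the cgroup memory limit discussed above is to read the `memory.max` file of a job's cgroup while the job is running. The following is only a sketch under stated assumptions: it assumes cgroup v2, where `memory.max` holds either a byte count or the word `max` (no limit), and the cgroup directory is something you have to look up on your own node, since its location depends on your HTCondor and systemd setup. The function names are made up for this example. It checks the hard limit only and says nothing about short-lived spikes.

```python
import sys
from pathlib import Path
from typing import Optional


def parse_memory_max(text: str) -> Optional[int]:
    """Parse cgroup v2 memory.max content: bytes as int, or None for 'max'."""
    value = text.strip()
    if value == "max":
        return None  # no hard limit is set for this cgroup
    return int(value)


def read_memory_limit(cgroup_dir: str) -> Optional[int]:
    """Read the hard memory limit of the cgroup at `cgroup_dir`."""
    return parse_memory_max((Path(cgroup_dir) / "memory.max").read_text())


if __name__ == "__main__":
    # Usage: python check_limit.py <path-to-job-cgroup-directory>
    # The path is an assumption; find the real one on your execute node.
    limit = read_memory_limit(sys.argv[1])
    if limit is None:
        print("WARNING: no hard memory limit set")
    else:
        print(f"hard memory limit: {limit / 2**20:.0f} MiB")
```

If this reports no limit for a running job, the advertised RAM is not actually being enforced on that node.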


Because of this, running compute workloads on storage nodes is usually only a reasonable choice when resources are very limited and occasional server downtime is acceptable from an operational point of view.


Regards,


-- Geonmo



------ Original Message ------

From : Ram Ban <ramban046@xxxxxxxxx>

To : HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>

Date : 2026-01-17 (Sat) 21:26:58

Subject : [HTCondor-users] Reserve Cpus support in HTCondor


Hi all,

Is there any support in HTCondor for reserving CPU 0 so that jobs never use it?
That way I could isolate all the other CPUs for performance and move all interrupts and other housekeeping to CPU 0.
I am also experimenting with the Weka filesystem, which uses 1-2 cores on each machine it is mounted on, so I would like to know whether there is a way to do this.
I am using the HTCondor vanilla universe.

Thanks and Regards 
Raman
