
Re: [HTCondor-users] How to reserve resources for GPU jobs



If you want your machine that has GPUs to match only jobs that request GPUs, set


    START = (TARGET.RequestGPUs ?: 0) > 0

This simplifies to 

    START = (TARGET.RequestGPUs ?: 0)

With the above START expression, only jobs that request at least 1 GPU will match. That's not quite what you asked for,
but it shows the way. You just need the START expression to evaluate to false for CPU jobs while there is still enough
memory and CPUs left over for a GPU job.
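To make the matching rule concrete, here is a rough Python model of the GPU-only START expression. It is only an illustration of the logic, not HTCondor code: the function name `start_gpu_only` is made up, and `None` stands in for a job ad that has no RequestGPUs attribute at all.

```python
from typing import Optional


def start_gpu_only(request_gpus: Optional[int]) -> bool:
    """Hypothetical model of START = (TARGET.RequestGPUs ?: 0) > 0.

    None represents a job ad where RequestGPUs is undefined.
    """
    # `x ?: 0` yields x when x is defined, otherwise the default 0.
    effective_gpus = request_gpus if request_gpus is not None else 0
    return effective_gpus > 0


if __name__ == "__main__":
    for label, gpus in [("no RequestGPUs", None), ("RequestGPUs = 0", 0), ("RequestGPUs = 1", 1)]:
        print(label, "->", start_gpu_only(gpus))
```

The point of the `?: 0` default is that a CPU-only job, which never mentions GPUs, is treated the same as a job that asks for zero of them.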

I will show this using a temporary variable to hold the CPU jobs expression.

    # Memory is in MB, so 128 GB is 128*1024
    start_cpu_jobs = (Cpus - TARGET.RequestCpus) >= 1 && (Memory - TARGET.RequestMemory) >= (128*1024)
    START = IfThenElse(TARGET.RequestGPUs ?: 0, true, $(start_cpu_jobs) )

This simplifies to

    START = (TARGET.RequestGPUs ?: 0) || $(start_cpu_jobs) 

Note that if you already have a START expression that is not just TRUE, this should be

    START = $(START) && ( (TARGET.RequestGPUs ?: 0) || $(start_cpu_jobs)  )
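As a sanity check on the reservation logic, the following Python sketch mimics how the combined expression would behave. It rests on several assumptions: the slot is partitionable, so Cpus and Memory are the amounts still unclaimed; Memory is counted in MB; and the reservation is 1 core and 128 GB, written as 128 * 1024. The function and constant names are invented for the illustration and are not part of HTCondor.

```python
from typing import Optional

# Assumed reservation for GPU jobs: 1 core and 128 GB (expressed in MB).
RESERVED_CPUS = 1
RESERVED_MEMORY_MB = 128 * 1024


def start(slot_cpus: int, slot_memory_mb: int,
          request_cpus: int, request_memory_mb: int,
          request_gpus: Optional[int] = None) -> bool:
    """Hypothetical model of
    START = (TARGET.RequestGPUs ?: 0) || $(start_cpu_jobs)
    """
    # GPU jobs always pass this test; an undefined request counts as 0.
    if (request_gpus if request_gpus is not None else 0) > 0:
        return True
    # CPU jobs must leave the reserved core and memory untouched.
    cpus_ok = (slot_cpus - request_cpus) >= RESERVED_CPUS
    memory_ok = (slot_memory_mb - request_memory_mb) >= RESERVED_MEMORY_MB
    return cpus_ok and memory_ok


if __name__ == "__main__":
    gb = 1024
    # Empty 24-core, 512 GB machine: a small CPU job is accepted.
    print(start(24, 512 * gb, request_cpus=1, request_memory_mb=16 * gb))
    # Only the reserved core is left: the CPU job is refused.
    print(start(1, 128 * gb, request_cpus=1, request_memory_mb=16 * gb))
    # The same leftover resources are still offered to a GPU job.
    print(start(1, 128 * gb, request_cpus=1, request_memory_mb=16 * gb, request_gpus=1))
```

START only decides whether the machine is willing to run the job. Whether the job's requested CPUs and memory actually fit in what remains is a separate check, so a GPU job passing here still has to fit in the leftover resources.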

-tj


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of K. Scott Rowe <krowe@xxxxxxxx>
Sent: Monday, August 25, 2025 4:30 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] How to reserve resources for GPU jobs

Hey there.  Imagine I have an EP running HTCondor-23.0.17 with 24 cores,
512GB RAM, and one GPU.  There are many CPU-only jobs running on this EP
for weeks at a time, and there are usually one or two GPU jobs as well. 
The CPU-only jobs may take weeks to finish, so sadly a GPU job may have
to wait weeks to start.  I would like GPU jobs to not have to wait so long.

Is there a way I could reserve say 1 core and 128GB of RAM for GPU jobs,
and only GPU jobs, on this EP thus letting CPU-only jobs continue to run
on the other 23 cores and 384GB of RAM?

I have been trying to do this with static slots but have not figured out
how to make a slot that has the GPU as a resource and will NOT run
CPU-only jobs.

I should also mention that we don't use preemption and really don't want
to use it as it doesn't work well with our pipeline.  I would also
rather not ask our users to add a ClassAd to their submit scripts (e.g.
+IsGPUJob), but if that is the only way, then so be it.

Thanks

--

K. Scott Rowe -- Science Information Services
Science Operations Center, National Radio Astronomy Observatory
1011 Lopezville Socorro, NM 87801
krowe@xxxxxxxx -- 1.575.835.7193 -- http://www.nrao.edu
