
Re: [HTCondor-users] How to reserve resources for GPU jobs {External}



Thanks. Your first suggestion that blocks all non-gpu jobs works. Your second suggestion to allow some non-gpu jobs doesn't work.


Is the Memory variable in your example the amount of available memory on the node? Because it seems to act more like the amount of memory requested by the job. For example, if I add these two lines, simplified from your suggestion, to my config


 start_cpu_jobs = (Memory >= 1023)
 START = $(START) && $(start_cpu_jobs)


and submit a non-gpu job asking for 1GB of memory (request_memory = 1 G), the job runs. But if I set


 start_cpu_jobs = (Memory >= 1025)
 START = $(START) && $(start_cpu_jobs)


and submit the same non-gpu job, it stays idle, even though "condor_q -better" tells me there is 1 machine able to run my job.
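
One possible reading of the behaviour above, sketched in Python rather than HTCondor config. It rests on an assumption that is not confirmed anywhere in this thread: that the EP uses a partitionable slot, and that START has to hold both on the partitionable slot (where Memory is the free memory) and on the dynamic slot carved out for the job (where Memory would be roughly the job's own request, ignoring any rounding). The function names and the 512GB figure for free memory are only illustrative.

```python
def simple_start(slot_memory_mb, threshold_mb):
    """Model of: start_cpu_jobs = (Memory >= threshold)."""
    return slot_memory_mb >= threshold_mb


def job_runs(pslot_memory_mb, request_memory_mb, threshold_mb):
    """Hypothetical model: START is checked on both slot types.

    Assumption (not from the thread): the dynamic slot's Memory equals
    the job's RequestMemory, so the same expression sees two different
    values of Memory.
    """
    dslot_memory_mb = request_memory_mb
    return (simple_start(pslot_memory_mb, threshold_mb)
            and simple_start(dslot_memory_mb, threshold_mb))


free_mb = 512 * 1024   # illustrative: an idle node with 512GB
one_gb_job = 1024      # request_memory = 1 G, in MB

print(job_runs(free_mb, one_gb_job, 1023))  # threshold below the request
print(job_runs(free_mb, one_gb_job, 1025))  # threshold above the request
```

Under that assumption the 1023 case passes on both slots while the 1025 case fails on the dynamic slot only, which would be consistent with the job staying idle even though the analysis reports one matching machine.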


Thanks



I get just one return, when there are no jobs running

On 8/25/25 16:37, John M Knoeller via HTCondor-users wrote:
If you want your machine that has GPUs to match only jobs that request GPUs, set


  START = (TARGET.RequestGPUs ?: 0) > 0

This simplifies to

  START = (TARGET.RequestGPUs ?: 0)

With the above START expression, only jobs that request at least 1 GPU will match. That's not quite what you asked for, but it shows the way. You just need the START expression to evaluate to false for CPU jobs while there is still memory
and CPUs available.

I will show this using a temp variable to hold the CPU jobs expression.

  start_cpu_jobs = (Cpus - TARGET.RequestCpus) >= 1 && (Memory - TARGET.RequestMemory) >= (128+1024)
 START = IfThenElse(TARGET.RequestGPUs ?: 0, true, $(start_cpu_jobs) )

This simplifies to

  START = (TARGET.RequestGPUs ?: 0) || $(start_cpu_jobs)

Note that if you already have a START expression that is not just TRUE, this should be

  START = $(START) && ( (TARGET.RequestGPUs ?: 0) || $(start_cpu_jobs) )
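
To make the logic of the combined expression easier to follow, here is a small Python model of it. This is only a sketch of the boolean logic, not how HTCondor evaluates ClassAds: the function names, the dict-based job description, and the sample slot sizes are all hypothetical. Memory values are taken to be in MB, as HTCondor reports them, so the (128+1024) threshold is carried over exactly as written; if the goal is to hold back 128GB, a value like 128*1024 may be what is wanted, but that is an assumption about intent.

```python
def start_cpu_jobs(cpus, memory_mb, request_cpus, request_memory_mb,
                   reserve_cpus=1, reserve_memory_mb=128 + 1024):
    """Model of:
    (Cpus - RequestCpus) >= 1 && (Memory - RequestMemory) >= (128+1024)
    i.e. a CPU job may start only if the reserve is left over afterwards.
    """
    return ((cpus - request_cpus) >= reserve_cpus
            and (memory_mb - request_memory_mb) >= reserve_memory_mb)


def start(cpus, memory_mb, job):
    """Model of: (TARGET.RequestGPUs ?: 0) || $(start_cpu_jobs)."""
    # "?: 0" supplies 0 when the job does not define RequestGPUs.
    request_gpus = job.get("RequestGPUs", 0)
    if request_gpus > 0:
        return True  # GPU jobs are never held back by the reserve
    return start_cpu_jobs(cpus, memory_mb,
                          job["RequestCpus"], job["RequestMemory"])


cpu_job = {"RequestCpus": 1, "RequestMemory": 1024}
gpu_job = {"RequestCpus": 1, "RequestMemory": 1024, "RequestGPUs": 1}

print(start(24, 512 * 1024, cpu_job))  # plenty of room left over
print(start(1, 2048, cpu_job))         # would use the last core
print(start(1, 2048, gpu_job))         # GPU job bypasses the reserve
```

The point of the sketch is that the reserve test is applied to what would remain after the job is placed, and that it is skipped entirely for any job requesting a GPU.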

-tj

------------------------------------------------------------------------
*From:* HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of K._Scott Rowe <krowe@xxxxxxxx>
*Sent:* Monday, August 25, 2025 4:30 PM
*To:* htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
*Subject:* [HTCondor-users] How to reserve resources for GPU jobs

Hey there. Imagine I have an EP running HTCondor-23.0.17 with 24 cores,
512GB RAM, and one GPU. There are many CPU-only jobs running on this EP
for weeks at a time, and there are usually one or two GPU jobs as well.
The CPU-only jobs may take weeks to finish, so sadly a GPU job may have
to wait weeks to start. I would like GPU jobs to not have to wait so long.

Is there a way I could reserve say 1 core and 128GB of RAM for GPU jobs,
and only GPU jobs, on this EP thus letting CPU-only jobs continue to run
on the other 23 cores and 384GB of RAM?

I have been trying to do this with static slots but have not figured out
how to make a slot that has the GPU as a resource and will NOT run
CPU-only jobs.

I should also mention that we don't use preemption and really don't want
to use it as it doesn't work well with our pipeline. I would also
rather not ask our users to add a ClassAd to their submit scripts (e.g.
+IsGPUJob), but if that is the only way, then so be it.

Thanks

--

K. Scott Rowe -- Science Information Services
Science Operations Center, National Radio Astronomy Observatory
1011 Lopezville Socorro, NM 87801
krowe@xxxxxxxx -- 1.575.835.7193 -- https://urldefense.com/v3/__http://www.nrao.edu__;!!Mak6IKo!IshyrPRFTwy-zul-FivGEH-AsRP62e2ZafRLF_z6yc9_EYrjmi_JJ2eWbBMvgyT5eEmI2GcxD7UOAn13$

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/
