
Re: [HTCondor-users] gpu's and preemption



	Disclaimer: I am not a preemption expert.

> ideally what i'd like to happen is that 4 of usera's jobs are
> preempted for userb's.  just the fact that a user is asking for a gpu
> should be enough to preempt another person from a slot that isn't
	This sounds like a job for a (machine) RANK expression. 
(Something like RANK = RequestGPUs, probably.)  That will work just fine 
with pslots, except ...
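	A minimal sketch of that startd configuration (RequestGPUs is the standard job attribute; treating any GPU request as outranking a non-GPU job is an assumption about your policy):

```
# Machine (startd) RANK: jobs requesting GPUs rank higher than jobs
# that don't, so a GPU job can preempt a non-GPU job from the slot.
RANK = RequestGPUs
```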
> as extra credit, what happens when the box has 16 cores and 4 gpus,
> and userb comes along and asks for two cpus/one gpu per job, does it
> kick eight of usera's jobs off?
... it won't kick eight of usera's jobs off.  If you want to do that, 
you'll have to set ALLOW_PSLOT_PREEMPTION = TRUE in the configuration /of 
the negotiator/.  By itself, this doesn't allow priority-based preemption, 
but it will change the behavior of preemption with respect to pslots for 
your entire pool, so you'll want to be sure that no other preemption is 
taking place (or that you understand the consequences).
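	Concretely, that means adding the following to the configuration read by the negotiator (not the startds):

```
# Negotiator configuration only.  Allows the negotiator to combine
# multiple dynamic slots under a partitionable slot when preempting,
# so one GPU job can displace several smaller non-GPU jobs at once.
ALLOW_PSLOT_PREEMPTION = TRUE
```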
	Be aware, however, that the way HTCondor combines slots when doing 
pslot preemption does /not/ wait for the corresponding jobs to finish 
exiting before reassigning their resources, so HTCondor may overcommit in 
some cases (e.g., the non-GPU jobs take so long to vacate that the GPU job 
finishes transferring and starts before they finish).
	If you don't set ALLOW_PSLOT_PREEMPTION, undersized dynamic slots 
will be ignored.  If you'd rather not preempt at all, you can attempt to 
address the issue via draining, instead.
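	The draining approach would look something like the following (the machine name is hypothetical; -graceful is the default policy, which lets running jobs finish before their resources are reclaimed):

```
# Drain the startd so its dynamic slots empty out and the
# partitionable slot can be re-carved for the GPU job.
condor_drain -graceful gpu-node-01.example.edu
```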
	If the issue happens rarely enough for manual intervention to be 
reasonable, starting with 8.9.0, you'll be able to address the issue with 
the condor_now command (whose version of slot coalescing doesn't suffer 
from the overcommit issue described above).  I suppose you could try to 
script the tool, but that seems fraught with peril.
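	For reference, a condor_now invocation takes the job to start now followed by the jobs whose slots should be vacated and coalesced (all job IDs below are hypothetical):

```
# Vacate usera's jobs 4.2 and 4.3, combine their dynamic slots, and
# start userb's GPU job 17.0 on the resulting slot.
condor_now 17.0 4.2 4.3
```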
- ToddM