Just had another thought: this would not enforce strict matching for a job that did not request any GPUs, i.e. RequestGPUs = undefined.
For the requirements solution, it seems to me that the expression would then evaluate to UNDEFINED, and I am unsure what happens with an undefined Requirements. Is it ignored, or treated as not a match? The startd config handles the case where RequestGPUs is undefined, but it would still allow a non-GPU job to possibly run on a GPU slot.
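To make the undefined case concrete: as far as I understand it, a match requires Requirements to evaluate to TRUE, so an UNDEFINED result behaves as "not a match" rather than being ignored. One way to keep the comparison defined is to default a missing attribute to 0 inside the expression itself. The fragment below is only a sketch; the defaulting with ifThenElse/isUndefined is my assumption about how one might write it, and it has not been tested on a real pool.

```
# Hypothetical submit-file fragment (illustration only, untested).
# A missing TotalSlotGPUs or RequestGPUs is treated as 0, so the
# comparison on dynamic slots never evaluates to UNDEFINED.
requirements = (TARGET.DynamicSlot is undefined) || (ifThenElse(isUndefined(TARGET.TotalSlotGPUs), 0, TARGET.TotalSlotGPUs) == ifThenElse(isUndefined(MY.RequestGPUs), 0, MY.RequestGPUs))
```

With this form, a job that never set request_gpus would only reuse dynamic slots that hold no GPUs, while partitionable slots still match as usual.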
The easiest solution I could think of to avoid any of this would be to add a job transform that sets RequestGPUs = 0 on any job where RequestGPUs is undefined.
-------------------------------------
Gianni Pezzarossi Computational System Analyst Research Services Engineering IT Shared Services University of Illinois @ Urbana-Champaign (217)244-7549 engrit-help@xxxxxxxxxxxx
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of Pezzarossi, Gianni
Ah! Good point. Forgot about the partitionable slot.
Thanks TJ!
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of John M Knoeller
You have the right basic idea, but you need a Requirements expression that matches both the partitionable slot and the dynamic slot. The partitionable slot will often have a GPUs and TotalSlotGPUs that is more than RequestGPUs, and you still want to match that, so you need to have your expression apply only to the dynamic slot. Like this:
    Requirements = TARGET.DynamicSlot is undefined || TotalSlotGPUs == RequestGPUs
    Requirements = IfThenElse(PartitionableSlot is undefined, GPUs == RequestGPUs, GPUs >= RequestGPUs)
-tj
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of Pezzarossi, Gianni
Hey everyone,
I had an idea, and I am more curious whether it would work than whether it is a good idea. We have some users who complain that sometimes their job that requests 1 GPU is matched to a slot with 2 GPUs; it happens very intermittently. I suspect this is simply due to jobs ending before CLAIM_WORKLIFE has expired, meaning the dynamic slot is free to pick up a new job. If the original claim was for 2 GPUs and a 1-GPU job is waiting in the queue, the matchmaker calls it good enough and allows the job to run (I assume requesting resources is more of a “the slot must have at least this much” and not a “must have exactly this much”). I can see why it is done that way, as it helps throughput by allowing the most jobs to run rather than trying to optimize resource usage (correct me if I’m wrong).
For the sake of argument, though, I was wondering how you could force a kind of “match with exactly the number of GPUs I requested”. Am I wrong in thinking that a dynamic slot has the ClassAd attribute TotalSlotGPUs, so that a requirements statement in the submission file like
    Requirements = TotalSlotGPUs == Requestgpus
would only match slots with exactly the requested number of GPUs, in order to avoid GPUs sitting idle? Is there any downside to doing this aside from the impact on job throughput?
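For reference, a minimal submit description sketch of this idea, using the dynamic-slot-only form from TJ's reply above so that the partitionable slot still matches. The executable name is a placeholder I made up, and the file is untested, so treat it as an illustration rather than a recommended setup.

```
# Hypothetical submit file; my_gpu_job.sh is a placeholder name.
executable   = my_gpu_job.sh
request_gpus = 1

# The exact-GPU test applies only to dynamic slots; partitionable slots
# (where DynamicSlot is undefined) match as usual.
requirements = (TARGET.DynamicSlot is undefined) || (TARGET.TotalSlotGPUs == MY.RequestGPUs)

queue
```

The expected trade-off is the one raised in the question: a 1-GPU job would wait for a fresh or exactly sized slot instead of reusing an idle 2-GPU dynamic slot.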