Since you are not using partitionable slots, the GPUs are permanently assigned to slots when the STARTD starts.
If you have only 2 GPUs and want 2 slots, each slot will be assigned a single GPU, so jobs that want more than
1 GPU will not match any of your slots.

You can instead assign both GPUs to one of your two slots, so that slot will match jobs that want 1 or 2 GPUs.

SLOT_TYPE_1 = cpus=1, GPUs=2, mem=auto
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_2 = cpus=1, mem=auto
NUM_SLOTS_TYPE_2 = 1
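
With that configuration, a submit description along these lines should match the 2-GPU slot. This is only a sketch: the executable name my_gpu_job.sh is hypothetical, and request_cpus = 1 is an assumption chosen to fit the cpus=1 slot above.

executable   = my_gpu_job.sh
request_cpus = 1
request_GPUs = 2
queue

A job with request_GPUs = 1 can also match that slot, but it will then occupy the whole slot, including the second GPU.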

Or you can switch to using partitionable slots, and let HTCondor decide how to divide up resources based on
what the jobs request. Be aware that if you do this, 1-GPU jobs will tend to dominate (if you have an infinite
supply of them): once a 1-GPU job starts, the remainder of the partitionable slot has only one GPU left, so it
will only match other 1-GPU jobs.

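A minimal sketch of the partitionable setup, assuming the 2-GPU machine from the question and a single slot that owns all of the CPUs and memory (the 100% shares are an assumption; adjust as needed):

SLOT_TYPE_1 = cpus=100%, GPUs=2, mem=100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1

Dynamic slots are then carved out of this slot according to each job's request_cpus, request_memory and request_GPUs.
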
-tj

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Francisco Pereira
Sent: Monday, January 18, 2016 1:39 PM
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] assigning multiple GPUs to a single slot

Hi,

I have been scheduling GPU jobs in our cluster by

1) setting in the config file for each node

"use feature: GPUs"
"GPU_DISCOVERY_EXTRA = -extra"

(as suggested in the documentation, running condor_gpu_discovery -properties manually produces the right results for each machine)

2) setting up a number of slots with 1 CPU each, e.g. in a 2-GPU machine.

"SLOT_TYPE_1 = cpus=1,mem=auto
SLOT_TYPE_1_PARTITIONABLE = FALSE
NUM_SLOTS_TYPE_1 = 2"

When submitting jobs that have "request_GPUs=1" in the submit file the jobs get scheduled to machines that have a GPU, and there are no more jobs being scheduled than there are GPUs, across multiple machines. However, when I specify "request_GPUs=2", the job stays in the queue with status "I", even though the requested number is available.

Hence, I am wondering what I am doing wrong and whether I have incorrectly set up the basic mechanism in #2. The GPU discovery works beautifully, so I suspect I am overcomplicating ...

Thank you for your help!
Francisco