[HTCondor-users] dynamic slots with gpus
- Date: Fri, 6 Oct 2023 17:07:56 +0000
- From: Justin Killebrew <jk@xxxxxxx>
- Subject: [HTCondor-users] dynamic slots with gpus
Hello.
I've enabled dynamic slots on 2 machines:
use feature : GPUs
GPU_DISCOVERY_EXTRA = -extra
# dynamic slot config
cpu = 24
# 24 * 662
memory = 15888
disk = BIG
NUM_SLOTS = 1
NUM_SLOTS_TYPE = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
One machine has 1 GPU and the other has 2; condor_status -long bench5 shows the correct GPU info.
I submit with:
[…]
request_cpus = 1
request_memory = 800 MB
request_disk = 1 GB
request_gpus = 1
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = enable_gpus.py, blender-3.5-splash.blend
queue 10
Output of condor_q -better-analyze 111.002:
The Requirements expression for job 111.002 is
(TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) &&
(TARGET.GPUs >= RequestGPUs) && (TARGET.HasFileTransfer)
Job 111.002 defines the following attributes:
RequestDisk = 1048576
RequestGPUs = 1
RequestMemory = 800
The Requirements expression for job 111.002 reduces to these conditions:
         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           6  TARGET.Arch == "X86_64"
[1]           6  TARGET.OpSys == "LINUX"
[3]           6  TARGET.Disk >= RequestDisk
[5]           6  TARGET.Memory >= RequestMemory
[7]           2  TARGET.GPUs >= RequestGPUs
111.002: Job is running.
Last successful match: Fri Oct 6 12:19:16 2023
111.002: Run analysis summary ignoring user priority. Of 6 machines,
4 are rejected by your job's requirements
0 reject your job because of their own requirements
1 match and are already running your jobs
0 match but are serving other users
1 are able to run your job
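For anyone reading the step table above, here is a rough Python model of how each Requirements clause gets counted against the slots, one clause at a time. The slot list is hypothetical: only the GPU counts are picked so the result reproduces the 6/6/6/6/2 figures reported above, and the Disk values are made up. This is a simplified illustration, not how condor_q evaluates ClassAds internally.

```python
# Job attributes as reported by condor_q -better-analyze above.
job = {"RequestDisk": 1048576, "RequestGPUs": 1, "RequestMemory": 800}

# Hypothetical slot ads (assumption): six slots, two of which advertise GPUs.
slots = [
    {"Arch": "X86_64", "OpSys": "LINUX", "Disk": 50_000_000, "Memory": 15888, "GPUs": 1},
    {"Arch": "X86_64", "OpSys": "LINUX", "Disk": 50_000_000, "Memory": 15888, "GPUs": 2},
    {"Arch": "X86_64", "OpSys": "LINUX", "Disk": 50_000_000, "Memory": 15888},
    {"Arch": "X86_64", "OpSys": "LINUX", "Disk": 50_000_000, "Memory": 15888},
    {"Arch": "X86_64", "OpSys": "LINUX", "Disk": 50_000_000, "Memory": 15888},
    {"Arch": "X86_64", "OpSys": "LINUX", "Disk": 50_000_000, "Memory": 15888},
]

# Each clause of the Requirements expression, checked independently per slot.
conditions = [
    ('TARGET.Arch == "X86_64"', lambda s, j: s["Arch"] == "X86_64"),
    ('TARGET.OpSys == "LINUX"', lambda s, j: s["OpSys"] == "LINUX"),
    ("TARGET.Disk >= RequestDisk", lambda s, j: s["Disk"] >= j["RequestDisk"]),
    ("TARGET.Memory >= RequestMemory", lambda s, j: s["Memory"] >= j["RequestMemory"]),
    # A slot with no GPUs attribute is treated as having 0 (modelling assumption).
    ("TARGET.GPUs >= RequestGPUs", lambda s, j: s.get("GPUs", 0) >= j["RequestGPUs"]),
]

def count_matches(slots, job):
    """Return (clause, number of slots satisfying that clause) pairs."""
    return [(label, sum(1 for s in slots if pred(s, job))) for label, pred in conditions]

counts = count_matches(slots, job)
for label, matched in counts:
    print(f"{matched:>3}  {label}")
```

In this model both GPU slots pass every clause, which is consistent with the summary saying 1 machine is running the job and 1 more is able to run it.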
Only one machine (the one with 1 GPU) matches and runs all the jobs, but I expected the machine with 2 GPUs to be split into 2 partitions and run 2 jobs, with 1 GPU each.
Is there additional configuration for the 2 GPU machine? Why doesn't it at least run 1 job?
I tried request_gpus >= 1 in the submit file but that's a syntax error.
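To spell out the expectation stated above, here is a small Python sketch of the arithmetic: how many identical requests fit into one partitionable slot when each job takes 1 CPU, 800 MB, and 1 GPU. The cpu and memory figures come from the config above; disk is left out because the config only gives it as BIG. The function name is made up for illustration, and the sketch ignores any rounding of requests the startd may apply, so treat it as an estimate rather than what HTCondor actually does.

```python
def max_dynamic_slots(machine, request):
    """Count how many identical requests fit into one partitionable slot.

    The scarcest resource (relative to the request) sets the limit.
    """
    return min(machine[name] // amount for name, amount in request.items())

# Per-job request from the submit file above (memory in MB).
request = {"cpus": 1, "memory": 800, "gpus": 1}

# Resources from the slot config above; only the GPU count differs.
one_gpu_machine = {"cpus": 24, "memory": 24 * 662, "gpus": 1}
two_gpu_machine = {"cpus": 24, "memory": 24 * 662, "gpus": 2}

print(max_dynamic_slots(one_gpu_machine, request))  # 1
print(max_dynamic_slots(two_gpu_machine, request))  # 2
```

With these numbers the GPU count is the limiting resource on both machines, so 1 job on the first and 2 jobs on the second is what I would expect to see running at once.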
Thanks,
JK