SLOT_TYPE_2_START = TARGET.BackfillJob
says that in order to match the backfill slot, a job must have
BackfillJob = true
in the job ClassAd. Do your non-GPU jobs have that?
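
As a minimal sketch of what that looks like on the job side, a submit file can advertise the attribute as a custom job ClassAd attribute. The executable name and argument below are placeholders, and the `MY.` prefix is the submit-language equivalent of the older `+Attribute` form:

```
# Hypothetical non-GPU backfill job; sleep.sh and its argument are placeholders.
executable = sleep.sh
arguments  = 600

# Custom job attribute that SLOT_TYPE_2_START = TARGET.BackfillJob tests for.
# Equivalent to the older "+BackfillJob = true" form.
MY.BackfillJob = true

queue 1
```

If the attribute is absent from the job ad, TARGET.BackfillJob evaluates to undefined in the slot's START expression, so the job should not match the backfill slot.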
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Vikrant Aggarwal <ervikrant06@xxxxxxxxx>
Sent: Friday, February 7, 2025 3:01 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Running non-gpu job on gpu machine referring Whats_New_condor_week_2023

Hello Experts,
I was reading the presentation Whats_New_condor_week_2023 and came across an interesting feature of backfill which I wanted to use on a gpu machine.
Based on the presentation, I made the configuration below. My GPU job runs on the machine without any trouble.
START = $(START)
use feature : GPUs
GPU_DISCOVERY_EXTRA = -extra
PreemptMaxRuntime = 4 * 24 * 60
ExemptMaxRuntime = 4 * 24 * 60
BackfillSlot = true
ResourceConflict = "GPUs"
use FEATURE : PartitionableSlot(1, 100%)
SLOT_TYPE_1_START = TARGET.RequestGpus > 0
SLOT_TYPE_2_BACKFILL = true
use FEATURE : PartitionableSlot(2, 90%, GPUs=0)
SLOT_TYPE_2_PREEMPT = size(ResourceConflict?:"") > 0
SLOT_TYPE_2_START = TARGET.BackfillJob

However, a non-gpu machine stays in idle status. --better-analyze doesn't reveal why it's in idle status.
executable = sleep.sh
transfer_executable = false
arguments = 600
should_transfer_files = NO
+BackfillJob = True
queue 1

The following is what I see in better-analyze for the second slot.
The Requirements expression for this slot reduces to these conditions:
       Clusters
Step    Matched  Condition
-----  --------  ---------
[0]           1  START
[1]           1  WithinResourceLimits

Am I missing anything in the configuration to make non-gpu jobs run on a gpu machine?
For clarity: at the time of testing, no GPU job was running on that machine; it was completely idle.
Also, has the PreferGPUJobs feature mentioned in the presentation been introduced yet? I couldn't find anything about it in the release notes.
Thanks & Regards,
Vikrant Aggarwal