[HTCondor-users] Prioritizing GPU jobs on partitionable/dynamic slots
- Date: Wed, 27 May 2015 12:52:07 -0500
- From: Vladimir Brik <vladimir.brik@xxxxxxxxxxxxxxxx>
- Subject: [HTCondor-users] Prioritizing GPU jobs on partitionable/dynamic slots
Hello,
I am reconfiguring our cluster to use dynamic GPU slots instead of
static ones, and I'm having trouble figuring out how to ensure that GPU
jobs aren't starved by non-GPU jobs without wasting or over-committing
resources.
For example, with the slot definition below, two CPU jobs, or one job
that requests 2 GB of memory, will block GPU jobs from landing on this
node:
SLOT_TYPE_1 = cpus=2, mem=2GB, gpus=2
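To make the arithmetic concrete, here is a minimal Python sketch of the
fit check. It is a toy model, not HTCondor code: the Resources type, the
fits helper, and the per-job requests (1 CPU and 512 MB per CPU job;
1 CPU, 1 GB and 1 GPU for the GPU job) are assumptions chosen for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: int
    mem_mb: int
    gpus: int

    def minus(self, other):
        """Resources left after carving `other` out of this slot."""
        return Resources(self.cpus - other.cpus,
                         self.mem_mb - other.mem_mb,
                         self.gpus - other.gpus)


def fits(request, free):
    """True if every requested resource is available in `free`."""
    return (request.cpus <= free.cpus
            and request.mem_mb <= free.mem_mb
            and request.gpus <= free.gpus)


# The slot from the definition above: 2 CPUs, 2 GB, 2 GPUs.
slot = Resources(cpus=2, mem_mb=2048, gpus=2)

# Hypothetical job requests (assumed values).
cpu_job = Resources(cpus=1, mem_mb=512, gpus=0)
big_mem_job = Resources(cpus=1, mem_mb=2048, gpus=0)
gpu_job = Resources(cpus=1, mem_mb=1024, gpus=1)

# Case 1: two CPU jobs take both CPUs; the GPUs sit idle but unreachable.
blocked_by_cpu_jobs = not fits(gpu_job, slot.minus(cpu_job).minus(cpu_job))

# Case 2: a single job takes all 2 GB of memory.
blocked_by_mem_job = not fits(gpu_job, slot.minus(big_mem_job))
```

In both cases the GPUs are still free, but the GPU job cannot start
because it also needs a CPU and some memory.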
Ideally, I'd like non-GPU jobs to be killed one by one, starting with
the youngest, until there is room for a GPU job, but only if there are
idle GPU jobs in the queue that could use this machine (if it weren't
for the CPU jobs). I have no idea how to implement this, though (without
external scripts).
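To pin down the policy I am describing, here is a rough Python sketch.
It only models the desired decision; it is not something HTCondor
provides, and the Job tuple, the plan_evictions function, and the
example requests are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical job record; start_time is any increasing timestamp.
Job = namedtuple("Job", "name cpus mem_mb gpus start_time")


def plan_evictions(free, running, gpu_job, idle_gpu_jobs):
    """Return names of non-GPU jobs to evict, youngest first.

    `free` is a (cpus, mem_mb, gpus) tuple of unclaimed resources.
    Returns [] if no GPU job is waiting, if the GPU job already fits,
    or if evicting non-GPU jobs could not make it fit.
    """
    if idle_gpu_jobs == 0:
        return []  # nobody is waiting, so leave the CPU jobs alone
    cpus, mem_mb, gpus = free
    if gpu_job.gpus > gpus:
        return []  # killing non-GPU jobs cannot free any GPUs

    def has_room():
        return gpu_job.cpus <= cpus and gpu_job.mem_mb <= mem_mb

    victims = []
    youngest_first = sorted((j for j in running if j.gpus == 0),
                            key=lambda j: j.start_time, reverse=True)
    for job in youngest_first:
        if has_room():
            break
        victims.append(job.name)
        cpus += job.cpus
        mem_mb += job.mem_mb
    return victims if has_room() else []


running = [Job("cpu-old", 1, 512, 0, start_time=100),
           Job("cpu-new", 1, 512, 0, start_time=200)]
gpu_job = Job("gpu", 1, 1024, 1, start_time=0)

# Both CPUs are taken, 1 GB and both GPUs are free.
plan = plan_evictions((0, 1024, 2), running, gpu_job, idle_gpu_jobs=1)
no_demand = plan_evictions((0, 1024, 2), running, gpu_job, idle_gpu_jobs=0)
```

Here only the younger CPU job would be killed, and nothing is killed
when no GPU job is idle.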
The only ways I can think of to prevent GPU job starvation are either
creating a separate partitionable slot only for GPU jobs, or having a
single partitionable slot but preventing CPU jobs from using all of its
CPU and memory with START and APPEND_REQUIREMENTS expressions.
Is there a better way?
Thanks,
Vlad