
[HTCondor-users] Prioritizing GPU jobs on partitionable/dynamic slots

Date: Wed, 27 May 2015 12:52:07 -0500
From: Vladimir Brik <vladimir.brik@xxxxxxxxxxxxxxxx>
Subject: [HTCondor-users] Prioritizing GPU jobs on partitionable/dynamic slots

Hello,

I am reconfiguring our cluster to use dynamic GPU slots instead of static ones, and I have trouble figuring out how to ensure that GPU jobs aren't starved because of non-GPU jobs without wasting or over-committing resources.

For example, with the slot definition below, 2 CPU jobs, or 1 job that requests 2GB, will block GPU jobs from landing on this node:

SLOT_TYPE_1 = cpus=2, mem=2GB, gpus=2
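
To make the blocking concrete, here is a small Python sketch of the resource accounting. It is a toy model, not HTCondor code: the `remaining` and `fits` helpers and the dictionary layout are hypothetical, and it assumes a GPU job that requests 1 CPU, 1GB and 1 GPU.

```python
# Toy model of the partitionable slot above; not HTCondor code.
SLOT = {"cpus": 2, "mem_mb": 2048, "gpus": 2}


def remaining(slot, running):
    """Resources left in the slot after carving out the running jobs."""
    return {k: slot[k] - sum(job.get(k, 0) for job in running) for k in slot}


def fits(slot, running, job):
    """True if `job` still fits next to the jobs already running."""
    left = remaining(slot, running)
    return all(job.get(k, 0) <= left[k] for k in slot)


gpu_job = {"cpus": 1, "mem_mb": 1024, "gpus": 1}  # assumed request sizes
two_cpu_jobs = [{"cpus": 1, "mem_mb": 512}, {"cpus": 1, "mem_mb": 512}]
one_big_job = [{"cpus": 1, "mem_mb": 2048}]

print(fits(SLOT, [], gpu_job))            # True: empty node accepts the GPU job
print(fits(SLOT, two_cpu_jobs, gpu_job))  # False: no CPUs left
print(fits(SLOT, one_big_job, gpu_job))   # False: no memory left
```

In both blocked cases the two GPUs sit idle even though no running job uses them.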

Ideally, I'd like non-GPU jobs to be killed one-by-one, starting from the youngest, until there is space for a GPU job, but only if there are idle GPU jobs in the queue that could use this machine (if it weren't for CPU jobs). I have no idea how to implement this though (without external scripts).
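
The selection logic I have in mind could be sketched roughly as follows. This is only an illustration in Python, not something HTCondor provides: `fits`, `pick_victims`, the `start` timestamp field and the job dictionaries are all hypothetical names, and the caller is assumed to invoke it only when an idle GPU job actually exists in the queue.

```python
# Toy model of the desired policy; not HTCondor code. All names are hypothetical.
RESOURCES = ("cpus", "mem_mb", "gpus")


def fits(slot, running, job):
    """True if `job` fits in `slot` next to the jobs in `running`."""
    return all(
        job.get(r, 0) <= slot[r] - sum(j.get(r, 0) for j in running)
        for r in RESOURCES
    )


def pick_victims(slot, running, idle_gpu_job):
    """Non-GPU jobs to evict, youngest first, until idle_gpu_job fits.

    Returns [] if nothing needs to be evicted, and None if evicting every
    non-GPU job would still not make room (so nothing should be killed).
    """
    survivors = list(running)
    victims = []
    # Youngest first: the largest start time is the most recently started job.
    candidates = sorted(
        (j for j in running if j.get("gpus", 0) == 0),
        key=lambda j: j["start"],
        reverse=True,
    )
    for job in candidates:
        if fits(slot, survivors, idle_gpu_job):
            break
        survivors.remove(job)
        victims.append(job)
    return victims if fits(slot, survivors, idle_gpu_job) else None


slot = {"cpus": 2, "mem_mb": 2048, "gpus": 2}
running = [
    {"id": "a", "cpus": 1, "mem_mb": 512, "start": 100},
    {"id": "b", "cpus": 1, "mem_mb": 512, "start": 200},
]
gpu_job = {"cpus": 1, "mem_mb": 1024, "gpus": 1}  # assumed request sizes

print([j["id"] for j in pick_victims(slot, running, gpu_job)])  # ['b']
```

Only the younger job `b` is chosen here, because evicting it alone frees enough CPU and memory for the GPU job.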

The only ways I can think of to prevent GPU job starvation are either creating a separate partitionable slot only for GPU jobs, or having a single partitionable slot but preventing CPU jobs from using all of its CPU and memory with START and APPEND_REQUIREMENTS expressions.


Is there a better way?



Thanks,

Vlad
