[Condor-users] GPU machines is "mixed" environment (conditional preemption)
- Date: Thu, 7 Oct 2010 16:19:40 +0200
- From: Carsten Aulbert <carsten.aulbert@xxxxxxxxxx>
- Subject: [Condor-users] GPU machines is "mixed" environment (conditional preemption)
Hi all,
We have a bunch of machines with GPU cards inside. I'm already advertising
these via ClassAds, e.g.:
STARTD_ATTRS = GPU_DEV, GPU_NAME, GPU_CAPABILITY, GPU_GLOBALMEM_MB, \
    GPU_MULTIPROC, GPU_NUMCORES, GPU_CLOCK_GHZ
SLOT1_GPU_DEV=0
SLOT1_GPU_NAME="Tesla C2050"
SLOT1_GPU_CAPABILITY=2.0
SLOT1_GPU_GLOBALMEM_MB=2687
SLOT1_GPU_MULTIPROC=14
SLOT1_GPU_NUMCORES=448
SLOT1_GPU_CLOCK_GHZ=1.15
SLOT2_GPU_DEV=1
SLOT2_GPU_NAME="Tesla C2050"
SLOT2_GPU_CAPABILITY=2.0
SLOT2_GPU_GLOBALMEM_MB=2687
SLOT2_GPU_MULTIPROC=14
SLOT2_GPU_NUMCORES=448
SLOT2_GPU_CLOCK_GHZ=1.15
I have disabled vanilla universe jobs on these machines because I want to use
preemption, and for now only I am allowed to run jobs here for testing, as the
rest of the pool is a production system:
START = ( Owner =?= "carsten" ) && ( JobUniverse != 5 )
A possible change for the future: allow any universe in conjunction with
NeedGpu, otherwise standard universe only (is this the correct way?):
START = ( JobUniverse == 1 ) || ( TARGET.NeedGpu =!= UNDEFINED )
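As a minimal sketch of the intended admission rule (any universe when NeedGpu is set, otherwise standard universe only), the logic can be written out in plain Python. This is not Condor configuration: the function name `start_allows` and the dictionary representation of a job ad are assumptions made for illustration, and an UNDEFINED attribute is modelled as a missing key.

```python
STANDARD_UNIVERSE = 1  # Condor's numeric code for the standard universe
VANILLA_UNIVERSE = 5   # Condor's numeric code for the vanilla universe


def start_allows(job):
    """Return True if the job may start under the intended rule.

    `job` is a plain dict standing in for the job ClassAd (hypothetical
    representation). A missing or None value models UNDEFINED.
    """
    # Models "TARGET.NeedGpu =!= UNDEFINED": true for any defined value.
    needs_gpu = job.get("NeedGpu") is not None
    return job.get("JobUniverse") == STANDARD_UNIVERSE or needs_gpu


if __name__ == "__main__":
    for job in (
        {"JobUniverse": STANDARD_UNIVERSE},
        {"JobUniverse": VANILLA_UNIVERSE},
        {"JobUniverse": VANILLA_UNIVERSE, "NeedGpu": True},
    ):
        print(job, "->", start_allows(job))
```

Writing the rule as a truth table like this makes it easy to compare against what a START expression actually admits for each universe.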
However, the big question now is how to handle preemption. Essentially, I want
the machine to work as a standard multi-core compute node (set up identically
to the others, sans vanilla universe jobs) as long as there are no idle jobs
which have "NeedGpu" set.
As soon as there are idle jobs which have it set while running jobs do not,
I'd like to preempt/checkpoint the running jobs and let the GPU jobs run.
However, I'm not quite sure how to achieve this, as I would need to access
the ClassAd of the currently running job (MYRUNNINGJOB below stands for it):
PREEMPTION_REQUIREMENTS = ( MYRUNNINGJOB.NeedGpu =?= UNDEFINED && \
    TARGET.NeedGpu =!= UNDEFINED ) || \
    ( $(StateTimer) > (4 * $(HOUR)) && RemoteUserPrio > SubmittorPrio * 1.2 )
Is there any way to achieve this? Which part of the manual do I need to look
at again?
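To make the desired preemption policy concrete, here is a minimal sketch of the decision in plain Python. It only models the logic of the expression above; it says nothing about whether Condor exposes the running job's ad at this point. The function name `may_preempt`, its parameters, and the dictionary job ads are assumptions made for illustration, with UNDEFINED modelled as a missing key.

```python
HOUR = 3600  # seconds, mirroring the $(HOUR) macro


def may_preempt(running_job, candidate_job, state_timer,
                remote_user_prio, submittor_prio):
    """Return True if the candidate job may preempt the running job.

    running_job / candidate_job: dicts standing in for job ClassAds
    (hypothetical representation).
    state_timer: seconds the slot has spent in its current state.
    """
    # GPU rule: the running job does not ask for a GPU, the candidate does.
    gpu_rule = (running_job.get("NeedGpu") is None
                and candidate_job.get("NeedGpu") is not None)
    # Priority rule: after four hours, and only if the running user's
    # priority value exceeds the submitter's by more than 20 percent.
    prio_rule = (state_timer > 4 * HOUR
                 and remote_user_prio > submittor_prio * 1.2)
    return gpu_rule or prio_rule


if __name__ == "__main__":
    # A GPU job arrives while a non-GPU job is running: preempt at once.
    print(may_preempt({}, {"NeedGpu": True}, 0, 1.0, 1.0))
    # Two GPU jobs with equal priorities: leave the running job alone.
    print(may_preempt({"NeedGpu": True}, {"NeedGpu": True}, 0, 1.0, 1.0))
```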
Thanks a lot in advance
Carsten