[Condor-users] RequestCpus > 1 and Dynamic (Partitionable) Slots
- Date: Sat, 15 Jan 2011 20:52:10 -0500
- From: Erik Aronesty <erik@xxxxxxx>
- Subject: [Condor-users] RequestCpus > 1 and Dynamic (Partitionable) Slots
I've confirmed on many tests that jobs with RequestCpus > 1 don't seem to be compatible with dynamic slots.
Is this a condor version issue? I'm running 7.4.4 on x86_64
Our system has many, many jobs that consume between 1 and 8 CPUs, and many SMP machines with between 4 and 32 cores.
(I can use condor_qedit and get a job to run on a dynamic slot just by switching its Cpus to 1. It will not run otherwise ... even if Start=TRUE)
The message from condor_q -analyze is "2 reject your job because of their own requirements" (or however many slots are partitionable).
It would be nice to be able to take a job id and a node, and then ask for an explanation of why it's not running on that node.
If I run a bunch of jobs with 1 cpu, the dynamic slot works as advertised, forking off new slots and reclaiming them later, quite nicely. I even like to leave some slots this way, since they are so much better about resource utilization in every other respect.
I've noticed one other thread posting about this, but have never seen a final solution.
https://www-auth.cs.wisc.edu/lists/condor-users/2009-June/msg00065.shtml
Has anyone gotten dynamic slots to work with RequestCpus > 1... where it actually decrements the number of cpus from those remaining?
> condor -version
$CondorVersion: 7.2.4 Apr 11 2010 $
$CondorPlatform: X86_64-LINUX_DEBIAN_UNKNOWN $
> condor_status ea-morpheus -l | grep Cpu
CpuIsBusy = false
Cpus = 1
CpuBusyTime = 0
CpuBusy = ( ( LoadAvg - CondorLoadAvg ) >= 0.500000 )
TotalCpus = 4
Machine doing nothing:
> cat /srv/condor/ea-morpheus
DAEMON_LIST = MASTER, STARTD
NUM_SLOTS=1
SLOT_TYPE_1=Cpu=4,auto
SLOT_TYPE_1_PARTITIONABLE=TRUE
NUM_SLOTS_TYPE_1=1
START=TRUE
JOB not running:
> condor_q 6490.0 -l | grep Req
AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,RequestCpus,RequestDisk,RequestMemory,FileSystemDomain,DiskUsage,ImageSize,Requirements,NiceUser,ConcurrencyLimits"
RequestDisk = DiskUsage
RequestMemory = 500
RequestCpus = 2
Requirements = ( Memory >= 500 ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= DiskUsage ) && ( ( RequestMemory * 1024 ) >= ImageSize ) && ( TARGET.FileSystemDomain == MY.FileSystemDomain )
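As a rough illustration of the "job id plus node" comparison wished for above, here is a minimal Python sketch that parses `Attr = value` lines like the ones pasted in this message and reports which integer requests exceed what the slot advertises. It is not how Condor's matchmaking works; `parse_ad`, `shortfalls`, and the request/slot attribute pairing are hypothetical helpers made up for this example, and the input is just the text shown above.

```python
def parse_ad(text):
    """Parse 'Attr = value' lines (as pasted above) into a dict of strings."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip lines that are not attribute assignments
            ad[name.strip()] = value.strip()
    return ad


def shortfalls(job, slot,
               pairs=(("RequestCpus", "Cpus"), ("RequestMemory", "Memory"))):
    """List requests that exceed what the slot advertises.

    The (request, slot) attribute pairing is an assumption for illustration.
    Attributes that are missing or are not plain integers are skipped.
    """
    found = []
    for req_attr, slot_attr in pairs:
        try:
            requested = int(job[req_attr])
            advertised = int(slot[slot_attr])
        except (KeyError, ValueError):
            continue
        if requested > advertised:
            found.append((req_attr, requested, slot_attr, advertised))
    return found


# Values copied from the condor_status and condor_q output in this message.
slot_text = """CpuIsBusy = false
Cpus = 1
CpuBusyTime = 0
TotalCpus = 4"""
job_text = """RequestDisk = DiskUsage
RequestMemory = 500
RequestCpus = 2"""

slot = parse_ad(slot_text)
job = parse_ad(job_text)
for req_attr, requested, slot_attr, advertised in shortfalls(job, slot):
    print(f"{req_attr}={requested} exceeds {slot_attr}={advertised}")
```

With the values pasted here, the only shortfall it reports is RequestCpus against Cpus: the slot advertises Cpus = 1 even though TotalCpus = 4. Whether that is the actual reason for the rejection is not something this sketch can confirm.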
- Erik