
Re: [Condor-users] 7.8 and defrag or similar of dynamic slots




On 5/18/12 12:19 PM, Ian Cottam wrote:
> We are thinking about updating to 7.8.0.
> I noticed that there is, with 7.8, a defrag daemon for dynamic slots.
> On our (main) pool we have preemption off anyway: am I right in thinking
> that this defrag then is not for us?

Defragmentation is desirable when jobs requiring large slots (e.g. many 
cores or big memory) suffer from starvation (rarely or never getting 
scheduled to run) due to fragmented machines.  Machines become 
fragmented when they are partitioned into small slots to fit small 
jobs.  If many small jobs are running on a machine at the same time, the 
chance is small that they will all exit at the same time, freeing up a 
large chunk of resources for large jobs to use.  The Condor negotiator's 
resource allocation algorithm currently just works with the slots that 
exist.  It does not make reservations or preempt multiple slots, so some 
method of defragmenting machines is needed to avoid the problem of 
starvation of large jobs.
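
To make the starvation argument concrete, here is a toy sketch in Python. It is not Condor code. The model is an assumption throughout: one 8-core machine, 1-core jobs with lengths drawn uniformly from 1 to 100 time steps, and an idle core that is refilled with a new small job at once unless the machine is draining. The function name and parameters are hypothetical.

```python
import random

def first_time_all_idle(cores=8, steps=10_000, drain=False, seed=0):
    """Return the first time step at which every core is idle, or None.

    Toy model (all assumptions, not Condor behaviour): each core starts with
    a 1-core job of random length; when a job exits, a new small job starts
    on that core immediately, unless the machine is draining.
    """
    rng = random.Random(seed)
    remaining = [rng.randint(1, 100) for _ in range(cores)]
    for t in range(1, steps + 1):
        for i in range(cores):
            if remaining[i] > 0:
                remaining[i] -= 1
            if remaining[i] == 0 and not drain:
                remaining[i] = rng.randint(1, 100)  # core is refilled at once
        if all(r == 0 for r in remaining):
            return t  # a whole-machine job could start now
    return None

print("no draining:", first_time_all_idle(drain=False))
print("draining:   ", first_time_all_idle(drain=True))
```

In this model the machine is never completely idle without draining, so a job that needs all 8 cores waits forever. With draining, the machine becomes free once the longest running small job exits. A real pool is less extreme, but the same effect is what starves large jobs.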
Defragmentation can cause jobs to be killed.  If you do not want that, 
MaxJobRetirementTime can be used to specify how long jobs should be 
allowed to run on machines that are being drained.
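
As a rough illustration of the retirement idea, the sketch below computes how much longer a job on a draining machine would be left alone. It assumes the retirement time is counted from when the job started running, which is worth checking against the manual for your version. The helper name and its arguments are hypothetical and not part of Condor.

```python
def grace_seconds_left(max_job_retirement_time, runtime_so_far):
    """Seconds a running job may continue before it can be killed.

    Assumption: retirement time is measured from job start, so a job that
    has already run longer than the limit has no grace period left.
    """
    return max(0, max_job_retirement_time - runtime_so_far)

# A job 10 minutes into a 1-hour retirement window has 50 minutes left.
print(grace_seconds_left(3600, 600))
```

Setting the retirement time at least as long as your longest expected job would, under this assumption, let draining proceed without killing anything.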

> I only ask because sometimes (with 7.4/7.6) and dynamic slots we see
> partial matches that don't go through and wondered if there was something
> in 7.8 that helps with this.

If by "partial matches that don't go through" you mean the starvation 
problem I mentioned above, then condor_defrag can help.  If it is some 
other problem, then it may or may not.
--Dan

> regards
> -Ian