Re: [Condor-users] 7.8 and defrag or similar of dynamic slots





On 5/18/12 12:19 PM, Ian Cottam wrote:
We are thinking about updating to 7.8.0.
I noticed that there is, with 7.8, a defrag daemon for dynamic slots.
On our (main) pool we have preemption off anyway: am I right in thinking
that this defrag then is not for us?

Defragmentation is desirable when jobs requiring large slots (e.g. many cores or big memory) suffer from starvation (rarely or never getting scheduled to run) due to fragmented machines. Machines become fragmented when they are partitioned into small slots to fit small jobs. If many small jobs are running on a machine at the same time, the chance is small that they will all exit at the same time, freeing up a large chunk of resources for large jobs to use. The Condor negotiator's resource allocation algorithm currently just works with the slots that exist. It does not make reservations or preempt multiple slots, so some method of defragmenting machines is needed to avoid the problem of starvation of large jobs.
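To make the starvation argument concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not Condor's actual matchmaking code: the function name, the one-core small jobs, the fixed job length, and the tick-based clock are all made up for illustration. It compares a machine that keeps handing freed cores to waiting small jobs against one that is being drained (no new jobs accepted).

```python
def wait_for_large_job(remaining, need, job_len, draining, horizon=1000):
    """Return the first tick at which `need` cores are free at once, or None.

    Hypothetical toy model (not Condor's algorithm):
      remaining -- time left on the 1-core job currently running on each core
      need      -- number of cores the large job requires
      job_len   -- length of each small job waiting in the queue
      draining  -- if True, freed cores are left idle instead of being refilled
    """
    cores = list(remaining)  # 0 means the core is idle
    for tick in range(horizon + 1):
        free = sum(1 for r in cores if r == 0)
        if free >= need:
            return tick
        if not draining:
            # Without draining, every idle core is immediately given
            # to another small job, so the machine stays fragmented.
            cores = [job_len if r == 0 else r for r in cores]
        # Advance the clock by one tick.
        cores = [r - 1 if r > 0 else 0 for r in cores]
    return None  # the large job never fit within the horizon


# Four cores whose small jobs finish at staggered times.
staggered = [1, 2, 3, 4]
print(wait_for_large_job(staggered, need=4, job_len=4, draining=False))
print(wait_for_large_job(staggered, need=4, job_len=4, draining=True))
```

In this model the undrained machine never has more than one core free at a time, so the four-core job never starts, while the drained machine frees all four cores once the longest-running small job finishes.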

Defragmentation can cause jobs to be killed. If you do not want that, MaxJobRetirementTime can be used to specify how long jobs should be allowed to run on machines that are being drained.


I only ask because sometimes (with 7.4/7.6) and dynamic slots we see
partial matches that don't go through and wondered if there was something
in 7.8 that helps with this.

If by "partial matches that don't go through" you mean the starvation problem I mentioned above, then condor_defrag can help. If it is some other problem, then it may or may not help.

--Dan

regards
-Ian