Re: [HTCondor-users] Question about defragmentation
- Date: Tue, 18 Nov 2025 17:19:08 -0600 (CST)
- From: Todd L Miller <tlmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Question about defragmentation
You've described the mechanism you'd like to use quite clearly,
and correctly observed that HTCondor doesn't support it directly.
However, I'm less clear on what plain-English policy you're attempting to
implement, without which it's difficult to suggest good solutions.
That's the most important part of this message. The rest of it
goes on -- for a while -- about my guess as to what you want, why the
pool isn't already behaving that way, the built-in solution HTCondor comes
with, why it can't be configured to operate how you'd expect, and a hack
that might make things work anyway.
It sounds like you may want the number of running multi-core jobs
to be directly proportional to the number of multi-core jobs in the queue
(up to a limit where at most half of the pool's cores are running
multi-core jobs).* That amounts to a strong bias in favor of multi-core
jobs; the more cores per job, the stronger the bias. In terms of
determining which submitter's jobs are matched next, that might mean
changing the default weight assigned to jobs so that it ignores core count.
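	If that's the policy you're after, the knob I'd look at first
(an assumption about your setup on my part) is SLOT_WEIGHT on the EPs,
which defaults to Cpus:
# Something like this, on the EPs; SLOT_WEIGHT defaults to Cpus.
SLOT_WEIGHT = 1
With the default, an 8-core job charges its submitter eight times as
much usage as a single-core job, which works against the bias described
above.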
Assuming an even (unweighted by core count) mix of jobs in the
queue, your pool "should" trend towards an even (weighted by core count)
mix of jobs running. In practice, it probably won't, because when a
multi-core job exits, there's a certain chance that the next job
examined from a submitter who won the "quota/priority dance" won't be a
multi-core job, but there's a much smaller chance that enough single-core
jobs will exit at the same time to provide enough cores for a multi-core
job. This is a problem, of course, because the multi-core slot, once
it's lost one core to run a single-core job, will then be used to run
seven more single-core jobs which won't all exit at the same time. Over
time, then, you would expect a pool that had been divided 50/50 between
single- and multi- core slots to become 100% single-core slots.
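	As a toy illustration of that drift (made-up numbers, nothing
HTCondor-specific): once every machine has lost even one core to a
single-core job, an 8-core job can't match anywhere, despite most of the
pool sitting idle. A minimal sketch:

```python
# Toy model: four 8-core machines, tracked as free cores per machine.
free = [8, 8, 8, 8]

def place_single(free):
    """A single-core job claims one core from the least-loaded machine."""
    i = free.index(max(free))
    free[i] -= 1

def can_place_multi(free, need=8):
    """An 8-core job needs a machine with all `need` cores free."""
    return any(f >= need for f in free)

# Four single-core jobs, one landing on each machine...
for _ in range(4):
    place_single(free)

print(free)                   # [7, 7, 7, 7]
print(sum(free))              # 28 idle cores...
print(can_place_multi(free))  # ...but no 8-core job can match: False
```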
This is exactly the scenario that the defrag daemon is intended to
deal with. I think another way of saying what you want is to say that no
more than 50% of the EPs in the pool should be willing to waste time
making sure that their idle cores are only used by multi-core jobs. (This
is equivalent because HTCondor sorts jobs by submitter before it sorts
them by any other category, when considering which job(s) to match first.)
Suppose you're willing to have each EP wait a full minute for a multi-core
job to match: you can then write a START expression that reflects that:
# Something like this.
START = (TARGET.RequestCpus >= 8) || ((time() - EnteredCurrentState) > 60)
Of course, this doesn't help if there aren't any multi-core slots
in the pool because there haven't been any multi-core jobs for a while,
and that's where the defrag daemon comes in.
The defrag daemon will let a machine drain until
`DEFRAG_WHOLE_MACHINE_EXPR` evaluates to true, so if your only concern is
8-core jobs, you should set it accordingly.
# Something like this.
DEFRAG_WHOLE_MACHINE_EXPR = Cpus >= 8
You can set `DEFRAG_MAX_WHOLE_MACHINES` so that only half of your
machines will drain at any given time:
# If you have 100 machines in your pool.
DEFRAG_MAX_WHOLE_MACHINES = 50
Allow yourself to drain as many machines as it takes:
# This is deliberately way higher than the actual cap.
DEFRAG_DRAINING_MACHINES_PER_HOUR = 999999
If you want the lowest-possible latency for multicore jobs
(without reserving slots), you'll want to force slots to be renegotiated
after each job. This will cost you quite a bit of extra time negotiating
and reduce your overall throughput, so you may not want to do this right
away; on the other hand, if you leave the defrag daemon running all the
time, it might save you quite a bit of lost time.
# Don't ever re-use a slot.
CLAIM_WORKLIFE = 0
> So the question is, when should the defrag daemon be running?
> This control seems to be missing. How do others approach this?
> Is there some key concept I've misunderstood or missed?
	In our local experience, there's always a mix of jobs in the
queue, and so it's appropriate to have a continuous
defragmentation policy. (In other pools, nodes come and go all the time,
so no explicit defragmentation is necessary.) It isn't ideal, but you
should be able to turn the defrag daemon on and off with `condor_[on|off]
-daemon defrag`, so you could have a little script running on the side
(perhaps as a schedd cron job) that looks at the queue and decides what
to do. (One option is to adjust DEFRAG_MAX_WHOLE_MACHINES depending on
how many jobs are in the queue, I suppose.)
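	A sketch of that side script in Python (the condor_q constraint,
the condor_config_val -set call, and the condor_reconfig invocation are
my assumptions -- check them against your version's man pages before
trusting this):

```python
import subprocess

# Assumed pool size and job shape -- adjust for your site.
POOL_MACHINES = 100
CORES_PER_MULTI_JOB = 8

def desired_whole_machines(idle_multicore_jobs, pool_machines=POOL_MACHINES):
    """Drain up to one machine per idle multi-core job, capped at half the pool."""
    return min(idle_multicore_jobs, pool_machines // 2)

def count_idle_multicore_jobs():
    """Count idle jobs requesting at least CORES_PER_MULTI_JOB cores.

    Assumes condor_q's -constraint/-format behavior; verify locally.
    """
    out = subprocess.run(
        ["condor_q", "-allusers", "-constraint",
         f"JobStatus == 1 && RequestCpus >= {CORES_PER_MULTI_JOB}",
         "-format", "%d\n", "ClusterId"],
        capture_output=True, text=True, check=True)
    return len(out.stdout.split())

def main():
    cap = desired_whole_machines(count_idle_multicore_jobs())
    # Push the new cap at runtime and have the defrag daemon re-read it.
    # (condor_config_val -set needs CONFIG-level authorization; both
    # commands here are assumptions to verify against your version.)
    subprocess.run(["condor_config_val", "-set",
                    f"DEFRAG_MAX_WHOLE_MACHINES = {cap}"], check=True)
    subprocess.run(["condor_reconfig", "-daemon", "defrag"], check=True)

# A cron job (or a schedd cron wrapper) would just call main().
```

The same decision logic could instead toggle the daemon with the
condor_[on|off] approach mentioned above when the count is zero.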
-- ToddM