Mailing List Archives
Re: [HTCondor-users] Drained 0 machines
- Date: Fri, 17 Jul 2020 21:56:43 +0200
- From: Stefano Dal Pra <stefano.dalpra@xxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Drained 0 machines
On 17/07/20 21:27, Beyer, Christoph wrote:
> Hi,
> look for DEFRAG_REQUIREMENTS & DEFRAG_WHOLE_MACHINE_EXPR
I did.
The DEFRAG_REQUIREMENTS expression matches ~400 nodes.
DEFRAG_WHOLE_MACHINE_EXPR matches 4 nodes (2 of which are not eligible for draining).
I increased the DefragLog verbosity and now I see a reason:
[...]
07/17/20 21:22:14 Skipping slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: because it is already running as a whole machine.
07/17/20 21:22:14 Skipping slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: because it is already running as a whole machine.
[...]
And I think this is because I had
DEFRAG_WHOLE_MACHINE_EXPR = ((Cpus == TotalCpus) || (Cpus >= 8)) && (StartJobs =?= True)
and all the skipped machines already have an 8-core job running.
I changed DEFRAG_WHOLE_MACHINE_EXPR to
((Cpus == TotalCpus) || (Cpus >= 16)) && (StartJobs =?= True)
and now I see more machines being drained.
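Stefano's diagnosis comes down to the arithmetic of the expression: on a hypothetical 32-core machine, any slot advertising 8 to 15 Cpus satisfies the old `Cpus >= 8` clause but not the new `Cpus >= 16` one. The Python sketch below is a simplified, hypothetical model of that evaluation (the names `UNDEFINED`, `meta_eq` and `whole_machine` are illustrative, not HTCondor's API). It models only the expression itself, not how condor_defrag decides which slot ad to evaluate.

```python
# Hypothetical sketch: evaluates Stefano's two DEFRAG_WHOLE_MACHINE_EXPR
# variants in plain Python. This is NOT the HTCondor ClassAd library;
# UNDEFINED is an assumed stand-in for a missing ClassAd attribute.

UNDEFINED = object()  # models the ClassAd UNDEFINED value


def meta_eq(a, b):
    """Simplified ClassAd '=?=' (is identical to): never UNDEFINED, type-sensitive."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


def whole_machine(ad, min_cpus):
    """((Cpus == TotalCpus) || (Cpus >= min_cpus)) && (StartJobs =?= True)"""
    cpus, total = ad["Cpus"], ad["TotalCpus"]
    return ((cpus == total) or (cpus >= min_cpus)) and meta_eq(
        ad.get("StartJobs", UNDEFINED), True
    )


if __name__ == "__main__":
    # Hypothetical 32-core machine advertising various Cpus values:
    # columns are Cpus, old expression (>= 8), new expression (>= 16).
    for cpus in (4, 8, 12, 16, 32):
        ad = {"Cpus": cpus, "TotalCpus": 32, "StartJobs": True}
        print(cpus, whole_machine(ad, 8), whole_machine(ad, 16))
```

Raising the threshold narrows what counts as "already running as a whole machine", so more machines fail that test and become drain candidates, consistent with what Stefano observed.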
Thanks,
Stefano
> These knobs define which machines can be drained and what counts as a drained machine,
> for example:
> # machine should be partitionable and online
> DEFRAG_REQUIREMENTS = PartitionableSlot && Offline=!=True
> # drain down to a blob of 12 online cores
> DEFRAG_WHOLE_MACHINE_EXPR = Cpus == 12 && Offline=!=True
> Best
> christoph
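Christoph's example relies on the ClassAd meta-operator `=!=` ("is not identical to"): `Offline =!= True` is true when a slot does not advertise `Offline` at all, whereas `Offline != True` would evaluate to UNDEFINED, and a constraint only matches when it evaluates to TRUE. The Python sketch below is a simplified, hypothetical model of that three-valued logic; the helpers `UNDEFINED`, `ne`, `meta_ne`, `and3`, `matches`, `defrag_requirements` and `whole_machine_expr` are illustrative names, not HTCondor's API, and the sample slot ads are invented.

```python
# Hypothetical sketch of ClassAd three-valued logic, applied to the
# DEFRAG_REQUIREMENTS / DEFRAG_WHOLE_MACHINE_EXPR example above.
# Not the HTCondor ClassAd library; UNDEFINED models a missing attribute.

UNDEFINED = object()


def meta_eq(a, b):
    """Simplified '=?=': never UNDEFINED, type-sensitive."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


def meta_ne(a, b):
    """Simplified '=!=': logical negation of '=?='."""
    return not meta_eq(a, b)


def ne(a, b):
    """Plain '!=': UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a != b


def and3(a, b):
    """'&&' with three-valued logic: FALSE dominates UNDEFINED."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def matches(value):
    """A constraint only matches when it evaluates to exactly TRUE."""
    return value is True


def defrag_requirements(ad):
    # PartitionableSlot && Offline =!= True
    return and3(ad.get("PartitionableSlot", UNDEFINED),
                meta_ne(ad.get("Offline", UNDEFINED), True))


def whole_machine_expr(ad):
    # Cpus == 12 && Offline =!= True
    cpus = ad.get("Cpus", UNDEFINED)
    cpus_ok = UNDEFINED if cpus is UNDEFINED else cpus == 12
    return and3(cpus_ok, meta_ne(ad.get("Offline", UNDEFINED), True))


if __name__ == "__main__":
    samples = [
        {"PartitionableSlot": True, "Cpus": 12},                  # online, no Offline attr
        {"PartitionableSlot": True, "Cpus": 4, "Offline": True},  # offline slot
        {"Cpus": 1},                                              # no PartitionableSlot attr
    ]
    for ad in samples:
        print(matches(defrag_requirements(ad)), matches(whole_machine_expr(ad)))
    # '!=' propagates UNDEFINED, while '=!=' does not
    print(ne(UNDEFINED, True) is UNDEFINED, meta_ne(UNDEFINED, True))
```

Using `=!=` keeps slots that simply lack an `Offline` attribute eligible, which is presumably the intent of "machine should be partitionable and online".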