[HTCondor-users] Parallel universe locality and dynamic slots, was Re: Different NEGOTIATOR_PRE_JOB_RANK ...
- Date: Tue, 10 Mar 2015 12:26:36 +0100
- From: Steffen Grunewald <Steffen.Grunewald@xxxxxxxxxx>
- Subject: [HTCondor-users] Parallel universe locality and dynamic slots, was Re: Different NEGOTIATOR_PRE_JOB_RANK ...
On Wed, Mar 04, 2015 at 10:15:22AM +0100, Steffen Grunewald wrote:
> On Tue, Mar 03, 2015 at 02:51:59PM +0000, Peter F. Couvares wrote:
> > The NEGOTIATOR_PRE_JOB_RANK boolean expression evaluates in the context of each machine classad (including anything you publish in the machine ad from job ads of the jobs already running there, via STARTD_JOB_ATTRS), so you can simply reference the universe and give it a different rank. Something like (in pseudocode):
> >
> > NEGOTIATOR_PRE_JOB_RANK = (is_parallel * 10) + (is_not_parallel * 20) + other_stuff
>
> To avoid large amounts of parentheses, would it work to have
>
> NEGOTIATOR_PRE_JOB_RANK = ifThenElse( (Target.JobUniverse =?= 11), \
> $(PARALLEL_PRE_JOB_RANK), \
> $(NONPARALLEL_PRE_JOB_RANK) )
>
> with the two helper expressions accordingly set? (NONPARALLEL* basically mimicking
> the default one, "a - b*Memory - c*Cpus", PARALLEL* something like "x + y*Cpus - z*Memory")
I tried this, and it works.
But it's far from optimal: up-ranking nodes with many free Cpus also down-ranks
the same node for its next slot, since each match leaves it with fewer free
Cpus. Apparently not all available Cpu resources are used up in one go.
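The effect can be reproduced in a toy model. The sketch below is not HTCondor's negotiator code. It assumes a rank of the form "y * free Cpus" and a matcher that re-evaluates that rank after every single-core match; the function name and the tie-breaking rule are made up for illustration.

```python
def spread_matches(free_cpus, requests, y=1.0):
    """Toy greedy matcher (hypothetical, not HTCondor code).

    Each request takes one core from the node whose rank, y * free cores,
    is currently highest. Ties go to the node with the lower index.
    """
    free = list(free_cpus)
    placement = []
    for _ in range(requests):
        candidates = [i for i, cores in enumerate(free) if cores > 0]
        if not candidates:
            break  # nothing left to hand out
        best = max(candidates, key=lambda i: (y * free[i], -i))
        free[best] -= 1  # the match lowers this node's rank for the next round
        placement.append(best)
    return placement


# Three nodes with 8 free cores each, six single-core requests.
print(spread_matches([8, 8, 8], 6))  # → [0, 1, 2, 0, 1, 2]
```

In this model six requests land on three different nodes although any single node could have held all of them.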
The MatchLog entries look like this:
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.95.2:46494> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.87.9:58903> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.96.14:50299> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.95.35:37206> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.98.29:46401> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.93.23:42543> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.94.13:34502> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.91.18:55442> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.95.15:43258> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.86.24:34800> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.98.13:34906> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.87.36:52863> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.96.36:40336> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.99.15:45304> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.96.27:38671> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.90.2:37276> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.90.12:33741> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.91.31:34350> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.93.11:46584> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.93.12:48341> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.86.20:39903> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.93.31:56767> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.88.10:51698> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.85.31:50928> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.95.12:35101> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.96.20:37955> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.95.31:55384> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.88.20:41212> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.93.26:56174> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.93.35:58997> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.97.14:53998> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.96.26:39078> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.97.25:52622> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.88.38:50720> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:15:46 Rejected 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088>: no match found
MatchLog:15-03-09_21:16:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.95.33:56988> slot1@xxxxxxxxxxxxxxxxxx
MatchLog:15-03-09_21:16:46 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.99.27:36363> slot1@xxxxxxxxxxxxxxxxxx
- 40 individual nodes. Locality gone.
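Counting the distinct machines by hand is tedious, so here is a small sketch that does it. It assumes the lines look exactly like the excerpt above, and it uses the startd IP address as the machine identity because the host names are masked. The regular expression and the function name are illustrative only.

```python
import re

# Matches lines such as
#   ... Matched <job id> <scheduler> <ip:port> preempting none <ip:port> slot1@host
MATCHED = re.compile(
    r"Matched\s+(?P<job>\d+\.\d+)\s+\S+\s+<[^>]+>\s+preempting\s+\S+\s+"
    r"<(?P<ip>[\d.]+):\d+>\s+(?P<slot>\S+)"
)


def machines_per_job(lines):
    """Map each job id to the set of startd IP addresses it was matched to."""
    result = {}
    for line in lines:
        m = MATCHED.search(line)
        if m:  # "Rejected" lines and anything else are skipped
            result.setdefault(m.group("job"), set()).add(m.group("ip"))
    return result


sample = [
    "MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.95.2:46494> slot1@xxxxxxxxxxxxxxxxxx",
    "MatchLog:15-03-09_21:15:45 Matched 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088> preempting none <10.150.87.9:58903> slot1@xxxxxxxxxxxxxxxxxx",
    "MatchLog:15-03-09_21:15:46 Rejected 1340600.0 DedicatedScheduler@xxxxxxxxxxxxxxxxxx <10.150.100.40:52088>: no match found",
]
for job, ips in machines_per_job(sample).items():
    print(job, len(ips), "distinct machines")
```

A job whose machine count equals its number of matches received no two slots on the same node.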
Now, this happened with a pretty full pool, so I cannot guarantee that any
nodes with several free cores were available at all. But my tests with a
"local parallel universe" on an isolated machine were not promising either:
I got a single slot every 20 seconds from an unfragmented 64-core
partitionable slot.
Apparently parallel universe is in urgent need of love from some developer.
(I'm trying to imagine a strategy for repeatedly matching against the same
slot as long as it has resources available, but I cannot see any Condor
"on-board" means to do that.)
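To make the idea concrete, here is the same kind of toy model with such a "repeated match" rule. This is a sketch of the wished-for behaviour only, not something HTCondor offers; the function name and the selection rule are assumptions.

```python
def pack_matches(free_cpus, requests):
    """Toy 'depth-first' matcher (hypothetical, not an HTCondor feature).

    Pick the node with the most free cores, then keep matching against it
    until it is full or all requests are placed, and only then move on.
    """
    free = list(free_cpus)
    placement = []
    while len(placement) < requests and any(cores > 0 for cores in free):
        best = max(range(len(free)), key=lambda i: (free[i], -i))
        take = min(free[best], requests - len(placement))
        free[best] -= take
        placement.extend([best] * take)
    return placement


# Three nodes with 8 free cores each, six single-core requests.
print(pack_matches([8, 8, 8], 6))  # → [0, 0, 0, 0, 0, 0]
```

With this rule all six requests stay on one node, which is the locality a parallel universe job would want.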
With static partitioning, the situation is completely different: it still
needs careful crafting of the PRE_JOB_RANKs, but in principle nothing stops
the matching process from selecting all resources from one machine in one go.
But static partitioning wastes resources...
Thanks for listening
Steffen