[Condor-users] dSlots and RANK expression
- Date: Mon, 2 Apr 2012 09:26:45 +0200
- From: Carsten Aulbert <Carsten.Aulbert@xxxxxxxxxx>
- Subject: [Condor-users] dSlots and RANK expression
Hi all
I'm currently stuck with 7.6.6 and dynamic slots.
We have a set of very important jobs which we tried to push into our cluster
with a very good effective prio (factor 1 instead of 100 or 1000 for other
users). However, this job requires all 4 cores on a single machine and is
being starved for CPU cores:
condor_q -b yields:
25879.000:  Run analysis summary.  Of 7599 machines,
      0 are rejected by your job's requirements
   7370 reject your job because of their own requirements
      2 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
     74 match but will not currently preempt their existing job
      0 match but are currently offline
    153 are available to run your job
But as these 153 slots are spread across different machines, with no single machine offering 4 free cores, the jobs do not start.
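To illustrate the point, here is a minimal Python sketch of why a large number of available slots does not help a 4-core job. It is not Condor's matchmaking logic; the function name `machines_that_fit` and the node names are made up for illustration, and it assumes each free slot holds exactly one core.

```python
from collections import Counter

def machines_that_fit(free_slots, request_cpus):
    """Return the machines with at least request_cpus free single-core slots.

    free_slots: iterable of machine names, one entry per free core
    (hypothetical data, not real pool output).
    """
    per_machine = Counter(free_slots)
    return sorted(m for m, n in per_machine.items() if n >= request_cpus)

# Hypothetical pool: 6 free cores in total, but never 4 on one machine.
free = ["node1", "node1", "node2", "node3", "node3", "node3"]
print(len(free), machines_that_fit(free, 4))
```

In this toy pool, six cores are free overall, yet no machine can host a job asking for 4 cores until one more core on a single node is vacated.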
Question (a): Why is this job not running? Is there a way to let Condor
"move"/preempt remote std. universe jobs to make room for this job?
Thus, I thought I could help Condor a bit by adding a few nodes with special
START/RANK settings, e.g. instead of
START=TRUE
RANK=0.0
I tried various versions like:
START = (Target.RemoteOwner == "highprio@xxxxxxxxxxx") || (Target.JobUniverse == 1)
RANK = ( Owner =?= "highprio@xxxxxxxxxxx" )
The START expression works OK, at least from what I've seen, in that only std
universe jobs get matched, but these are not pushed off the machine by the
waiting jobs submitted by "highprio".
Question (b): Is machine RANK evaluated on a "subslot" basis? I think this
would explain why there will never be a match by the "RequestCpus=4" job from
above when the subslot has fewer cores.
Question (c): Do I need condor_defrag from 7.7? If yes, is it considered safe
to run 7.6 submit and execute machines along with a few 7.7 execute machines?
Question (d): Is my way of thinking totally flawed?
Cheers
carsten