Re: [Condor-users] Understanding user priority and job preemption
- Date: Wed, 14 Mar 2007 13:32:13 +0100
- From: Jan Ploski <Jan.Ploski@xxxxxxxx>
- Subject: Re: [Condor-users] Understanding user priority and job preemption
condor-users-bounces@xxxxxxxxxxx wrote on 03/13/2007 07:38:30 PM:
>
> I too am curious why you don't see the expected ratio of jobs running.
>
> Here is one thing that may help in your condor configuration (on the
> nodes running startds).
>
> CLAIM_WORKLIFE = 600
>
> This prevents the schedd's claim to the startd from lasting
> indefinitely. Without this setting, the schedd will hold on to a claim
> as long as it has jobs to run on it (and as long as it doesn't get
> preempted).
Thanks for the reply. Unfortunately, CLAIM_WORKLIFE does not affect job
preemption.
Today I analyzed my problem some more. In particular, I tested a variant
in which every node in the pool matches the job requirements. That is, I
tested with 20 rather than 60 total nodes.
As before, user A has priority 4 and user B has priority 8.
In the 20-node scenario, I can observe the following behavior:
1. If user A submits jobs first, taking all machines, and user B comes in
later, then user B does not get any machines: A's jobs are never
preempted. User B does not get machines even if I remove some running jobs
of user A. In this case A's jobs are preferred, no matter what.
2. If user B submits jobs first, taking all machines, and user A comes in
later, then B's jobs are preempted and the expected 1:2 ratio of machines
(B:A) is established.
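To make the expected ratio concrete, here is a minimal Python sketch. It assumes that each user's share is inversely proportional to the numeric user priority value, which is what the expected 1:2 ratio for priorities 8 and 4 implies. The function name `fair_shares` is hypothetical and is not part of Condor.

```python
from fractions import Fraction


def fair_shares(priorities):
    """Return each user's fraction of the machines.

    Assumption: the share is proportional to 1 / priority value,
    so a numerically lower priority value means a larger share.
    """
    weights = {user: Fraction(1, prio) for user, prio in priorities.items()}
    total = sum(weights.values())
    return {user: weight / total for user, weight in weights.items()}


shares = fair_shares({"A": 4, "B": 8})
for user, share in shares.items():
    # Scale the fraction to the 20 matching nodes of the test pool.
    print(user, share, round(share * 20))
```

With 20 nodes this gives about 13 machines for A and 7 for B, which is the ratio observed in case 2 above.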
Compare this with the 60-node scenario with 20 matching nodes, described
in my previous message:
1. User A submits first, B comes later. The effect is the same as in case
1 above: B starves.
2. User B submits first, A comes later. Here, the expected 1:2 ratio does
not emerge. Instead, ALL 20 B-jobs are preempted and replaced with 20
A-jobs.
Based on these observations, I speculate that the following is true:
- Condor never preempts a running job of user A in favor of a job of user
B when A.userprio < B.userprio, no matter what PREEMPTION_REQUIREMENTS is
set to. This would explain the "insufficient priority" messages I see in
NegotiatorLog.
- In the second scenario, Condor calculates A's pie slice as 2/3 * 60 = 40
nodes (rather than 2/3 * 20 = about 13 matching nodes) and B's pie slice
as 1/3 * 60 = 20 nodes (rather than 1/3 * 20 = about 7 nodes). During
negotiation Condor tries to fill A's quota first because A.userprio <
B.userprio. All 20 matching nodes are assigned to A because 20 < 40. Next,
Condor tries to fill B's quota, but does not find any nodes which are
free or preemptible under the first rule. Therefore, B gets nothing.
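The two speculated rules can be written down as a small Python model and checked against the observations. This is only a sketch of the hypothesis, not of Condor's actual negotiator: the function `negotiate` and its parameters are hypothetical, each user is assumed to have unlimited idle jobs, and a user who already holds more nodes than the slice is assumed to keep them.

```python
from fractions import Fraction


def negotiate(shares, running, matching, slice_base):
    """Simulate one negotiation cycle under the speculated rules.

    shares:     user -> fraction of the pool, ordered best priority first
    running:    user -> matching nodes held before the cycle
    matching:   number of nodes that match the job requirements
    slice_base: node count used to size the pie slices (hypothesis:
                the total pool size, not the number of matching nodes)
    """
    held = dict(running)
    users = list(shares)
    for i, user in enumerate(users):
        want = round(shares[user] * slice_base) - held[user]
        if want <= 0:
            continue
        # First take matching nodes that nobody holds.
        take = min(want, matching - sum(held.values()))
        held[user] += take
        want -= take
        # Rule 1: preempt only users with a worse priority, worst first.
        for victim in reversed(users[i + 1:]):
            n = min(want, held[victim])
            held[victim] -= n
            held[user] += n
            want -= n
    return held


shares = {"A": Fraction(2, 3), "B": Fraction(1, 3)}
b_first = {"A": 0, "B": 20}
# 60-node pool, 20 matching nodes, slices sized by the whole pool.
print(negotiate(shares, b_first, matching=20, slice_base=60))
# Same start, but slices sized by the 20 matching nodes only.
print(negotiate(shares, b_first, matching=20, slice_base=20))
```

Under these assumptions the first call leaves B with no nodes and the second yields 13 nodes for A and 7 for B, which is consistent with both observed scenarios where B submits first.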
Can anyone confirm that the above reasoning is correct?
If it is correct:
- Why is Condor assigning "pie slices" based on the total number of nodes
in the pool rather than the total number of matching nodes?
- Is there any way to achieve the expected 1:2 ratio between two users
competing for N specific machines of a pool with a total size of M >= 3*N?
Best regards,
Jan Ploski
--
Dipl.-Inform. (FH) Jan Ploski
OFFIS
Betriebliches Informationsmanagement
Escherweg 2 - 26121 Oldenburg - Germany
Fon: +49 441 9722 - 184 Fax: +49 441 9722 - 202
E-Mail: Jan.Ploski@xxxxxxxx - URL: http://www.offis.de