[Condor-users] Understanding user priority and job preemption
- Date: Tue, 13 Mar 2007 17:53:01 +0100
- From: Jan Ploski <Jan.Ploski@xxxxxxxx>
- Subject: [Condor-users] Understanding user priority and job preemption
Hello,
I have a question concerning user priorities, fair scheduling and job
preemption. I am trying to reproduce the behavior described in the
documentation, which I understand as follows: all other things being equal,
a user with priority 4 should be assigned twice as many machines as a user
with priority 8.
To test whether this works, I did the following:
1. Set PREEMPTION_REQUIREMENTS from the default UWCS value to True.
2. Set user priority of user A to 4 and of user B to 8 (with
condor_userprio).
3. Made sure MaxJobRetirementTime is 0.
4. Submitted lots of jobs using user account A.
5. Submitted lots of jobs using user account B.
In both cases I stated the job requirements so that they matched the same
20 nodes (out of 60 available nodes in total). After a few negotiation
cycles, I expected to see a mix of running jobs consisting of 2/3 A-jobs
and 1/3 B-jobs. However, this did not occur. Instead, only A-jobs were
running and all B-jobs were waiting in the queue.
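To make the expected 2/3 to 1/3 mix concrete, here is a minimal Python sketch of the arithmetic. It assumes that a user's share is proportional to 1 / priority (a lower value being a better priority), which is only my reading of the documentation. The function name expected_split is made up for illustration and is not part of Condor.

```python
def expected_split(priorities, machines):
    """Split machines in inverse proportion to the user priority values.

    Assumption (not taken from the Condor source): a lower priority value
    is better, and each user's share is proportional to 1 / priority.
    """
    weights = {user: 1.0 / prio for user, prio in priorities.items()}
    total = sum(weights.values())
    return {user: machines * w / total for user, w in weights.items()}


# Priorities 4 and 8 competing for the 20 matching nodes.
split = expected_split({"A": 4.0, "B": 8.0}, machines=20)
for user, count in split.items():
    print(f"{user}: {count:.2f} of 20 machines")
```

Under that assumption, A would be entitled to about 13 of the 20 matching nodes and B to about 7.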
I set NEGOTIATOR_DEBUG to ALL and looked into NegotiatorLog. Here is what
I saw (with user names changed to match the above description):
3/13 17:32:04 (fd:7) (pid:15842) Phase 4.1: Negotiating with schedds ...
3/13 17:32:04 (fd:7) (pid:15842) NumStartdAds = 60
3/13 17:32:04 (fd:7) (pid:15842) NormalFactor = 2.937731
3/13 17:32:04 (fd:7) (pid:15842) MaxPrioValue = 7.959589
3/13 17:32:04 (fd:7) (pid:15842) NumScheddAds = 2
3/13 17:32:04 (fd:7) (pid:15842) Negotiating with userA@cluster at <10.0.0.254:16701>
3/13 17:32:04 (fd:7) (pid:15842) 0 seconds so far
3/13 17:32:04 (fd:7) (pid:15842) NEGOTIATOR_IGNORE_USER_PRIORITIES is undefined, using default value of False
3/13 17:32:04 (fd:7) (pid:15842) Calculating schedd limit with the following parameters
3/13 17:32:04 (fd:7) (pid:15842) ScheddPrio = 4.107686
3/13 17:32:04 (fd:7) (pid:15842) ScheddPrioFactor = 1.000000
3/13 17:32:04 (fd:7) (pid:15842) scheddShare = 0.659601
3/13 17:32:04 (fd:7) (pid:15842) scheddAbsShare = 0.500000
3/13 17:32:04 (fd:7) (pid:15842) ScheddUsage = 20
3/13 17:32:04 (fd:7) (pid:15842) scheddLimit = 20
3/13 17:32:04 (fd:7) (pid:15842) MaxscheddLimit = 20
3/13 17:32:04 (fd:7) (pid:15842) Socket to <10.0.0.254:16701> already in cache, reusing
3/13 17:32:04 (fd:7) (pid:15842) Sending SEND_JOB_INFO/eom
3/13 17:32:04 (fd:7) (pid:15842) Getting reply from schedd ...
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfds=7
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfound=1
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfds=7
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfound=1
3/13 17:32:04 (fd:7) (pid:15842) Got JOB_INFO command; getting classad/eom
3/13 17:32:04 (fd:7) (pid:15842) Request 07650.00000:
3/13 17:32:04 (fd:7) (pid:15842) Rejected 7650.0 userA@cluster <10.0.0.254:16701>: no match found
3/13 17:32:04 (fd:7) (pid:15842) Sending SEND_JOB_INFO/eom
3/13 17:32:04 (fd:7) (pid:15842) Getting reply from schedd ...
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfds=7
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfound=1
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfds=7
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfound=1
3/13 17:32:04 (fd:7) (pid:15842) Got NO_MORE_JOBS; done negotiating
3/13 17:32:04 (fd:7) (pid:15842) Schedd userA@cluster got all it wants; removing it.
3/13 17:32:04 (fd:7) (pid:15842) Negotiating with userB@cluster at <10.0.0.254:16701>
3/13 17:32:04 (fd:7) (pid:15842) 0 seconds so far
3/13 17:32:04 (fd:7) (pid:15842) NEGOTIATOR_IGNORE_USER_PRIORITIES is undefined, using default value of False
3/13 17:32:04 (fd:7) (pid:15842) Calculating schedd limit with the following parameters
3/13 17:32:04 (fd:7) (pid:15842) ScheddPrio = 7.959589
3/13 17:32:04 (fd:7) (pid:15842) ScheddPrioFactor = 1.000000
3/13 17:32:04 (fd:7) (pid:15842) scheddShare = 0.340399
3/13 17:32:04 (fd:7) (pid:15842) scheddAbsShare = 0.500000
3/13 17:32:04 (fd:7) (pid:15842) ScheddUsage = 0
3/13 17:32:04 (fd:7) (pid:15842) scheddLimit = 20
3/13 17:32:04 (fd:7) (pid:15842) MaxscheddLimit = 20
3/13 17:32:04 (fd:7) (pid:15842) Socket to <10.0.0.254:16701> already in cache, reusing
3/13 17:32:04 (fd:7) (pid:15842) Sending SEND_JOB_INFO/eom
3/13 17:32:04 (fd:7) (pid:15842) Getting reply from schedd ...
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfds=7
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfound=1
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfds=7
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfound=1
3/13 17:32:04 (fd:7) (pid:15842) Got JOB_INFO command; getting classad/eom
3/13 17:32:04 (fd:7) (pid:15842) Request 07524.00000:
3/13 17:32:04 (fd:7) (pid:15842) Rejected 7524.0 userB@cluster <10.0.0.254:16701>: insufficient priority
3/13 17:32:04 (fd:7) (pid:15842) Sending SEND_JOB_INFO/eom
3/13 17:32:04 (fd:7) (pid:15842) Getting reply from schedd ...
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfds=7
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfound=1
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfds=7
3/13 17:32:04 (fd:7) (pid:15842) condor_read(): nfound=1
3/13 17:32:04 (fd:7) (pid:15842) Got NO_MORE_JOBS; done negotiating
3/13 17:32:04 (fd:7) (pid:15842) Schedd userB@cluster got all it wants; removing it.
A's waiting jobs are rejected due to unavailable matches (as expected).
However, B's waiting jobs are rejected due to "insufficient priority". I
don't understand why. I also don't understand how the reported values of
scheddAbsShare, scheddLimit and MaxscheddLimit are computed and what they
mean. Finally, I am suspicious about the "Schedd ... got all it wants"
messages: there is more than one job of each user waiting in the queue,
so why isn't the negotiator trying to match all of these jobs?
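For what it is worth, two of the logged quantities can be reproduced from the ScheddPrio values alone. The Python sketch below is a guess at the relationship and is not taken from the Condor source: it assumes each submitter is weighted by MaxPrioValue / ScheddPrio, that NormalFactor is the sum of those weights, and that scheddShare is the weight divided by NormalFactor. The variable names are mine.

```python
# Values copied from the NegotiatorLog excerpt above.
logged_prio = {"userA@cluster": 4.107686, "userB@cluster": 7.959589}
logged_share = {"userA@cluster": 0.659601, "userB@cluster": 0.340399}
logged_normal_factor = 2.937731

# Assumption: MaxPrioValue is the largest (worst) ScheddPrio in this cycle.
max_prio_value = max(logged_prio.values())

# Assumption: each submitter is weighted by MaxPrioValue / ScheddPrio.
weights = {user: max_prio_value / prio for user, prio in logged_prio.items()}

# Assumption: NormalFactor is the sum of the weights, and scheddShare is
# each weight normalized by that sum.
normal_factor = sum(weights.values())
shares = {user: w / normal_factor for user, w in weights.items()}

print(f"NormalFactor = {normal_factor:.6f} (logged {logged_normal_factor})")
for user, share in shares.items():
    print(f"{user}: scheddShare = {share:.6f} (logged {logged_share[user]})")
```

The computed values agree with the logged NormalFactor and scheddShare to within rounding, so the shares themselves look right (about 0.66 for A and 0.34 for B). This does not account for scheddAbsShare, scheddLimit or MaxscheddLimit, nor for the "insufficient priority" rejection.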
Best regards,
Jan Ploski
--
Dipl.-Inform. (FH) Jan Ploski
OFFIS
Betriebliches Informationsmanagement
Escherweg 2 - 26121 Oldenburg - Germany
Fon: +49 441 9722 - 184 Fax: +49 441 9722 - 202
E-Mail: Jan.Ploski@xxxxxxxx - URL: http://www.offis.de