I have a condor pool where most of the machines are set to
never pre-empt. I thought that this setting would mean that
pre-emption doesn't happen but it appears I am wrong.
On 15 of my machines I have the following settings
(and condor_config_val acknowledges they are seen both
by the startd on the machine and by the negotiator/collector).
[root@fnpc182 log]# condor_config_val -startd PREEMPT
FALSE
[root@fnpc182 log]# condor_config_val -startd PREEMPTION_REQUIREMENTS
FALSE
[root@fnpc182 log]# condor_config_val -startd START
TRUE
[root@fnpc182 log]# condor_config_val -startd RANK
(agroup == "group_numi" ) * 1000
What I want to happen is to give this machine priority of
starting jobs from group_numi, (agroup is a group attribute that
I set in the classads of all jobs). But I don't want it to
pre-empt an existing job of some other group if that job is
not yet finished yet.
What is actually happening is the following:
From StartLog
3/23 13:22:56 DaemonCore: Command received via UDP from host
<131.225.167.42:198
21>
3/23 13:22:56 DaemonCore: received command 440 (MATCH_INFO), calling
handler (co
mmand_match_info)
3/23 13:22:56 vm1: match_info called
3/23 13:22:56 DaemonCore: Command received via UDP from host
<131.225.167.42:198
21>
3/23 13:22:56 DaemonCore: received command 440 (MATCH_INFO), calling
handler (co
mmand_match_info)
3/23 13:22:56 vm2: match_info called
3/23 13:22:56 DaemonCore: Command received via TCP from host
<131.225.167.42:307
85>
3/23 13:22:56 DaemonCore: received command 442 (REQUEST_CLAIM), calling
handler
(command_request_claim)
3/23 13:22:56 vm1: Preempting claim has correct ClaimId.
3/23 13:22:56 vm1: New claim has sufficient rank, preempting current
claim.
3/23 13:22:56 vm1: State change: preempting claim based on machine rank
3/23 13:22:56 vm1: State change: retiring due to preempting claim
3/23 13:22:56 vm1: Changing activity: Busy -> Retiring
3/23 13:22:56 vm1: State change: retirement ended/expired
3/23 13:22:56 vm1: Changing state and activity: Claimed/Retiring ->
Preempting/V
acating
3/23 13:22:56 DaemonCore: Command received via TCP from host
<131.225.167.42:307
86>
3/23 13:22:56 DaemonCore: received command 442 (REQUEST_CLAIM), calling
handler
(command_request_claim)
3/23 13:22:56 vm2: Preempting claim has correct ClaimId.
3/23 13:22:56 vm2: New claim has sufficient rank, preempting current
claim.
3/23 13:22:56 vm2: State change: preempting claim based on machine rank
3/23 13:22:56 vm2: State change: retiring due to preempting claim
3/23 13:22:56 vm2: Changing activity: Busy -> Retiring
3/23 13:22:56 vm2: State change: retirement ended/expired
3/23 13:22:56 vm2: Changing state and activity: Claimed/Retiring ->
Preempting/V
acating
3/23 13:22:56 DaemonCore: Command received via TCP from host
<131.225.167.42:307
94>
3/23 13:22:56 DaemonCore: received command 404
(DEACTIVATE_CLAIM_FORCIBLY), call
ing handler (command_handler)
3/23 13:22:56 vm1: Got KILL_FRGN_JOB while in Preempting state, ignoring.
3/23 13:22:56 Starter pid 4093 exited with status 0
3/23 13:22:56 vm1: State change: preempting claim exists - START is true
or unde
fined
3/23 13:22:56 vm1: Remote owner is rubin@xxxxxxxx
3/23 13:22:56 vm1: State change: claiming protocol successful
3/23 13:22:56 vm1: Changing state and activity: Preempting/Vacating ->
Claimed/I
dle
3/23 13:22:56 DaemonCore: Command received via TCP from host
<131.225.167.42:307
96>
3/23 13:22:56 DaemonCore: received command 404
(DEACTIVATE_CLAIM_FORCIBLY), call
ing handler (command_handler)
3/23 13:22:56 vm2: Got KILL_FRGN_JOB while in Preempting state, ignoring.
3/23 13:22:56 DaemonCore: Command received via UDP from host
<131.225.167.42:198
49>
3/23 13:22:56 DaemonCore: received command 443 (RELEASE_CLAIM), calling
handler
(command_release_claim)
3/23 13:22:56 Warning: can't find resource with ClaimId
(<131.225.167.182:22866>
#1142441053#75)
3/23 13:22:57 DaemonCore: Command received via UDP from host
<131.225.167.42:198
49>
3/23 13:22:57 DaemonCore: received command 443 (RELEASE_CLAIM), calling
handler
(command_release_claim)
3/23 13:22:57 vm2: Got RELEASE_CLAIM while in Preempting state, ignoring.
3/23 13:22:57 DaemonCore: Command received via UDP from host
<131.225.167.42:198
49>
3/23 13:22:57 DaemonCore: received command 443 (RELEASE_CLAIM), calling
handler
(command_release_claim)
3/23 13:22:57 vm2: Got RELEASE_CLAIM while in Preempting state, ignoring.
3/23 13:23:01 DaemonCore: Command received via TCP from host
<131.225.167.42:308
56>
3/23 13:23:01 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling
handler
(command_activate_claim)
\
and in NegotiatorLog it indicated that there was indeed
a job from a user in group_numi, with priority 16, who pre-empted
the existing job from a user not in group_numi, at the time had a priority
of 160.
How do we beat this, is there any way to give preference for
starting jobs without having pre-emption go on?
Steve Timm