HTCondor Project List Archives




[Condor-devel] help with preemption



Dear colleagues:

I have the following configuration:

CountDeNovoJobs = int(( slot1_IsDeNovoJob =?= True ) + \
                      ( slot2_IsDeNovoJob =?= True ) + \
                      ( slot3_IsDeNovoJob =?= True ) + \
                      ( slot4_IsDeNovoJob =?= True ))
WantsDeNovoJobs = ( CountDeNovoJobs <= 1 )
STARTD_SLOT_ATTRS = $(STARTD_SLOT_ATTRS) IsDeNovoJob
STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) IsDeNovoJob
STARTD_ATTRS = $(STARTD_ATTRS) CountDeNovoJobs WantsDeNovoJobs
KILL = ($(KILL)) && ( MY.IsDeNovoJob =!= True )
PREEMPT = ($(PREEMPT)) && ( MY.IsDeNovoJob =!= True )
SUSPEND = ($(SUSPEND)) && ( MY.IsDeNovoJob =!= True )

The idea is that once an "IsDeNovoJob" job is running, it should never be
killed, suspended, or preempted. Also, it would be best if only one
"IsDeNovoJob" job ran at a time on any given execute node.
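
To make the intent concrete, here is a rough Python model of how the two
machine attributes in the configuration above would evaluate. It is only a
sketch: the function names are hypothetical, None stands in for an UNDEFINED
ClassAd attribute, and a four-slot machine is assumed, as in the
configuration.

```python
from typing import Optional, Sequence


def count_de_novo_jobs(slot_flags: Sequence[Optional[bool]]) -> int:
    """Model of CountDeNovoJobs.

    Each entry is one slot's IsDeNovoJob value: True, False, or None
    (None models UNDEFINED, e.g. the slot is idle or the job did not
    set the attribute). The `=?= True` test only matches a literal True.
    """
    return sum(1 for flag in slot_flags if flag is True)


def wants_de_novo_jobs(slot_flags: Sequence[Optional[bool]]) -> bool:
    """Model of WantsDeNovoJobs = ( CountDeNovoJobs <= 1 )."""
    return count_de_novo_jobs(slot_flags) <= 1


if __name__ == "__main__":
    # Hypothetical slot states for a four-slot machine.
    for flags in ([None, None, None, None],
                  [True, None, False, None],
                  [True, True, None, None]):
        print(flags, count_de_novo_jobs(flags), wants_de_novo_jobs(flags))
```

Under this model, a machine that is already running one de novo job still
advertises WantsDeNovoJobs as True, because the test is "<= 1"; only the Rank
expression in the submit file below steers new jobs away from it.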

I submitted the following submit file:

executable = queuetest.sh
arguments = hello
universe = vanilla
output = queuetest.$(cluster).$(process).out
error = queuetest.$(cluster).$(process).err
log = queuetest.log
+IsDeNovoJob = True
Requirements = ( WantsDeNovoJobs =?= True )
Rank = 2 - Target.CountDeNovoJobs * 8 - SlotId
queue
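
The arithmetic in the Rank expression can be tabulated with a short Python
sketch. The function name is hypothetical, and the sketch assumes that
CountDeNovoJobs and SlotId both resolve to plain integers from the machine ad.

```python
def job_rank(count_de_novo_jobs: int, slot_id: int) -> int:
    """Model of Rank = 2 - Target.CountDeNovoJobs * 8 - SlotId."""
    return 2 - count_de_novo_jobs * 8 - slot_id


if __name__ == "__main__":
    # Tabulate the rank for a four-slot machine with 0 or 1 de novo jobs.
    for count in (0, 1):
        ranks = [job_rank(count, slot_id) for slot_id in range(1, 5)]
        print(f"CountDeNovoJobs={count}: ranks for slot1..slot4 = {ranks}")
```

With these numbers, every slot on a machine running no de novo jobs (ranks 1
down to -2) ranks above every slot on a machine already running one (ranks -7
down to -10), and within a machine lower-numbered slots are preferred.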

I expected the above policy, combined with this submit file, to keep my job
from being preempted. But the job log shows:

000 (931949.000.000) 01/12 21:27:19 Job submitted from host: <192.168.0.149:40701>
...
001 (931949.000.000) 01/12 21:27:22 Job executing on host: <192.168.0.20:37016>
...
004 (931949.000.000) 01/12 23:25:02 Job was evicted.
	(0) Job was not checkpointed.
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
	0  -  Run Bytes Sent By Job
	0  -  Run Bytes Received By Job
...
<rest snipped>

In the StartLog on the execute machine, I have:

1/12 23:25:02 slot1: Preempting claim has correct ClaimId.
1/12 23:25:02 slot1: New claim has sufficient rank, preempting current claim.
1/12 23:25:02 slot1: State change: preempting claim based on user priority
1/12 23:25:02 slot1: State change: claim retirement ended/expired
1/12 23:25:02 slot1: Changing state and activity: Claimed/Busy -> Preempting/Vacating

So can anyone tell me what is wrong with my policy? I have read section 3 of
the Condor manual a couple dozen times, and I am still at a loss.

Thank you much.

Nathan Panike