On 1/13/12 9:00 AM, Nathan Panike wrote:
Dear colleagues:

I have the following configuration:

CountDeNovoJobs = int(( slot1_IsDeNovoJob =?= True ) + ( slot2_IsDeNovoJob =?= True ) + ( slot3_IsDeNovoJob =?= True ) + ( slot4_IsDeNovoJob =?= True ))
WantsDeNovoJobs = ( CountDeNovoJobs <= 1 )
STARTD_SLOT_ATTRS = $(STARTD_SLOT_ATTRS) IsDeNovoJob
STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) IsDeNovoJob
STARTD_ATTRS = $(STARTD_ATTRS) CountDeNovoJobs WantsDeNovoJobs
KILL = ($(KILL)) && ( MY.IsDeNovoJob =!= True )
PREEMPT = ($(PREEMPT)) && ( MY.IsDeNovoJob =!= True )
SUSPEND = ($(SUSPEND)) && ( MY.IsDeNovoJob =!= True )
What is RANK set to?
The idea is that once an "IsDeNovoJob" is running, it should never be killed, suspended, or preempted. Also, it really would be best if only one "IsDeNovoJob" job is running at a time on an execute node.

I submitted the following submit file:

executable = queuetest.sh
arguments = hello
universe = vanilla
output = queuetest.$(cluster).$(process).out
error = queuetest.$(cluster).$(process).err
log = queuetest.log
+IsDeNovoJob = True
Requirements = ( WantsDeNovoJobs =?= True )
Rank = 2 - Target.CountDeNovoJobs * 8 - SlotId
queue

I thought the above policy with the indicated submit file would keep my job from being preempted. But I get in the log:

000 (931949.000.000) 01/12 21:27:19 Job submitted from host:<192.168.0.149:40701>
...
001 (931949.000.000) 01/12 21:27:22 Job executing on host:<192.168.0.20:37016>
...
004 (931949.000.000) 01/12 23:25:02 Job was evicted.
    (0) Job was not checkpointed.
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
    0 - Run Bytes Sent By Job
    0 - Run Bytes Received By Job
...
<rest snipped>

In the StartLog on the execute machine, I have:

1/12 23:25:02 slot1: Preempting claim has correct ClaimId.
1/12 23:25:02 slot1: New claim has sufficient rank, preempting current claim.
The preemption happened because the new job had higher machine rank than the existing job.
It's a continual source of surprise to users that when PREEMPT is false, jobs can still be preempted. Our terminology sucks.
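
As a rough sketch of where the other preemption paths are controlled: RANK, MAXJOBRETIREMENTTIME, and PREEMPTION_REQUIREMENTS are standard Condor configuration settings, but the values below are assumptions for illustration, not a tested policy for this pool. In particular, the 30-day retirement time is an arbitrary placeholder, and the last line assumes PREEMPTION_REQUIREMENTS is already defined on the central manager.

```
# --- Execute node (condor_startd) ---
# A constant machine RANK means no incoming job can outrank the running one.
# Assumption: this pool does not rely on RANK for any other purpose.
RANK = 0

# Give a running de novo job a long retirement time, so a preempting claim
# has to wait for it instead of evicting it. The 30 days (in seconds) is an
# arbitrary "long enough" placeholder.
MAXJOBRETIREMENTTIME = ifThenElse(MY.IsDeNovoJob =?= True, 30 * 24 * 3600, 0)

# --- Central manager (condor_negotiator) ---
# Do not preempt a slot for user priority while it runs a de novo job.
# MY is the slot ad here; IsDeNovoJob appears in it via STARTD_JOB_EXPRS.
PREEMPTION_REQUIREMENTS = ($(PREEMPTION_REQUIREMENTS)) && (MY.IsDeNovoJob =!= True)
```

The first two settings belong in the execute machines' configuration and the last in the central manager's; a condor_reconfig on each is needed before they take effect.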
--Dan
1/12 23:25:02 slot1: State change: preempting claim based on user priority
1/12 23:25:02 slot1: State change: claim retirement ended/expired
1/12 23:25:02 slot1: Changing state and activity: Claimed/Busy -> Preempting/Vacating

So can anyone tell me what is wrong with my policy? It seems like I have read section 3 of the condor manual a couple dozen times, and I am still at a loss.

Thank you much.

Nathan Panike
_______________________________________________
Condor-devel mailing list
Condor-devel@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-devel