Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] condor_status -total, Preempting

Date: Mon, 27 Jan 2014 12:04:02 -0600
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] condor_status -total, Preempting

Hi Daniel -

The below looks really unexpected. Your settings should indeed disable preemption, assuming you ran a successful condor_reconfig after the changes and they are set on the right hosts (the PREEMPTION_REQUIREMENTS change is read by the condor_negotiator, and the other settings are read by all the execute hosts running a condor_startd). Note that the preferred way to disable preemption on HTCondor v8.0+ is via MaxJobRetirementTime; see


http://research.cs.wisc.edu/htcondor/manual/current/3_5Policy_Configuration.html#SECTION00459500000000000000

But what you have below should work as well.

HTCondor may preempt a job in favor of another job from the same user, but only in the case of a higher startd RANK.


Very strange.

Is the below regularly reproducible, or do you only see it very rarely?

Note that starting with HTCondor v8.1.3, the machine ClassAds report some helpful attributes regarding preemption; I copied the below from the manual at

http://research.cs.wisc.edu/htcondor/manual/latest/12_Appendix_A.html

These statistics were added for just such an occurrence, i.e. so admins can confirm that preemption is disabled. So, if you are running v8.1.3 or above, are the statistics below reporting preemptions as occurring? If so, are they reporting user-priority preemptions or rank preemptions? Maybe it is only happening on some specific nodes?


JobPreemptions:

The total number of times a running job has been preempted on this machine.


JobRankPreemptions:

The total number of times a running job has been preempted on this machine due to the machine's rank of jobs since the condor_startd started running.


JobUserPrioPreemptions:

The total number of times a running job has been preempted on this machine based on a fair share allocation of the pool since the condor_startd started running.


RecentJobPreemptions:

The total number of jobs which have been preempted from this machine in the last twenty minutes.


RecentJobRankPreemptions:

The total number of times a running job has been preempted on this machine due to the machine's rank of jobs in the last twenty minutes.


RecentJobUserPrioPreemptions:

The total number of times a running job has been preempted on this machine based on a fair share allocation of the pool in the last twenty minutes.
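
The counters above can be checked across the pool with a short script. This is a minimal Python sketch, not an official tool: it assumes condor_status is on the PATH and uses its -autoformat option, and the function names (fetch_report, slots_with_preemptions) are made up for illustration. Startds older than v8.1.3 do not publish these attributes, so -autoformat prints "undefined" for them; the sketch treats that as zero.

```python
import subprocess

# Cumulative preemption counters published by the startd (v8.1.3+).
ATTRS = ["JobPreemptions", "JobRankPreemptions", "JobUserPrioPreemptions"]


def fetch_report():
    """Return raw condor_status output: one line per slot, name then counters.

    Needs the HTCondor command-line tools installed on this host.
    """
    cmd = ["condor_status", "-autoformat", "Name"] + ATTRS
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def to_count(value):
    """Convert one printed attribute value to an int; 'undefined' becomes 0."""
    try:
        return int(float(value))
    except ValueError:
        return 0


def slots_with_preemptions(report):
    """Return {slot_name: {attribute: count}} for slots with a non-zero counter."""
    flagged = {}
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != len(ATTRS) + 1:
            continue  # skip blank or malformed lines
        counts = {attr: to_count(v) for attr, v in zip(ATTRS, fields[1:])}
        if any(counts.values()):
            flagged[fields[0]] = counts
    return flagged
```

Calling slots_with_preemptions(fetch_report()) on a host with the HTCondor tools installed lists only the slots that have recorded a preemption, and shows whether the rank or the user-priority counter is the one increasing.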


regards,
Todd

On 1/27/2014 9:53 AM, Pek Daniel wrote:

Some lines from the StartLog:

01/27/14 16:45:42 slot22: Request accepted.
01/27/14 16:45:42 slot22: Remote owner is xxx
01/27/14 16:45:42 slot22: State change: claiming protocol successful
01/27/14 16:45:42 slot22: Changing state: Unclaimed -> Claimed
01/27/14 16:45:46 slot22: Got activate_claim request from shadow (xxx.xxx.xxx.xxx)
01/27/14 16:45:46 slot22: Remote job ID is 3920.25
01/27/14 16:45:46 slot22: Got universe "VANILLA" (5) from request classad
01/27/14 16:45:47 slot22: State change: claim-activation protocol successful
01/27/14 16:45:47 slot22: Changing activity: Idle -> Busy
01/27/14 16:45:55 slot22: Preempting claim has correct ClaimId.
01/27/14 16:45:55 slot22: New claim has sufficient rank, preempting current claim.
01/27/14 16:45:55 slot22: State change: preempting claim based on user priority
01/27/14 16:45:55 slot22: State change: claim retirement ended/expired
01/27/14 16:45:55 slot22: Changing state and activity: Claimed/Busy -> Preempting/Vacating
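
To see how often this happens and on which slots, the StartLog can be scanned for the "preempting claim based on ..." line shown above. The following is a rough Python sketch that assumes the log line layout in this excerpt (timestamp, slot name, message); the function names are illustrative, and any reason strings other than "user priority" are simply captured as they appear rather than assumed.

```python
import re
from collections import Counter

# Matches lines such as:
# 01/27/14 16:45:55 slot22: State change: preempting claim based on user priority
PREEMPT_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<slot>slot\w+): State change: preempting claim based on (?P<reason>.+)$"
)


def preemption_events(lines):
    """Return (timestamp, slot, reason) tuples for each preemption line found."""
    events = []
    for line in lines:
        match = PREEMPT_RE.match(line.strip())
        if match:
            events.append((match["stamp"], match["slot"], match["reason"]))
    return events


def count_by_reason(events):
    """Count preemption events per reason string as logged by the startd."""
    return Counter(reason for _, _, reason in events)
```

For example, passing an open StartLog file object to preemption_events and the result to count_by_reason gives a per-reason tally that can be compared between execute nodes.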

2014/1/27 Pek Daniel <pekdaniel@xxxxxxxxx>:

Hi,

I tried my best to turn off preemption completely:
PREEMPT = FALSE
SUSPEND = FALSE
KILL = FALSE
PREEMPTION_REQUIREMENTS = FALSE
NEGOTIATOR_CONSIDER_PREEMPTION = FALSE
RANK = 0

But sometimes during negotiation, I can still see a non-zero value in
the Preempting column of the output of condor_status -total.

According to the docs:

"Preempting": A Condor job is being preempted (possibly via
checkpointing) in order to clear the machine for either a higher
priority job or because the machine owner wants the machine back.

Given that I have only a single user and completely identical
jobs, I don't think the preemption would happen because of a higher
priority job. Any idea why this happens?

Thanks,
Daniel



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685
