HTCondor Project List Archives




Re: [Condor-devel] Job Suspension




On Apr 26, 2011, at 10:44 AM, Ian Chesal wrote:

On Tuesday, April 26, 2011 at 9:46 AM, Brian Bockelman wrote:

Hi folks,

I have a few observations about job suspension:
1) Whether or not the job is currently suspended is not recorded in the ClassAd.
2) When a job is transitioned from running to suspended, LastSuspensionTime is updated. However, due to (1), you don't know whether the job has since been un-suspended.
3) The starter updates the remote wall time, but not the suspended time.

I would like to preempt jobs based upon the non-suspended running time. However, it doesn't appear that this is possible in the current setup.

Why is the suspension state not reflected in the job's ClassAd? It seems like a very important thing to note.

Preempt them how? Using the RANK or PREEMPT expressions on the machine, or the PREEMPTION_REQUIREMENTS expression at the negotiator?

All of those are evaluated in the context of the machine ad, so you can use the machine's State and Activity attributes to determine whether a job on the machine is suspended or running.

   State == "Claimed" && Activity == "Suspended"

Indicates a job is in the suspended state on the machine.

Technically you only really need to look for Activity == "Suspended" because the state machine should never have that Activity value mixed with any other state.
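
As a rough illustration of that check outside of a policy expression, here is a minimal Python sketch. It assumes the machine ad has already been fetched and is represented as a plain dict keyed by attribute name. The function name is_suspended and the dict representation are hypothetical and not part of any HTCondor API.

```python
def is_suspended(machine_ad):
    """Return True if the machine ad reports a suspended job.

    machine_ad is assumed to be a plain dict of attribute names to values
    (a hypothetical stand-in for a real machine ClassAd). Only Activity is
    checked, following the observation that "Suspended" should not appear
    together with any State other than "Claimed".
    """
    return machine_ad.get("Activity") == "Suspended"


if __name__ == "__main__":
    print(is_suspended({"State": "Claimed", "Activity": "Suspended"}))
    print(is_suspended({"State": "Claimed", "Activity": "Busy"}))
```

A stricter variant could also require State == "Claimed", at the cost of duplicating what the state machine is already supposed to guarantee.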


Not quite: I want to preempt running jobs after they have run for 48 hours.  I want to be able to calculate the committed time minus the suspended time for the running job.

This is definitely not available at the schedd for SYSTEM_PERIODIC_REMOVE because the suspended time is not pushed to the schedd.  Is it perhaps pushed to the startd?  If so, then PREEMPT would work.
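
To make the calculation I have in mind concrete, here is a minimal Python sketch of the arithmetic only. It assumes both the wall-clock time and the accumulated suspended time are available in seconds, which is exactly the part that is in question above. The names active_runtime, should_preempt and LIMIT_SECONDS are hypothetical.

```python
# 48 hours expressed in seconds (the limit mentioned above).
LIMIT_SECONDS = 48 * 3600


def active_runtime(wall_seconds, suspended_seconds):
    """Wall-clock time minus suspended time, clamped at zero."""
    return max(0, wall_seconds - suspended_seconds)


def should_preempt(wall_seconds, suspended_seconds, limit_seconds=LIMIT_SECONDS):
    """True once the job's non-suspended running time reaches the limit."""
    return active_runtime(wall_seconds, suspended_seconds) >= limit_seconds


if __name__ == "__main__":
    # 50 hours of wall time with 5 hours suspended is 45 active hours.
    print(should_preempt(50 * 3600, 5 * 3600))
    # The same wall time with no suspension is over the limit.
    print(should_preempt(50 * 3600, 0))
```

The same subtraction could be written as a policy expression if the suspended time were exposed to wherever that expression is evaluated.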

Brian
