
Re: [Condor-users] how to restrict job run time




I recommend the following expression for removing jobs that run for too long:

maxRunTime = 60
periodic_remove = JobStatus == 2 && \
 ( (CurrentTime - EnteredCurrentStatus) + \
   RemoteWallClockTime - CumulativeSuspensionTime > $(maxRunTime) )

Warning: that may be useful, but it is not quite "correct", for two reasons:

1. It does not take into account suspension time in the current run attempt, only previous run attempts.

2. It does not take into account whether previous run attempts were preempted without saving a checkpoint.

Given the attributes that are published into the job ClassAd, I see no way to correct those cases in current versions of Condor.
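
To make the arithmetic concrete, here is a rough Python sketch of what the expression computes: time spent in the current run attempt, plus wall-clock time from earlier attempts, minus suspension time. This only simulates the logic; it is not how Condor evaluates ClassAds. The function names, the dict standing in for the job ClassAd, and the sample values are all made up for illustration.

```python
RUNNING = 2  # JobStatus value for a running job, as tested in the expression


def accumulated_run_time(ad, now):
    """Mirror the arithmetic inside the periodic_remove expression."""
    # Time since the job entered its current status (the current run attempt).
    current_attempt = now - ad["EnteredCurrentStatus"]
    # Wall-clock time recorded for earlier attempts, minus time spent suspended.
    earlier_attempts = ad["RemoteWallClockTime"] - ad["CumulativeSuspensionTime"]
    return current_attempt + earlier_attempts


def should_remove(ad, now, max_run_time):
    """True when a running job has accumulated more than max_run_time seconds."""
    return ad["JobStatus"] == RUNNING and accumulated_run_time(ad, now) > max_run_time


# Hypothetical job: started running at t=1000 and has not been evicted or
# suspended, so RemoteWallClockTime is still 0.
job = {
    "JobStatus": RUNNING,
    "EnteredCurrentStatus": 1000,
    "RemoteWallClockTime": 0,
    "CumulativeSuspensionTime": 0,
}

now = 1090  # 90 seconds into the run
print(job["RemoteWallClockTime"] > 60)  # False: the simpler test never fires
print(should_remove(job, now, 60))      # True: 90 seconds exceeds the limit
```

Note that the sketch inherits both caveats: any suspension during the current attempt is still counted as run time, and work lost to a preemption without a checkpoint is still counted.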

--Dan

Jens Harting wrote:

On Wed, 25 Oct 2006, Dan Bradley wrote:

Use periodic_remove or periodic_hold in the job submission file. See the condor_submit manual for details.

--Dan

I did try that, but found a problem:

Adding the following lines to my submission script does work:

maxRunTime               = 60
periodic_remove = (RemoteWallClockTime > $(maxRunTime))

However, the value of RemoteWallClockTime is only updated when the job is suspended. Since a job sometimes runs on a machine for much longer than $(maxRunTime) without being preempted, the periodic_remove expression never evaluates to true in that case.

Is there any way to tell Condor to update RemoteWallClockTime more frequently?

jens