
Re: [Condor-users] quick questions





Robert E. Parrott wrote:

A couple of quick one-offs on configs:

1) How does a user specify a max runtime on a job from their submit file?


What do you want to achieve: putting the job on hold if it runs for too long, or simply specifying the maximum amount of time the job should be given to finish before being preempted by higher-priority jobs?

In this case, I'd like users to be able to specify the maximum total run time for a parallel job, after which the job is ended.
Is this different from putting the job on hold if it runs too long? I'm not aware of any other option specific to the parallel universe.

But I would be very interested in the answers to the other cases you pose as well. I assume for the first you want to use a PERIODIC_HOLD expression, but the second would be useful as well.
Yes, periodic_hold in the job submit file can be used to put a job on hold if it runs too long. An alternative is to have users set a custom attribute that specifies the maximum runtime, and then use SYSTEM_PERIODIC_HOLD in the config file to put on hold any job that runs longer than that. Example:

in submit file:
+MaxRunTime = 3600

in config file:
SYSTEM_PERIODIC_HOLD = JobStatus == 2 && MaxRunTime =!= UNDEFINED && (RemoteWallClockTime - CumulativeSuspensionTime) > MaxRunTime
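To make the logic of that expression concrete, here is a small Python model of what it tests. It is only an illustration, not HTCondor's ClassAd evaluator: a job is modeled as a plain dict, a missing key (None) stands in for UNDEFINED, so "MaxRunTime =!= UNDEFINED" becomes "MaxRunTime is present", and missing time attributes default to 0 (an assumption made for the sketch). JobStatus 2 is the value HTCondor uses for a running job.

```python
from typing import Optional

RUNNING = 2  # JobStatus value for a running job


def should_hold(job: dict) -> bool:
    """Model of:
    JobStatus == 2 && MaxRunTime =!= UNDEFINED &&
    (RemoteWallClockTime - CumulativeSuspensionTime) > MaxRunTime
    """
    max_run: Optional[int] = job.get("MaxRunTime")  # None plays the role of UNDEFINED
    if job.get("JobStatus") != RUNNING or max_run is None:
        return False
    # Wall-clock time on the execute machine, minus time spent suspended.
    # Defaulting missing attributes to 0 is an assumption of this sketch.
    active = job.get("RemoteWallClockTime", 0) - job.get("CumulativeSuspensionTime", 0)
    return active > max_run


# 4000 s of wall-clock time, 600 s of it suspended: 3400 s active, under the limit.
print(should_hold({"JobStatus": 2, "MaxRunTime": 3600,
                   "RemoteWallClockTime": 4000, "CumulativeSuspensionTime": 600}))
```

The guard on MaxRunTime means jobs whose users never set the custom attribute are simply left alone.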


The other thing I alluded to was a way to specify the amount of time a job should be allowed to run without interruption. This doesn't really apply to the parallel universe, because parallel universe jobs should always run without preemption.

in submit file:
# this should finish in less than one hour
# if it does not, it is ok for it to be preempted
MaxJobRetirementTime = 3600

in execute machine config file:
# allow jobs up to 2 days of uninterrupted run time
MaxJobRetirementTime = 3600*24*2
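A rough Python model of how the two settings combine may help. It rests on two assumptions about the startd's retirement policy, stated here rather than taken from the message above: a job's own MaxJobRetirementTime can only shorten the machine's limit (so the smaller value wins), and the time the job has already run counts against its retirement time. The function names are hypothetical, for illustration only.

```python
from typing import Optional

MACHINE_LIMIT = 3600 * 24 * 2  # execute-machine setting from the config above (172800 s)


def effective_retirement(job_request: Optional[int], machine_limit: int) -> int:
    """Assumed rule: the job may request less retirement time, never more."""
    if job_request is None:  # job did not set MaxJobRetirementTime
        return machine_limit
    return min(job_request, machine_limit)


def remaining_uninterrupted(elapsed: int, job_request: Optional[int], machine_limit: int) -> int:
    """Seconds the job may still run if preempted now, assuming run time so far counts."""
    return max(0, effective_retirement(job_request, machine_limit) - elapsed)


# A job that asked for 3600 s and has run 1000 s could keep running about 2600 s more.
print(remaining_uninterrupted(1000, 3600, MACHINE_LIMIT))
```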

I hope that helps you.

--Dan