One option is to set JOB_DEFAULT_LEASE_DURATION in the configuration files on the submitting machines. The default is 2400 seconds (40 minutes). This controls how long the submitter and executor will attempt to reconnect before aborting a job execution. The downside to lowering this value is that you risk killing jobs when an interruption is only temporary, for example when HTCondor is being upgraded on the submit machine or the submit machine is rebooted.
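
As a rough sketch, lowering the lease to 10 minutes could look like the fragment below. The value 600 is only an illustrative assumption, not a recommendation, and the file it goes in depends on how your configuration is laid out (a local config file on each submit machine is assumed here).

```
# Local HTCondor configuration on the submit machine (location assumed)
# Lease duration in seconds; 600 is an example value, the default is 2400
JOB_DEFAULT_LEASE_DURATION = 600
```

After editing the file, condor_config_val JOB_DEFAULT_LEASE_DURATION should show the value HTCondor actually sees. As far as I know the setting is picked up when a job is submitted, so jobs already in the queue will probably keep the lease they were submitted with.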
- Jaime