Hello,

On 11/22/2010 02:43 AM, Carsten Aulbert wrote:
> We are currently seeing a large number of shadows dying due to connection timeouts. These are almost certainly caused by a couple of issues our network has right now. Is there any setting in Condor or the Linux kernel that would mitigate this a bit as a short-term solution until we can fix the networking problems at their root?
I believe setting "JobLeaseDuration" in your condor_config may be what you want.
Example of a 24-hour (86400-second) job lease duration:

    JobLeaseDuration = 86400

From the manual:

    JobLeaseDuration - The number of seconds set for a job lease, the amount of time that a job may continue running on a remote resource, despite its submitting machine's lack of response. See section 2.14.4 for details on job leases.
A link to section 2.14.4:
http://www.cs.wisc.edu/condor/manual/v7.5/2_14Special_Environment.html#SECTION003144000000000000000

-Mick