[Condor-devel] KILLING_TIMEOUT in 7.6.0
- Date: Fri, 06 May 2011 20:31:02 -0500
- From: Dan Bradley <dan@xxxxxxxxxxxx>
Hi all,
I'm wondering whether the change to preemption behavior in 7.5 was really
intended to be as broad as it is. The KILL expression is now useless
by default, because a preempted job is hard-killed as soon as
KILLING_TIMEOUT expires, and the default KILLING_TIMEOUT is just 30 seconds.
We've received complaints in CHTC from users who have self-checkpointing
jobs that cannot save state in this amount of time. I will crank the
knob higher, but I consider this a workaround, not a resolution.
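For anyone who wants the same stopgap, here is a minimal sketch of the
setting. The value of 600 seconds and the use of a local config file are
illustrative assumptions only; pick whatever your slowest
self-checkpointing jobs actually need.

```
# Local HTCondor config on the execute nodes (file location is an assumption).
# Illustrative value: allow a preempted job up to 10 minutes before the
# hard kill, instead of the 30-second default.
KILLING_TIMEOUT = 600
```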
This new behavior came from the following:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1198
Prior to this, KILLING_TIMEOUT was the timeout applied to the
Preempting/Killing activity. Now, it effectively applies to
Preempting/Vacating as well.
I see no release notes warning admins of this important change.
--Dan