Re: [Condor-devel] KILLING_TIMEOUT in 7.6.0
- Date: Tue, 10 May 2011 09:11:26 -0500
- From: Rob Rati <rrati@xxxxxxxxxx>
- Subject: Re: [Condor-devel] KILLING_TIMEOUT in 7.6.0
I found the issue and a fix should be pushed shortly.
Rob
On Fri, 2011-05-06 at 20:43 -0500, Dan Bradley wrote:
> There is now a ticket for this issue:
>
> https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2142
>
> On 5/6/11 8:31 PM, Dan Bradley wrote:
> > Hi all,
> >
> > I'm wondering if the change in behavior of preemption in 7.5 was
> > really intended to be as broad as it is. Now, the KILL expression is
> > useless by default, because when a job is preempted, it is hard-killed
> > as soon as KILLING_TIMEOUT expires. The default KILLING_TIMEOUT is
> > just 30s. We've received complaints in CHTC from users who have
> > self-checkpointing jobs that cannot save state in this amount of
> > time. I will crank the knob higher, but I consider this a workaround,
> > not a resolution.
> >
> > This new behavior came from the following:
> >
> > https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1198
> >
> > Prior to this, KILLING_TIMEOUT was the timeout applied to the
> > Preempting/Killing activity. Now, it effectively applies to
> > Preempting/Vacating as well.
> >
> > I see no release notes warning admins of this important change.
> >
> > --Dan
> >
> > _______________________________________________
> > Condor-devel mailing list
> > Condor-devel@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-devel
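
For reference, the workaround Dan describes (raising the knob) might look like the fragment below in a local configuration file. This is only a sketch: KILLING_TIMEOUT and its 30-second default come from the quoted message, while the value 600 is an arbitrary assumption that should be sized to however long the self-checkpointing jobs need to save state.

```
# Workaround sketch, not a resolution: give preempted jobs more time
# before they are hard-killed. The value 600 is an assumed example;
# the default mentioned in the thread is 30 seconds.
KILLING_TIMEOUT = 600
```

As the thread notes, this only lengthens the window before the hard kill; it does not restore the earlier behavior in which the KILL expression controlled the transition.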