HTCondor Project List Archives



Re: [Condor-devel] KILLING_TIMEOUT in 7.6.0



I found the issue and a fix should be pushed shortly.

Rob

On Fri, 2011-05-06 at 20:43 -0500, Dan Bradley wrote:
> There is now a ticket for this issue:
> 
> https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2142
> 
> On 5/6/11 8:31 PM, Dan Bradley wrote:
> > Hi all,
> >
> > I'm wondering if the change in behavior of preemption in 7.5 was 
> > really intended to be as broad as it is.  Now, the KILL expression is 
> > useless by default, because when a job is preempted, it is hard-killed 
> > as soon as KILLING_TIMEOUT expires.  The default KILLING_TIMEOUT is 
> > just 30s.  We've received complaints in CHTC from users who have 
> > self-checkpointing jobs that cannot save state in this amount of 
> > time.  I will crank the knob higher, but I consider this a workaround, 
> > not a resolution.
> >
> > This new behavior came from the following:
> >
> > https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1198
> >
> > Prior to this, KILLING_TIMEOUT was the timeout applied to the 
> > Preempting/Killing activity.  Now, it effectively applies to 
> > Preempting/Vacating as well.
> >
> > I see no release notes warning admins of this important change.
> >
> > --Dan
> >
> > _______________________________________________
> > Condor-devel mailing list
> > Condor-devel@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-devel
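
A minimal sketch of the workaround Dan mentions ("crank the knob higher"), written as an HTCondor configuration fragment for the execute nodes. The value 600 is an illustrative assumption, not a number from this thread; choose one that covers how long your self-checkpointing jobs need to save state.

```
# Illustrative only: 600 is an assumed value, not taken from the thread.
# The default is 30 (seconds), per the message above.
KILLING_TIMEOUT = 600
```

This only lengthens the window before the hard kill. As the original message says, it is a workaround for the changed behavior, not a resolution.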