Hi all,
I'm wondering whether the change in preemption behavior in 7.5 was
really intended to be as broad as it is. The KILL expression is now
effectively useless by default, because a preempted job is hard-killed
as soon as KILLING_TIMEOUT expires, and the default KILLING_TIMEOUT is
just 30s. We've received complaints in CHTC from users whose
self-checkpointing jobs cannot save their state in that amount of
time. I will raise KILLING_TIMEOUT, but I consider this a workaround,
not a resolution.
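For reference, the workaround amounts to one line in the startd
configuration on the execute nodes. A minimal sketch; the 600 is an
illustrative value chosen for this example, not a tested CHTC setting:

```
# Illustrative only: give preempted jobs 10 minutes instead of the
# default 30s before they are hard-killed. The value is an assumption.
KILLING_TIMEOUT = 600
```

The right value depends on how long the slowest self-checkpointing job
needs to write its state.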
This new behavior came from the following ticket:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1198
Prior to this, KILLING_TIMEOUT was the timeout applied only to the
Preempting/Killing activity. Now it effectively applies to
Preempting/Vacating as well.
I see no release notes warning admins of this important change.
--Dan
_______________________________________________
Condor-devel mailing list
Condor-devel@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-devel