Re: [Condor-devel] Bug in Checkpoint when WantCheckpoint is False
- Date: Thu, 7 Apr 2005 00:28:44 -0500
- From: Daniel Forrest <forrest@xxxxxxxxxxxxx>
- Subject: Re: [Condor-devel] Bug in Checkpoint when WantCheckpoint is False
Peter,
> Your patch causes unintended side effects and is rejected in its
> current form.
>
> The 'WantCheckpoint = False' attribute turns a checkpoint request
> into a noop. Your code change would turn that noop into a fast vacate
> automatically killing the job. The side effect becomes far more
> apparent when PERIODIC_CHECKPOINT is set to true and the user job
> (which could legitimately run on a machine to completion--as
> specified by an arbitrary pool policy) receives checkpoint signals
> at regular intervals.
I don't understand why. My patch only takes effect when SIGTSTP
(checkpoint and vacate) is received; it makes no difference when
SIGUSR2 (checkpoint only) is received. Why would PERIODIC_CHECKPOINT
be sending SIGTSTP instead of SIGUSR2?
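
For illustration only, the distinction described above can be sketched as a
small decision function. This is not Condor code: the function name
action_for and the action strings are hypothetical, and the sketch assumes
only what this message states, namely that SIGUSR2 means "checkpoint only",
SIGTSTP means "checkpoint and vacate", and the patch changes behavior solely
for SIGTSTP when WantCheckpoint is False.

```python
# Hypothetical action labels, invented for this sketch.
CHECKPOINT = "checkpoint and keep running"
CHECKPOINT_AND_VACATE = "checkpoint and vacate"
NOOP = "noop"
EXIT_WITHOUT_CHECKPOINT = "exit without checkpointing"


def action_for(signal_name, want_checkpoint):
    """Return the action a job would take under the patched behavior.

    signal_name is the string "SIGUSR2" or "SIGTSTP"; want_checkpoint
    mirrors the job's WantCheckpoint attribute.
    """
    if signal_name == "SIGUSR2":
        # Checkpoint only: untouched by the patch, so a job with
        # WantCheckpoint = False still treats it as a noop.
        return CHECKPOINT if want_checkpoint else NOOP
    if signal_name == "SIGTSTP":
        # Checkpoint and vacate: the only case the patch changes.
        if want_checkpoint:
            return CHECKPOINT_AND_VACATE
        return EXIT_WITHOUT_CHECKPOINT
    raise ValueError("unexpected signal: " + signal_name)


if __name__ == "__main__":
    for name in ("SIGUSR2", "SIGTSTP"):
        for want in (True, False):
            print(name, want, "->", action_for(name, want))
```

Under these assumptions a periodic checkpoint delivered as SIGUSR2 never
kills a job, whatever WantCheckpoint is set to; only SIGTSTP combined with
WantCheckpoint = False leads to an exit.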
> Currently, we do not have a method for differentiating between a
> checkpoint signal that is periodic and one received when the job
> enters preempting/vacating.
Then I must be missing something, because I thought this was exactly
the distinction between SIGUSR2 and SIGTSTP.
> So this can be implemented in terms of pool policy. Here is what I
> *think* it appears you truly wanted: "checkpoint normally on
> periodic checkpoints, but simply die if the checkpoint is because
> I'm vacating".
No, I only want it to simply die if that particular job has
'WantCheckpoint = False'. Your solution would prevent all checkpoints
on vacate.
--
Daniel K. Forrest        Laboratory for Molecular and
forrest@xxxxxxxxxxxxx    Computational Genomics
(608) 262 - 9479         University of Wisconsin, Madison