[Condor-devel] Bug in Checkpoint when WantCheckpoint is False
- Date: Fri, 1 Apr 2005 13:55:16 -0600
- From: Daniel Forrest <forrest@xxxxxxxxxxxxx>
- Subject: [Condor-devel] Bug in Checkpoint when WantCheckpoint is False
I have found a bug in Checkpoint() that occurs when a checkpoint is
requested for a job that has 'WantCheckpoint = False'.
When such a job is asked to vacate, it is sent SIGTSTP (checkpoint and exit).
The checkpoint fails (because CKPT_MODE_ABORT is set), but the job does
not exit. It is then killed 10 minutes (MaxVacateTime) later, so that
whole interval is wasted.
The following patch addresses this. Comments?
--- condor_ckpt/image.C.SAVE Fri Feb 25 14:41:59 2005
+++ condor_ckpt/image.C Thu Mar 17 15:44:49 2005
@@ -1668,6 +1668,11 @@
     if (mode&CKPT_MODE_ABORT) {
         dprintf(D_ALWAYS,
                 "Checkpoint aborted by shadow request.\n");
+        if (check_sig == SIGTSTP) {
+            dprintf( D_ALWAYS, "Ckpt abort\n" );
+            SetSyscalls( SYS_LOCAL | SYS_UNMAPPED );
+            Suicide();
+        }
         // We can't just return here. We need to cleanup
         // anything we've done above first.
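
For readers without the source at hand, the decision the patch changes can be sketched as a small standalone model. This is an illustration only, not Condor code: the Outcome enum, on_checkpoint_signal(), wasted_wait_secs(), and the numeric value given to CKPT_MODE_ABORT are all hypothetical stand-ins, and the model assumes the abort check is the only thing that decides whether the job exits. SIGTSTP comes from the POSIX <signal.h>.

```cpp
#include <cassert>
#include <signal.h>   // POSIX header; provides SIGTSTP

// Hypothetical stand-in: the real flag value in Condor is not shown in the post.
constexpr int CKPT_MODE_ABORT = 0x1;

// Possible results of a checkpoint request in this simplified model.
enum class Outcome {
    CheckpointWritten,  // no abort requested, checkpoint proceeds
    ResumeRunning,      // checkpoint aborted, job cleans up and keeps running
    ExitImmediately     // checkpoint aborted, job exits at once
};

// Models the abort check in Checkpoint().
// `patched` selects the behavior before (false) or after (true) the patch.
Outcome on_checkpoint_signal(int check_sig, int mode, bool patched) {
    if (mode & CKPT_MODE_ABORT) {
        // Patched behavior: a "checkpoint and exit" request still exits,
        // even though no checkpoint is written.
        if (patched && check_sig == SIGTSTP) {
            return Outcome::ExitImmediately;
        }
        return Outcome::ResumeRunning;
    }
    return Outcome::CheckpointWritten;
}

// For a vacate request: seconds spent waiting for the hard kill.
// Only a job that keeps running has to wait out the full vacate timeout.
int wasted_wait_secs(Outcome outcome, int max_vacate_secs) {
    assert(max_vacate_secs >= 0);
    return outcome == Outcome::ResumeRunning ? max_vacate_secs : 0;
}
```

In this model, an unpatched job with the abort flag set that receives SIGTSTP yields ResumeRunning and waits out the full 600 seconds of the 10-minute MaxVacateTime, while the patched path yields ExitImmediately and waits 0 seconds. Any other signal with the abort flag set still takes the cleanup-and-continue path, which matches the patch leaving that code untouched.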
--
Daniel K. Forrest Laboratory for Molecular and
forrest@xxxxxxxxxxxxx Computational Genomics
(608) 262 - 9479 University of Wisconsin, Madison