HTCondor Project List Archives




[Condor-devel] Bug in Checkpoint when WantCheckpoint is False



I have found a bug in Checkpoint() that occurs when a checkpoint is
requested for a job that has 'WantCheckpoint = False'.

When such a job is told to vacate, it is sent SIGTSTP (checkpoint and exit).
The checkpoint is aborted (because CKPT_MODE_ABORT is set), but the
job does not exit.  It is only killed 10 minutes (MaxVacateTime)
later, so the vacate is delayed by that long for no benefit.

The following patch addresses this by calling Suicide() when the aborted
checkpoint was requested via SIGTSTP.  Comments?

--- condor_ckpt/image.C.SAVE	Fri Feb 25 14:41:59 2005
+++ condor_ckpt/image.C	Thu Mar 17 15:44:49 2005
@@ -1668,6 +1668,11 @@
 				if (mode&CKPT_MODE_ABORT) {
 					dprintf(D_ALWAYS,
 							"Checkpoint aborted by shadow request.\n");
+					if (check_sig == SIGTSTP) {
+						dprintf( D_ALWAYS,  "Ckpt abort\n" );
+						SetSyscalls( SYS_LOCAL | SYS_UNMAPPED );
+						Suicide();
+					}
 
 					// We can't just return here.  We need to cleanup
 					// anything we've done above first.
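
For anyone skimming the diff, here is a rough standalone model of the
decision the patch changes.  It is only a sketch: kModeAbort, Outcome and
after_ckpt_request are made-up names for illustration, not anything in
condor_ckpt, and the flag value is a placeholder.  It needs C++14 or later
and a POSIX <signal.h>.

```cpp
// Illustrative model only; none of these names exist in condor_ckpt.
#include <signal.h>   // SIGTSTP (POSIX)

// Hypothetical stand-in for the CKPT_MODE_ABORT bit; the real value
// is not shown in the patch.
constexpr int kModeAbort = 0x1;

enum class Outcome {
    WriteCheckpoint,  // no abort requested: checkpoint proceeds
    KeepRunning,      // checkpoint aborted, job carries on
    ExitNow           // checkpoint aborted, job exits at once
};

// Models what happens to the job after a checkpoint request.
// 'patched' selects the behavior with or without the proposed change.
constexpr Outcome after_ckpt_request(int mode, int check_sig, bool patched) {
    if (!(mode & kModeAbort)) {
        return Outcome::WriteCheckpoint;
    }
    // The patch only changes the "checkpoint and exit" (SIGTSTP) case.
    if (patched && check_sig == SIGTSTP) {
        return Outcome::ExitNow;
    }
    return Outcome::KeepRunning;
}
```

In this model, after_ckpt_request(kModeAbort, SIGTSTP, false) gives
KeepRunning (the job lingers until it is killed), while the same call with
patched set to true gives ExitNow.  Requests that arrive via any other
signal are unaffected.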

-- 
Daniel K. Forrest	Laboratory for Molecular and
forrest@xxxxxxxxxxxxx	Computational Genomics
(608) 262 - 9479	University of Wisconsin, Madison