HTCondor Project List Archives




Re: [Condor-devel] Bug in Checkpoint when WantCheckpoint is False



Hello,

I'll take a closer look at this patch's context on Monday and see if
I can put it into the next developer release.

Thank you.

-Pete


On Fri, Apr 01, 2005 at 01:55:16PM -0600, Daniel Forrest wrote:
> I have found a bug in Checkpoint() that occurs when a checkpoint is
> requested for a job that has 'WantCheckpoint = False'.
> 
> When such a job is to vacate, it is sent SIGTSTP (checkpoint and exit).
> The checkpoint will fail (because CKPT_MODE_ABORT is set), but the
> job does not exit.  It will then be killed 10 minutes (MaxVacateTime)
> later, but this is a waste of time.
> 
> The following patch addresses this.  Comments?
> 
> --- condor_ckpt/image.C.SAVE	Fri Feb 25 14:41:59 2005
> +++ condor_ckpt/image.C	Thu Mar 17 15:44:49 2005
> @@ -1668,6 +1668,11 @@
>  				if (mode&CKPT_MODE_ABORT) {
>  					dprintf(D_ALWAYS,
>  							"Checkpoint aborted by shadow request.\n");
> +					if (check_sig == SIGTSTP) {
> +						dprintf( D_ALWAYS,  "Ckpt abort\n" );
> +						SetSyscalls( SYS_LOCAL | SYS_UNMAPPED );
> +						Suicide();
> +					}
>  
>  					// We can't just return here.  We need to cleanup
>  					// anything we've done above first.
> 
> -- 
> Daniel K. Forrest	Laboratory for Molecular and
> forrest@xxxxxxxxxxxxx	Computational Genomics
> (608) 262 - 9479	University of Wisconsin, Madison
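
For readers following the thread, the decision the patch introduces can be modelled in isolation. This is only a sketch under stated assumptions, not the code in condor_ckpt/image.C: the flag value, the Action enum, and the decide() helper are hypothetical names made up for illustration, and SIGUSR2 stands in for "any signal other than SIGTSTP". In the real patch the exit path calls SetSyscalls() and Suicide(), and the non-exit path falls through to the existing cleanup code.

```cpp
#include <csignal>  // SIGTSTP, SIGUSR2 (POSIX signal numbers)

// Hypothetical flag value for illustration; the real CKPT_MODE_ABORT
// is defined in the Condor sources and may differ.
constexpr int CKPT_MODE_ABORT = 1 << 0;

// Hypothetical summary of what the checkpoint routine does next.
enum class Action {
    TakeCheckpoint,    // no abort requested: proceed with the checkpoint
    CleanUpAndResume,  // abort requested, signal does not imply exit
    ExitNow            // abort requested on SIGTSTP: exit right away
};

// Models the branch added by the patch: when the checkpoint is aborted
// and the triggering signal was SIGTSTP (checkpoint and exit), the job
// should exit immediately instead of resuming and waiting to be killed.
constexpr Action decide(int check_sig, int mode) {
    if ((mode & CKPT_MODE_ABORT) == 0) {
        return Action::TakeCheckpoint;
    }
    if (check_sig == SIGTSTP) {
        return Action::ExitNow;
    }
    return Action::CleanUpAndResume;
}
```

Under this model, decide(SIGTSTP, CKPT_MODE_ABORT) yields Action::ExitNow, which corresponds to the vacate case described in the report. Without the added branch, the same inputs would fall through to the cleanup-and-resume path and the job would sit until MaxVacateTime expired.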