Re: [Condor-users] quick question: is periodic vacate possible
- Date: Thu, 17 Jun 2010 15:50:51 +0100
- From: "Smith, Ian" <I.C.Smith@xxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] quick question: is periodic vacate possible
I'm using my own checkpointing mechanism which is written into the
code. The code (an R script) saves its workspace to file periodically
and this gets written to $(SPOOL) when the job is evicted. When
the job restarts, the workspace is restored.
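
The save-and-restore pattern described above can be sketched roughly as
follows. This is a minimal illustration in Python, not the actual R script;
the file name, the contents of the state dictionary, and the checkpoint
interval are all hypothetical.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def save_checkpoint(state, path=CHECKPOINT):
    """Write state to a temporary file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Replacing in one step means an eviction mid-write cannot leave
    # a half-written checkpoint behind.
    os.replace(tmp, path)


def load_checkpoint(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(n_steps, every=100, path=CHECKPOINT):
    """Toy long-running job: resume from the last checkpoint and carry on."""
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        state["total"] += step          # stand-in for the real work
        state["step"] = step + 1
        if state["step"] % every == 0:  # periodic checkpoint
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state
```

For the checkpoint file to come back on eviction in the vanilla universe, the
submit description would presumably need should_transfer_files = YES and
when_to_transfer_output = ON_EXIT_OR_EVICT; that is an assumption about the
setup, not something stated in this thread.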
regards,
-ian.
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-
> bounces@xxxxxxxxxxx] On Behalf Of Burnett, Ben
> Sent: 17 June 2010 15:05
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] quick question: is periodic vacate possible
>
> Are you using Condor's checkpointing mechanism, or your own? If it's Condor's, then
> PERIODIC_CHECKPOINT will do the trick
> (http://www.cs.wisc.edu/condor/manual/v7.5/7_2Setting_up.html#47702); otherwise,
> how is your executable told to write its checkpoint file out? Via a signal?
>
> -B
>
> On 2010-06-17, at 4:35 AM, Smith, Ian wrote:
>
> > Dear All,
> >
> > Just a very quick question that I can't seem to find an answer for
> > anywhere:
> >
> > Is it possible to periodically vacate jobs in the same way as
> > they can be periodically held and removed?
> >
> > The reason I ask is that I've been building checkpointing
> > into some of our vanilla universe jobs and it would
> > be useful if these could be vacated say once every
> > few hours so that the checkpoint file gets stored in
> > the $(SPOOL). Some of the jobs can run for days
> > and with few students around the campus at present
> > they are unlikely to get evicted by user logins. This
> > means that the output can get lost if the startd
> > crashes for some reason*, losing several days'
> > work.
> >
> > regards,
> >
> > -ian.
> >
> > * I've noticed several connection failures with long running jobs
> > and I'm still not sure of the reason although someone turning
> > off an execute host running a job is obviously one!
> >
> > --------------------------------------------
> > Dr Ian C. Smith,
> > Advanced Research Computing (e-Science) Team,
> > The University of Liverpool
> > Computing Services Department
> >
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/
>