Re: [HTCondor-users] Implementing checkpointing via job wrappers
- Date: Mon, 12 Aug 2013 09:45:55 -0500
- From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Implementing checkpointing via job wrappers
Hi Max,
This may be a relevant link for you:
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=DmtcpCondor
Even if you don't use the code directly, there's a good chance it'll give you an example of everything you want to do.
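For illustration only, here is a minimal sketch of the wrapper pattern asked about in the quoted message below: resume from a checkpoint file if one is present at startup, and write one when the job is told to vacate. It is not the code from the wiki page and it does not call DMTCP or BLCR. The "checkpoint" is just a step counter in a file, and the names `CKPT_FILE`, `TOTAL_STEPS`, `do_one_step`, `restore_step`, `save_step` and `run_job` are all hypothetical. It also assumes the eviction signal that reaches the wrapper is SIGTERM. On the HTCondor side, as far as I know the submit setting `when_to_transfer_output = ON_EXIT_OR_EVICT` is what gets files brought back on eviction and shipped out again on restart, but check the manual for your version.

```sh
#!/bin/sh
# Hypothetical wrapper sketch: application-level checkpoint/restart.
# All names below are illustrative assumptions, not HTCondor or DMTCP APIs.

CKPT_FILE="${CKPT_FILE:-job.ckpt}"   # hypothetical checkpoint file name
TOTAL_STEPS="${TOTAL_STEPS:-5}"      # hypothetical amount of work

# Print the step to resume from: the saved value, or 0 for a fresh start.
restore_step() {
    if [ -f "$CKPT_FILE" ]; then
        cat "$CKPT_FILE"
    else
        echo 0
    fi
}

# Record progress via a temp file and rename, so a partially written
# checkpoint is never mistaken for a valid one.
save_step() {
    printf '%s\n' "$1" > "$CKPT_FILE.tmp" && mv "$CKPT_FILE.tmp" "$CKPT_FILE"
}

# Placeholder for one unit of the real payload.
do_one_step() {
    echo "step $1"
}

run_job() {
    step=$(restore_step)
    # Assumption: eviction arrives as SIGTERM. Save the number of completed
    # steps and exit with the conventional 128+15 status.
    trap 'save_step "$step"; exit 143' TERM
    while [ "$step" -lt "$TOTAL_STEPS" ]; do
        do_one_step "$step"
        step=$((step + 1))
        save_step "$step"
    done
    trap - TERM
    # Finished cleanly, so no checkpoint should be left behind.
    rm -f "$CKPT_FILE"
}

run_job
```

One caveat with this shape: the shell only runs the trap after the current foreground command returns, so a long-running payload would need to be started in the background and waited on for the signal to be handled promptly.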
Enjoy!
Brian
On Aug 12, 2013, at 9:38 AM, Max Fischer <max.fischer@xxxxxxx> wrote:
> Hello Condor Users,
>
> We're currently looking into expanding our HTCondor setup to include desktop resources (previously it was just glideins and dedicated worker nodes), so I'm investigating if and how best to supply checkpointing capabilities. The problem is that our users' workflows depend heavily on shell scripts for flow control and organisational tasks. Is there a suggested procedure for handling such jobs under preemption?
>
> Practically all jobs are run by our own job submission tool, so we can modify its wrapper layer (implemented as a shell script). I was thinking about issuing standalone checkpoints [1] and restoring from checkpoint files if any are present on startup. How must the HTCondor job be set up to fetch these manual checkpoints on eviction and transfer them back on restart?
>
> Are there any guides, hints or tutorials for using external checkpointing such as BLCR [2]?
>
> Cheers,
> Max
>
> [1]
> http://research.cs.wisc.edu/htcondor/manual/v7.8/4_2HTCondor_s_Checkpoint.html#sec:standalone-ckpt
>
> [2]
> https://ftg.lbl.gov/projects/CheckpointRestart/
>
> --
> Dipl.-Phys. Max Fischer
> Karlsruhe Institute of Technology (KIT)
> Steinbuch Centre for Computing (SCC)
> Institute of Experimental Nuclear Physics (IEKP)
> email: max.fischer@xxxxxxx
> phone: +49 721 608 28328 (SCC)
> +49 721 608 43369 (IEKP)