Re: [Condor-users] Recovering from failures of DAGs within DAGs
- Date: Thu, 22 Dec 2005 18:26:15 +0000 (GMT)
- From: Craig Robinson <Craig.Robinson@xxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Recovering from failures of DAGs within DAGs
Hi Kent,
Thanks for your answer; I'll have a go at this.
Cheers,
Craig.
On Thu, 22 Dec 2005, R. Kent Wenger wrote:
> On Wed, 21 Dec 2005, Craig Robinson wrote:
>
> > We are developing a DAGMan application which will ideally use DAGs
> > within DAGs. We have seen in the Condor documentation that such
> > applications are supported. How are failures of internal DAGs dealt
> > with, and is there any easy way to recover from this?
>
> Expanding on my earlier answer, there's an easy way to get the rescue
> DAGs to work correctly with retries. In the top-level DAG in my example,
> just use the following as the POST script for the node that runs the
> lower-level DAG:
>
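> #! /bin/csh -f
> # If the lower-level DAG failed, DAGMan wrote lower.dag.rescue;
> # swap it in so that the next retry resumes from the rescue DAG.
> if (-e lower.dag.rescue) then
>     mv lower.dag lower.dag.orig
>     mv lower.dag.rescue lower.dag
> endif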
>
> That way, if the lower-level DAG fails, you'll end up actually retrying
> with the rescue DAG, which will start up from where the first try left
> off (the rescue DAG records which nodes were completed).
>
> Kent Wenger
> Condor Team
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
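The POST script above only helps if it is attached to the node that runs the lower-level DAG and that node has retries enabled. The fragment below is a minimal sketch of such a top-level DAG file. The node name lower_dag, the script name fix_rescue.csh and the retry count of 3 are hypothetical, and the sketch assumes the lower-level DAG was turned into an ordinary job with "condor_submit_dag -no_submit lower.dag", which writes lower.dag.condor.sub without submitting it.

```
# top.dag (sketch; node and script names are hypothetical)
# Run the lower-level DAG as a single node, using the submit file
# produced by "condor_submit_dag -no_submit lower.dag".
JOB lower_dag lower.dag.condor.sub
# After each run of the node, swap in lower.dag.rescue if it exists.
# $RETURN (the node job's exit code) is assumed to be supported by
# your DAGMan version.
SCRIPT POST lower_dag fix_rescue.csh $RETURN
# Re-run the node up to 3 times if it fails.
RETRY lower_dag 3
```

DAGMan decides whether a node succeeded from its POST script's exit status, so a script that always exits 0 after the rename could mark a failed lower-level DAG as successful and skip the retry. Ending the quoted script with something like "exit $1" (propagating the $RETURN value passed above) is one way to keep the failure visible to RETRY; check this against the DAGMan documentation for your Condor version.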