Mailing List Archives, UW Madison Computer Sciences Department, Computer Systems Lab

Re: [Condor-users] Recovering from failures of DAGs within DAGs

Date: Thu, 22 Dec 2005 18:26:15 +0000 (GMT)
From: Craig Robinson <Craig.Robinson@xxxxxxxxxxxxxx>
Subject: Re: [Condor-users] Recovering from failures of DAGs within DAGs

Hi Kent,

Thanks for your answer; I'll have a go at this.

Cheers,

Craig.

On Thu, 22 Dec 2005, R. Kent Wenger wrote:

> On Wed, 21 Dec 2005, Craig Robinson wrote:
>
> > We are developing a DAGMan application which will ideally use DAGs
> > within DAGs. We have seen in the Condor documentation that such
> > applications are supported. How are failures of internal DAGs dealt
> > with, and is there any easy way to recover from them?
>
> Expanding on my earlier answer, there's an easy way to get the rescue
> DAGs to work right with retries.  In the top-level DAG in my example,
> just have the following as a POST script for the node that is the
> lower-level DAG:
>
>     #! /bin/csh -f
>     if (-e lower.dag.rescue) then
>       mv lower.dag lower.dag.orig
>       mv lower.dag.rescue lower.dag
>     endif
>
> That way, if the lower-level DAG fails, you'll end up actually retrying
> with the rescue DAG, which will start up from where the first try left
> off (the rescue DAG records which nodes were completed).
>
> Kent Wenger
> Condor Team
>
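
For readers who prefer a Bourne-style shell, the POST script quoted above can be sketched roughly as follows. This is a minimal sketch and not Kent's script: the function names swap_in_rescue and post_main are hypothetical, and the rescue file is assumed to be named <dagfile>.rescue as in the quoted example. Two things differ from the csh version, both of them assumptions: the original DAG is saved to <dagfile>.orig only on the first failure, so later retries do not overwrite it, and the script hands back the node's own return value, on the assumption that DAGMan judges a node with a POST script by that script's exit status.

```shell
#!/bin/sh
# Sketch only: names and layout are illustrative, not from the thread.

# swap_in_rescue DAGFILE
# If DAGFILE.rescue exists, promote it to DAGFILE so the next retry
# resumes from where the failed run left off.
swap_in_rescue() {
    dag=$1
    if [ -e "$dag.rescue" ]; then
        # Keep the pristine DAG once; do not clobber it on later retries.
        if [ ! -e "$dag.orig" ]; then
            mv "$dag" "$dag.orig" || return 1
        fi
        mv "$dag.rescue" "$dag" || return 1
    fi
}

# post_main DAGFILE RETURN_VALUE
# Hypothetical entry point: swap in the rescue DAG, then report the
# lower-level DAG's own return value (assumed to be passed in by DAGMan).
post_main() {
    swap_in_rescue "$1" || return 1
    return "${2:-0}"
}
```

A wrapper script would end with something like: post_main "$1" "$2"; exit $?. In the top-level DAG this might be wired up with lines such as "SCRIPT POST LowerNode post.sh lower.dag $RETURN" and "RETRY LowerNode 3", where LowerNode and post.sh are placeholder names and the retry count is arbitrary.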
