Re: [Condor-users] Recovering from failures of DAGs within DAGs
Date: Thu, 22 Dec 2005 18:26:15 +0000 (GMT)
From: Craig Robinson <Craig.Robinson@xxxxxxxxxxxxxx>

Hi Kent,
Thanks for your answer; I'll have a go at this.
Cheers,
Craig.
On Thu, 22 Dec 2005, R. Kent Wenger wrote:
> On Wed, 21 Dec 2005, Craig Robinson wrote:
>
> > We are developing a DAGMan application which will ideally use DAGs
> > within DAGs. We have seen in the Condor documentation that such
> > applications are supported. How are failures of internal DAGs dealt
> > with, and is there any easy way to recover from this?
>
> Expanding on my earlier answer, there's an easy way to get the rescue
> DAGs to work right with retries.  In the top-level DAG in my example,
> just have the following as a POST script for the node that is the
> lower-level DAG:
>
>     #! /bin/csh -f
>     if (-e lower.dag.rescue) then
>       mv lower.dag lower.dag.orig
>       mv lower.dag.rescue lower.dag
>     endif
>
> That way, if the lower-level DAG fails, you'll end up actually retrying
> with the rescue DAG, which will start up from where the first try left
> off (the rescue DAG records which nodes were completed).
>
> Kent Wenger
> Condor Team
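
For readers without csh, the POST script quoted above can be sketched in POSIX sh. This is an illustrative port under stated assumptions, not Kent's script: the function name promote_rescue, the script name promote_rescue.sh, the node name "lower", and the argument order are all hypothetical. It also assumes the top-level DAG passes the node's exit status as the first argument (DAGMan documents a $RETURN macro for POST script arguments; check that your version supports it). The script exits with that status because, as far as I know, DAGMan judges a node that has a POST script by the script's exit code, so a script that always exits 0 could hide the failure and suppress the retry.

```sh
#!/bin/sh
# Sketch: promote a rescue DAG so that a retry resumes where the last run stopped.
#
# Hypothetical wiring in the top-level DAG (names are assumptions):
#   JOB    lower lower.dag.condor.sub
#   SCRIPT POST lower promote_rescue.sh $RETURN lower.dag
#   RETRY  lower 3

promote_rescue() {
    dag=$1
    # Swap files only when the failed run left a rescue DAG behind.
    if [ -e "$dag.rescue" ]; then
        # Keep the original DAG, then put the rescue DAG in its place.
        mv "$dag" "$dag.orig" && mv "$dag.rescue" "$dag"
    fi
}

# Run only when invoked with arguments: <job_exit_status> [dag_file]
if [ "$#" -gt 0 ]; then
    status=$1
    promote_rescue "${2:-lower.dag}"
    # Report the lower-level DAG's own status so a failure still triggers RETRY.
    exit "$status"
fi
```

As in the csh version, a second failure overwrites the saved lower.dag.orig with the previous rescue DAG, so keep a separate copy of the original DAG if you need it.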