Re: [Condor-users] Condor DAG feature request
- Date: Mon, 11 Dec 2006 15:50:01 -0600 (CST)
- From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Condor DAG feature request
On Mon, 11 Dec 2006, Armen Babikyan wrote:
> I have a feature request for Condor's DAG system, concerning the
> handling of nested DAGs:
>
> Suppose I have DAG A that calls many instances of DAG B, and each DAG B
> runs three programs in the order "alpha, beta, gamma". When gamma fails,
> DAG B ends and generates its own rescue file. DAG B then reports its
> failure to DAG A, which generates its own rescue file, and the job
> stops.
>
> I've noticed that in the case of nested DAGs, DAG A's rescue DAG does
> not point to DAG B's *rescue* file; it points instead to DAG B's
> *submit* file, so all instances of alpha, beta, and gamma are
> performed again, instead of just gamma.
>
> I have a system where the "beta" stage of a job is very time-consuming,
> and it is possible that a few "gamma" instances may fail. It would be
> nice if DAGMan had the ability to detect whether it was running another
> DAG as a sub-job, or just a regular job. In the case of the former, it
> could intelligently point its own rescue file to the rescue file created
> by the DAG sub-job.
This sounds like a really good idea. I've created an entry for this
feature request in our issue-tracking system.
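For illustration only, here is a minimal Python sketch of the behaviour being requested. It is not DAGMan code: the function name resolve_dag_file and the ".rescue" suffix are assumptions made for this example, and the actual rescue file name depends on the Condor version in use.

```python
from pathlib import Path


def resolve_dag_file(dag_file: str, rescue_suffix: str = ".rescue") -> Path:
    """Pick the file an outer DAG should reference for a nested DAG.

    Hypothetical helper: returns the rescue file sitting next to
    `dag_file` if one exists, otherwise `dag_file` itself.
    The ".rescue" suffix is an assumed naming convention.
    """
    original = Path(dag_file)
    # Build "<name><suffix>" in the same directory, e.g. b.dag -> b.dag.rescue
    rescue = original.with_name(original.name + rescue_suffix)
    return rescue if rescue.is_file() else original
```

With a check like this, an outer rescue DAG would resume an inner DAG from its rescue file (rerunning only gamma in the example above) and fall back to the original file when the inner DAG never failed.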
Kent Wenger
Condor Team