[Condor-users] Condor DAG feature request
- Date: Mon, 11 Dec 2006 12:50:14 -0500
- From: Armen Babikyan <armenb@xxxxxxxxxx>
- Subject: [Condor-users] Condor DAG feature request
Hello,
I have a feature request for Condor's DAG system, with respect to
handling nested DAGs:
Suppose I have a DAG A that calls many instances of DAG B, and each DAG B
runs three programs, in the order "alpha, beta, gamma". When gamma fails,
this causes DAG B to end and generate its own rescue file. DAG B will
then tell DAG A about its failure, and DAG A will then generate its own
rescue file, and the job will stop.
I've noticed that in the case of nested DAGs, DAG A's rescue DAG does
not point to DAG B's *rescue* file; instead, it points to DAG B's
*submit* file, causing all instances of alpha, beta, and gamma to be
performed again, instead of just gamma.
I have a system where the "beta" stage of a job is very time-consuming,
and it is possible that a few "gamma" instances may fail. It would be
nice if DAGMan had the ability to detect whether it was running another
DAG as a sub-job, or just a regular job. In the case of the former, it
could intelligently point its own rescue file to the rescue file created
by the DAG sub-job.
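To make the requested behavior concrete, here is a minimal Python sketch of the decision described above. It is only an illustration, not DAGMan code. The function name `choose_rerun_target`, the ".condor.sub" test for recognising a nested DAG, and the ".rescue" file suffix are all assumptions made for this example. A real implementation would also need a submit file for the rescue DAG, which this sketch does not generate.

```python
import os

# Assumptions (not confirmed by the post): a nested DAG node is recognised by a
# submit file ending in ".condor.sub", and its rescue file is the inner DAG
# file name plus ".rescue". Both conventions are hypothetical here.
NESTED_SUFFIX = ".condor.sub"
RESCUE_SUFFIX = ".rescue"


def choose_rerun_target(submit_file, exists=os.path.exists):
    """Return the file the outer rescue DAG should reference for one node.

    `exists` is injectable so the logic can be exercised without touching disk.
    """
    if not submit_file.endswith(NESTED_SUFFIX):
        # Regular job: nothing to resume, rerun the same submit file.
        return submit_file
    # Strip the suffix to recover the inner DAG file name, e.g. "B.dag".
    dag_file = submit_file[: -len(NESTED_SUFFIX)]
    rescue_file = dag_file + RESCUE_SUFFIX
    # Prefer the inner rescue file when the failed sub-DAG left one behind;
    # otherwise fall back to rerunning the sub-DAG from the start.
    return rescue_file if exists(rescue_file) else submit_file
```

With this rule, a failed DAG B that left a rescue file behind would resume at gamma, while ordinary jobs and sub-DAGs without a rescue file would be resubmitted unchanged.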
Thanks,
- Armen
--
Armen Babikyan
MIT Lincoln Laboratory
armenb@xxxxxxxxxx . 781-981-1796