[Condor-users] Marking child as DONE
- Date: Fri, 14 Mar 2008 18:31:47 -0700
- From: Nickolas Fotopoulos <nvf@xxxxxxxxxxxxxxxxxxxx>
- Subject: [Condor-users] Marking child as DONE
Dear all,
After a DAG has run partway through, I've decided that the bottom-most
post-processing job (several thousand of them) should/can not be run.
When my rescue DAG comes, as it inevitably does, I would like not to
execute these. So far, no problem; a one-line bash/sed invocation
takes care of that:
sed 's/.*mysubfile.*/& DONE/' "$f" > "${f}.sires_done"
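Here is a slightly stricter variant of that one-liner, sketched in Python. It only touches JOB lines whose submit-file field matches the pattern, and it skips nodes that are already marked DONE. It assumes the `JOB <name> <submitfile> [DIR <dir>] [DONE]` line syntax. `mark_done` and the `mysubfile` pattern are illustrative names, not part of DAGMan.

```python
import re


def mark_done(dag_text: str, submit_pattern: str) -> str:
    """Append DONE to JOB lines whose submit file matches submit_pattern.

    Hypothetical helper; assumes 'JOB <name> <submitfile> [DIR <dir>] [DONE]'.
    """
    out = []
    for line in dag_text.splitlines():
        fields = line.split()
        is_job = len(fields) >= 3 and fields[0].upper() == "JOB"
        already_done = bool(fields) and fields[-1].upper() == "DONE"
        # Only the submit-file field is matched, so PARENT/VARS/etc. lines are never touched.
        if is_job and not already_done and re.search(submit_pattern, fields[2]):
            line = line.rstrip() + " DONE"
        out.append(line)
    return "\n".join(out) + "\n"
```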
The problem is that not all of the parents have completed
successfully. I'd like to resubmit the parents, but not these
children. When I naively mark them as DONE, as above, I get the
following error while dagman parses the DAG.
3/13 20:25:13 ERROR: AddParent( ea0bca7d3503cccca43dff66a99c1516 ) failed for node a5bf08f49f3323fdd5f838f6d89918f7: STATUS_DONE child may not be given a new STATUS_READY parent
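The error reflects a consistency rule. DAGMan refuses to give a node marked DONE a parent that is still ready to run. The following Python sketch lists the offending parent/child pairs before submission. `find_done_conflicts` is a hypothetical name, and the sketch reads only JOB lines and PARENT ... CHILD lines.

```python
def find_done_conflicts(dag_text):
    """Return (parent, child) pairs where the child is marked DONE but the
    parent is not, i.e. the case the AddParent error above complains about.
    Hypothetical helper; only JOB and PARENT ... CHILD lines are read."""
    done = set()
    edges = []
    for line in dag_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        keyword = fields[0].upper()
        if keyword == "JOB" and len(fields) >= 3:
            if fields[-1].upper() == "DONE":
                done.add(fields[1])
        elif keyword == "PARENT":
            upper = [f.upper() for f in fields]
            if "CHILD" not in upper:
                continue  # malformed line; skipped in this sketch
            cut = upper.index("CHILD")
            for parent in fields[1:cut]:
                for child in fields[cut + 1:]:
                    edges.append((parent, child))
    return [(p, c) for p, c in edges if c in done and p not in done]
```

Run this on a rescue DAG after the sed edit. It should list the pairs that trigger the AddParent error, which also shows how many unfinished parents sit above the DONE children.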
Removing the JOB lines produces an error that the parent-child
relationships refer to a non-existent job. (I don't have the exact
message handy.)
I see a few solutions, none of which I like:
* resubmit without modification and let the children fail (wastes
resources)
* change the submit files to point to /bin/true and run in the local
universe (a lot of scheduling overhead, I'd think, but maybe this is
negligible)
* identify all nodes of a class and remove all references to each of
them (more code than I want to write at the moment)
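The third option may need less code than it sounds. The Python sketch below identifies a class of nodes by submit-file name, then deletes their JOB lines and any SCRIPT, RETRY or VARS-style lines that name them. It also strips those nodes from PARENT ... CHILD lines, dropping any line left with an empty side. `prune_nodes` and the keyword list are assumptions for illustration. Check the DAGMan manual for the full set of node-scoped keywords in your version.

```python
import re

# Keywords whose first argument is a node name (assumed list; verify
# against the DAGMan manual for your version).
NODE_KEYWORDS = {"RETRY", "VARS", "ABORT-DAG-ON", "PRIORITY", "CATEGORY"}


def prune_nodes(dag_text: str, submit_pattern: str) -> str:
    """Drop every node whose submit file matches submit_pattern, plus all
    lines and PARENT/CHILD entries that refer to it. Hypothetical helper."""
    lines = dag_text.splitlines()

    # Pass 1: identify the class of nodes by submit file name.
    doomed = set()
    for line in lines:
        f = line.split()
        if len(f) >= 3 and f[0].upper() == "JOB" and re.search(submit_pattern, f[2]):
            doomed.add(f[1])

    # Pass 2: rewrite the DAG without them.
    out = []
    for line in lines:
        f = line.split()
        key = f[0].upper() if f else ""
        if key == "JOB" and len(f) >= 2 and f[1] in doomed:
            continue
        if key == "SCRIPT" and len(f) >= 3 and f[2] in doomed:
            continue  # SCRIPT PRE|POST <node> ...
        if key in NODE_KEYWORDS and len(f) >= 2 and f[1] in doomed:
            continue
        if key == "PARENT":
            upper = [x.upper() for x in f]
            if "CHILD" in upper:
                cut = upper.index("CHILD")
                parents = [n for n in f[1:cut] if n not in doomed]
                children = [n for n in f[cut + 1:] if n not in doomed]
                if not parents or not children:
                    continue  # nothing left on one side; drop the line
                line = "PARENT " + " ".join(parents) + " CHILD " + " ".join(children)
        out.append(line)
    return "\n".join(out) + "\n"
```

Deleting edges is safe here because the removed nodes are the bottom-most ones, so no other node depends on them. If an interior node were pruned this way, the ordering between its parents and its children would be lost as well.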
Can I get some gut reactions to these options or perhaps new, cleverer
options?
Thanks,
Nick
===================================
Nickolas Fotopoulos
nvf@xxxxxxxxxxxxxxxxxxxx
Office: (414) 229-6438
Fax: (414) 229-5589
University of Wisconsin - Milwaukee
Physics Bldg, Rm 471
===================================