I submit a DAG in which each node has a PRE and a POST script. There are
no parent/child relationships, since each node is independent. The PRE
script prepares the data for the node to use; the POST script
post-processes the data and records the node's status in a separate
database. We have a script that allows our users to cancel
the run (a run may have thousands of nodes and take several hours to
complete). The question is: how can I stop the DAG but still have the
POST script run for every node that has already started?
Currently, I put a "KILL" file in the directory the DAG is run from;
the PRE scripts check for this file and exit with a non-zero result.
This keeps nodes that have not yet run from being added to the queue.
I then condor_rm each of the idle and running nodes, which evicts them
and runs their POST scripts (which is what I need), and wait for the
DAG to finish. If there are a lot of unrun nodes, I must wait for all
of their PRE scripts (which do nothing) to run, which is wasteful and
can take a while.
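
To make the current workaround concrete, here is a minimal sketch of the
two pieces in Python. It is not our actual script: the function names are
made up for illustration, and the condor_rm constraint assumes that each
node job carries a DAGManJobId attribute equal to the cluster id of the
condor_dagman job.

```python
import os

KILL_FILE = "KILL"  # marker file name, as described above


def kill_requested(run_dir="."):
    """Return True if the cancel marker exists in the DAG's run directory."""
    return os.path.isfile(os.path.join(run_dir, KILL_FILE))


def pre_script_exit_code(run_dir="."):
    """Exit code for a PRE script: non-zero once a cancel was requested,
    so the node's job is not submitted."""
    return 1 if kill_requested(run_dir) else 0


def request_cancel(run_dir="."):
    """Create the (empty) marker file so that later PRE scripts fail fast."""
    with open(os.path.join(run_dir, KILL_FILE), "w"):
        pass


def condor_rm_command(dagman_cluster_id):
    """Build a condor_rm call that targets the node jobs of one DAG.

    Assumption: node jobs have DAGManJobId set to the cluster id of the
    condor_dagman job that submitted them.
    """
    return ["condor_rm", "-constraint",
            "DAGManJobId == %d" % dagman_cluster_id]
```

A PRE script would end with sys.exit(pre_script_exit_code()) before doing
any real work, and the cancel script would call request_cancel() and then
pass the list from condor_rm_command() to subprocess.call(). Every unrun
node still has to start its PRE script just to find the marker, which is
the slow part.
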
Basically, I need to signal DAGMan to stop running PRE scripts and
submitting nodes, condor_rm all submitted nodes, and run any pending
POST scripts. Is there any way to do this?
BTW, I'm running Condor 7.4.1 on Windows.