
Re: [HTCondor-users] Can a DAGMan node be an already running job?



Let's say I have the following two DAGs:

A -> B

A -> C

Where A is a long-running job. There's an A job running because someone has invoked the A -> B workflow. Now someone comes along and wants to invoke the A -> C workflow. I don't want to re-run A, but I can't mark it as DONE in the DAG file because then C will start running immediately.

As far as I know, there's no particular time limit on how long a DAG node's PRE script can take, so if you know which job(s) or cluster(s) you're waiting for, you can wait for them in the PRE script. That has two advantages: first, it doesn't step outside of DAGMan, meaning that as the workflow(s) involved get more complicated, you can continue to take advantage of (rather than reimplement) its features; second, it doesn't use an execute node while it's waiting.
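
For illustration, here is a minimal sketch of such a PRE script in Python. It assumes you already know the cluster ID of the running A job and that condor_q is on the PATH of wherever DAGMan runs the script. The file name wait_for_cluster.py, the function names, and the 60-second polling interval are all made up for this example. It treats "no longer listed by condor_q" as "finished", which does not tell you whether A actually succeeded, so take it as a starting point rather than a finished tool.

```python
#!/usr/bin/env python3
"""Hypothetical DAGMan PRE script: block until a cluster leaves the queue."""
import subprocess
import sys
import time


def cluster_in_queue(cluster_id):
    """Return True if condor_q still lists at least one job from the cluster."""
    result = subprocess.run(
        ["condor_q", str(cluster_id), "-af", "ClusterId"],
        capture_output=True,
        text=True,
        check=True,  # raise if condor_q itself fails, rather than guessing
    )
    # condor_q prints one line per matching job; empty output means none left.
    return bool(result.stdout.strip())


def wait_for_cluster(cluster_id, poll_seconds=60,
                     in_queue=cluster_in_queue, sleep=time.sleep):
    """Poll until the cluster is gone; return how many times we had to sleep.

    in_queue and sleep are parameters only so the loop can be exercised
    without a real HTCondor pool.
    """
    polls = 0
    while in_queue(cluster_id):
        polls += 1
        sleep(poll_seconds)
    return polls


def main(argv):
    """Entry point: expects the cluster ID as the only argument."""
    if len(argv) != 2:
        print("usage: wait_for_cluster.py <cluster_id>", file=sys.stderr)
        return 2  # non-zero exit makes the PRE script (and the node) fail
    wait_for_cluster(int(argv[1]))
    return 0  # zero exit lets DAGMan carry on with the node

# To use this as a script, end the file with: sys.exit(main(sys.argv))
```

In the A -> C DAG file this would be hooked up with a line along the lines of "SCRIPT PRE A wait_for_cluster.py 1234", where 1234 stands in for the cluster ID of the A job that is already running.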

Of course, you still have to skip the job, but that's what the "skip_if_dataflow" submit command is for. (There are other ways to skip the job from the PRE script, but this one has the advantage of having HTCondor verify that the two A nodes were, in fact, identical.)

-- ToddM