Mailing List Archives
Re: [Condor-users] Gracefully stopping DAGMAN
- Date: Fri, 26 Mar 2010 12:46:07 -0500
- From: Craig Struble <craig.struble@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Gracefully stopping DAGMAN
On Mar 26, 2010, at 11:36 AM, R. Kent Wenger wrote:
On Thu, 25 Mar 2010, Robert Mortensen wrote:
I have a situation where I submit a DAG in which each node has a PRE
and a POST script. There are no parent/child relationships, since
each node is independent. The PRE script prepares the data for the
node to use; the POST script post-processes the data and marks the
status of the node in a separate database. We have a script that
allows our users to cancel the run (a run may have thousands of
nodes and take several hours to complete). The question is, how can
I stop the DAG but still have the POST script run for each node that
has already started running?
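For readers following along, a DAG of this shape (independent nodes, each with a PRE and a POST script) might be generated roughly as follows. This is only a sketch: the file names node.sub, pre.py and post.py and the node names are made up for illustration, and on Windows the scripts may need to be wrapped in a .bat file or similar so that DAGMan can execute them.

```python
def build_dag(node_names, submit_file="node.sub",
              pre_script="pre.py", post_script="post.py"):
    """Return DAG file text for independent nodes.

    Each node gets a JOB line plus SCRIPT PRE and SCRIPT POST lines.
    No PARENT/CHILD lines are written, so the nodes are independent.
    All file names here are hypothetical placeholders.
    """
    lines = []
    for name in node_names:
        lines.append(f"JOB {name} {submit_file}")
        # The PRE script receives the node name as its only argument.
        lines.append(f"SCRIPT PRE {name} {pre_script} {name}")
        # $RETURN is expanded by DAGMan to the node job's return value.
        lines.append(f"SCRIPT POST {name} {post_script} {name} $RETURN")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_dag(["node0", "node1"]), end="")
```

The output would be saved to a file and handed to condor_submit_dag as usual.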
Currently, I put a "KILL" file in the directory the DAG is run
from; the PRE scripts check for this file and exit with a non-zero
result. This keeps nodes that have not yet run from being added to
the queue. Then I condor_rm each of the idle and running nodes,
which evicts them and runs their POST scripts (which is what I
need). I then just wait for the DAG to finish. If there are a lot
of unrun nodes, I must wait for all their PRE scripts (which do
nothing) to run, which is a waste and can take a while.
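The KILL-file check described above could look something like the sketch below. It assumes the PRE script is started in the directory the DAG was submitted from, and prepare_data is a hypothetical stand-in for the real preparation step, which is not shown in this thread.

```python
import os
import sys

KILL_FILE = "KILL"  # sentinel file name taken from the message above


def should_abort(run_dir="."):
    """Return True if the KILL sentinel file exists in run_dir."""
    return os.path.exists(os.path.join(run_dir, KILL_FILE))


def prepare_data(node_name):
    """Hypothetical placeholder for the real per-node data preparation."""


if __name__ == "__main__":
    if should_abort():
        # A non-zero exit makes the PRE script fail, so the node's
        # job is not submitted.
        sys.exit(1)
    prepare_data(sys.argv[1] if len(sys.argv) > 1 else "")
```

This still has the drawback mentioned above: DAGMan has to launch the PRE script for every remaining node before the DAG can finish.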
Basically, I need to signal DAGMan to stop running PRE scripts and
submitting nodes, condor_rm all submitted nodes, and run any
pending POST scripts. Is there any way to do this?
BTW, I'm running Condor 7.4.1 on Windows.
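As a rough illustration of the "condor_rm all submitted nodes" step, the node jobs could be removed in one call with a constraint instead of one by one. This sketch assumes the node jobs carry the DAGManJobId attribute set to the cluster ID of the DAGMan job, and that the cluster ID is passed on the command line; check both against your installation before relying on it.

```python
import subprocess
import sys


def rm_node_jobs_command(dagman_cluster_id):
    """Build a condor_rm command that targets the node jobs of one DAG.

    Assumes node jobs have DAGManJobId equal to the cluster ID of the
    DAGMan job that submitted them. The DAGMan job itself is not matched.
    """
    constraint = f"DAGManJobId == {int(dagman_cluster_id)}"
    return ["condor_rm", "-constraint", constraint]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Passing a list avoids shell quoting problems, including on Windows.
    subprocess.run(rm_node_jobs_command(sys.argv[1]), check=True)
```

Because only the node jobs are removed and DAGMan keeps running, it can still run the POST scripts for those nodes, which is the behaviour described in the message.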
Hmm. I can't think of an easy way to do exactly what you want to
do. If you condor_rm the DAGMan job, it will remove all of the node
jobs, but it won't run any of the POST scripts.
I'm thinking that the real solution to this problem is to add a
configuration knob to tell DAGMan exactly what you want it to do
when you condor_rm it -- so you could tell it, for example, to
remove jobs in the queue, but still go ahead and run the POST
scripts. How does that sound?
I like this idea. I recently developed a workflow that stages and
unstages data to a web server. During the development, it would have
been very handy to have a "do this on condor_rm" knob so that
unstaging would occur when I stopped my DAG prematurely.
Craig
Kent Wenger
Condor Team
--
Craig A. Struble, Ph.D. | 369 Cudahy Hall | Marquette University
Associate Professor of Computer Science | (414)288-3783
Director, Master of Bioinformatics Program | (414)288-5472 (fax)
http://www.mscs.mu.edu/~cstruble | craig.struble@xxxxxxxxxxxxx