On Mon, May 10, 2010 at 17:19, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:
On Mon, 10 May 2010, Alexander Dietz wrote:
Does anyone have new information on the problem I reported? I need to
finish this DAG soon, so unless I get a reply soon I will have to restart
the DAG from scratch (and will not be able to run tests regarding the
reported problem).
I haven't figured out yet exactly what happened. But here's one thing to
try that's better than starting over from scratch: if there's a lock file
(t.lock), remove that and re-submit the DAG. (I'm assuming that the
DAGMan job is no longer in the
queue.) That should run the rescue DAG, so you won't be starting from
scratch, but it won't go into recovery mode, so you'll bypass the problems
with events that are goofing things up.
I guess this procedure kind of works. The DAG may not have continued
exactly where it left off, but it at least resumed from the rescue-DAG level.
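
For anyone who wants to script the procedure suggested above (remove the lock file, then re-submit the DAG), here is a minimal sketch in Python. It rests on a few assumptions that are not stated in this thread: the lock file is taken to be the DAG file name with ".lock" appended, the helper name resubmit_dag and its dry_run parameter are made up for illustration, and condor_submit_dag is assumed to be on the PATH. It does not check whether the DAGMan job has left the queue, so that still has to be confirmed by hand first.

```python
import subprocess
from pathlib import Path


def resubmit_dag(dag_file, dry_run=True):
    """Remove a stale DAGMan lock file, then re-submit the DAG.

    Hypothetical helper, not part of Condor. Returns a tuple of
    (lock_file_was_removed, command_list).
    """
    dag_path = Path(dag_file)

    # Assumption: the lock file sits next to the DAG file and is named
    # "<dag file name>.lock". Adjust this if your setup differs.
    lock_path = dag_path.with_name(dag_path.name + ".lock")

    removed = lock_path.exists()
    if removed:
        # Only do this once the DAGMan job is no longer in the queue.
        lock_path.unlink()

    # Plain re-submission of the same DAG file, as suggested in the thread.
    command = ["condor_submit_dag", str(dag_path)]
    if not dry_run:
        subprocess.run(command, check=True)

    return removed, command
```

With the default dry_run=True the function only removes the lock file and returns the command it would run, which makes it easy to inspect before submitting for real, e.g. resubmit_dag("my.dag", dry_run=False), where "my.dag" is a placeholder file name.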