
Re: [Condor-users] aborting DAG because of bad event



On Fri, 29 Jan 2010, Peter Doherty wrote:

> This doesn't make much sense to me. I had a large dag, and I noticed a lot of jobs got put on hold, so I did a condor_release on them.
> This is the end of my dagman.out file
>
> It says there was a bad event (I have no idea what the event was)
> and then it's aborting the dag, but it also says it's continuing the dag.

The errors indicate that DAGMan got some "impossible" event combinations 
from the node job log files (an EXECUTE event after a TERMINATED event for 
that job, multiple TERMINATED events for the same job, etc.).  If you send 
the full dagman.out file and the node job log files, I'll take a look and 
see what I can figure out.
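
To make the two "impossible" combinations above concrete, here is a rough Python sketch of that kind of check. It is only an illustration, not DAGMan's actual code: the function name find_bad_events and the (job_id, event_name) tuple format are made up for this example, and it does not parse real node job log files.

```python
from collections import defaultdict


def find_bad_events(events):
    """Flag event orderings that should be impossible for a single job.

    `events` is an iterable of (job_id, event_name) pairs in log order.
    This input format is hypothetical; it is not the node job log format.
    """
    terminated_count = defaultdict(int)  # TERMINATED events seen so far, per job
    problems = []
    for index, (job_id, name) in enumerate(events):
        if name == "EXECUTE" and terminated_count[job_id] > 0:
            # The job started running after it had already finished.
            problems.append((index, job_id, "EXECUTE after TERMINATED"))
        elif name == "TERMINATED":
            if terminated_count[job_id] > 0:
                # The same job finished more than once.
                problems.append((index, job_id, "multiple TERMINATED"))
            terminated_count[job_id] += 1
    return problems


if __name__ == "__main__":
    sample = [
        ("1.0", "SUBMIT"),
        ("1.0", "EXECUTE"),
        ("1.0", "TERMINATED"),
        ("1.0", "EXECUTE"),      # flagged: EXECUTE after TERMINATED
        ("1.0", "TERMINATED"),   # flagged: multiple TERMINATED
    ]
    for problem in find_bad_events(sample):
        print(problem)
```

Each flagged entry gives the position in the sequence, the job, and which of the two combinations was hit.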
Kent Wenger
Condor Team