On Fri, 29 Jan 2010, Peter Doherty wrote:
This doesn't make much sense to me. I had a large DAG, and I noticed a lot of jobs got put on hold, so I did a condor_release on them. This is the end of my dagman.out file. It says there was a bad event (I have no idea what the event was), and then it says it's aborting the DAG, but it also says it's continuing the DAG.
The errors indicate that DAGMan got some "impossible" event combinations from the node job log files (an EXECUTE event after a TERMINATED event for that job, multiple TERMINATED events for the same job, etc.). If you send the full dagman.out file and the node job log files, I'll take a look and see what I can figure out.
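To make the two "impossible" combinations concrete, here is a minimal Python sketch that flags them in a simplified event stream. It is only an illustration, not DAGMan's actual checking code: the function name find_bad_events and the (job_id, event_name) tuple representation are assumptions made for this example, and it does not parse the real node job log format.

```python
def find_bad_events(events):
    """Flag events that arrive after a job has already terminated.

    `events` is a list of (job_id, event_name) tuples in log order.
    This representation is hypothetical; real node job logs would
    need to be parsed into it first.
    """
    terminated = set()   # jobs for which a TERMINATED event was already seen
    problems = []        # (index, job_id, description) for each bad event

    for index, (job_id, event) in enumerate(events):
        if job_id in terminated:
            if event == "EXECUTE":
                problems.append((index, job_id, "EXECUTE after TERMINATED"))
            elif event == "TERMINATED":
                problems.append((index, job_id, "multiple TERMINATED events"))
        if event == "TERMINATED":
            terminated.add(job_id)

    return problems


# Job "1.0" shows both bad combinations; job "2.0" has a normal sequence.
sample = [
    ("1.0", "SUBMIT"), ("1.0", "EXECUTE"), ("1.0", "TERMINATED"),
    ("1.0", "EXECUTE"), ("1.0", "TERMINATED"),
    ("2.0", "SUBMIT"), ("2.0", "EXECUTE"), ("2.0", "TERMINATED"),
]
for problem in find_bad_events(sample):
    print(problem)
```

A check like this only says which job and which event look wrong. Working out why the log ended up that way still needs the full dagman.out and the node job logs.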
Kent Wenger
Condor Team