
[Condor-users] 6.8.0 DAGMan assertion causing hold?



Hi.  I'm seeing a behavior in DAGMan in 6.8.0 that I never saw in 6.6.10 
(our previously installed version).  Every once in a while, a DAG will 
get put on hold automatically, for no apparent reason.  After some 
digging in the logs, I see this in the DAG's dagman.out log:
   9/6 12:37:41 BAD EVENT: job (4952.0.0) ended, submit count < 1 (0)
   9/6 12:37:41 BAD EVENT is warning only
   9/6 12:37:41 ERROR "Assertion ERROR on (job->_queuedNodeJobProcs >= 0)" at line 608 in file dag.C
And this in the submit machine's SchedLog:

   9/6 12:37:41 (pid:14159) (4561.0) Problem parsing user policy for job: The UNKNOWN (never set) OnExitRemove expression '' evaluated to UNDEFINED. Putting job on hold.
   9/6 12:37:41 (pid:14159) Job 4561.0 put on hold: The UNKNOWN (never set) OnExitRemove expression '' evaluated to UNDEFINED
When the DAG job is released, it seems to continue on just fine.  Is 
this a bug in 6.8.0?  I can send complete logs for the DAG and the 
submit machine to a developer (~400k) if that'd be helpful.
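
For anyone who wants to check their own dagman.out for the same pattern, here is a minimal sketch of the kind of scan I mean. It is plain Python, not anything shipped with Condor. The function name find_suspect_lines, the marker strings, and the "M/D HH:MM:SS message" line layout are all assumptions taken only from the excerpts above, so other versions may log differently.

```python
import re

# Assumed line layout, based only on the excerpts in this post:
# optional leading whitespace, "M/D HH:MM:SS", a space, then the message.
LINE_RE = re.compile(r"^\s*(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2}) (.*)$")

# Substrings copied from the dagman.out excerpt in this post.
MARKERS = ("BAD EVENT", "Assertion ERROR")


def find_suspect_lines(lines):
    """Return (timestamp, message) pairs for lines containing a marker."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and any(marker in match.group(2) for marker in MARKERS):
            hits.append((match.group(1), match.group(2)))
    return hits


# Sample input: the three lines quoted above plus one made-up unrelated line.
sample = [
    "   9/6 12:37:41 BAD EVENT: job (4952.0.0) ended, submit count < 1 (0)",
    "   9/6 12:37:41 BAD EVENT is warning only",
    '9/6 12:37:41 ERROR "Assertion ERROR on (job->_queuedNodeJobProcs >= 0)" '
    "at line 608 in file dag.C",
    "9/6 12:37:40 some unrelated line (hypothetical)",
]

hits = find_suspect_lines(sample)
for timestamp, message in hits:
    print(timestamp, message)
```

To run it against a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.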
-Mike