Re: [Condor-users] 6.8.0 DAGMan assertion causing hold?
- Date: Thu, 7 Sep 2006 10:06:55 -0500 (CDT)
- From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
- Subject: Re: [Condor-users] 6.8.0 DAGMan assertion causing hold?
Mike,
> Hi. I'm seeing a behavior in DAGMan in 6.8.0 that I never saw in 6.6.10
> (our previously installed version). Every once in a while, a DAG will
> get put on hold automatically, for no apparent reason. After some
> digging in the logs, I see this in the DAG's dagman.out log:
>
> 9/6 12:37:41 BAD EVENT: job (4952.0.0) ended, submit count < 1 (0)
> 9/6 12:37:41 BAD EVENT is warning only
> 9/6 12:37:41 ERROR "Assertion ERROR on (job->_queuedNodeJobProcs >=
> 0)" at line 608 in file dag.C
>
> And this in the submit machine's SchedLog:
>
> 9/6 12:37:41 (pid:14159) (4561.0) Problem parsing user policy for
> job: The UNKNOWN (never set) OnExitRemove expression '' evaluated to
> UNDEFINED. Putting job on hold.
> 9/6 12:37:41 (pid:14159) Job 4561.0 put on hold: The UNKNOWN (never
> set) OnExitRemove expression '' evaluated to UNDEFINED
>
> When the DAG job is released, it seems to continue on just fine. Is
> this a bug in 6.8.0? I can send complete logs for the DAG and the
> submit machine to a developer (~400k) if that'd be helpful.
I don't think we've seen this before.
Could you please send me the complete dagman.out file, and the log file(s)
for all of the node jobs?
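
Before sending the full files, it may help to pull out just the lines that triggered the hold. The following is a minimal sketch, assuming the dagman.out messages look like the excerpt quoted above; the function name find_suspect_lines and the pattern table are illustrative and are not part of Condor. The sample lines are copied from the quoted excerpt, including the assertion line as it appears wrapped in the email.

```python
import re

# Patterns based only on the dagman.out excerpt quoted in this thread.
# The names here are hypothetical helpers, not Condor APIs.
PATTERNS = {
    "bad_event": re.compile(r"BAD EVENT: job \((\d+\.\d+\.\d+)\)"),
    "assertion": re.compile(r'ERROR "Assertion ERROR on \((.+)'),
}

def find_suspect_lines(lines):
    """Return (line_number, kind, detail) for each matching log line."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((lineno, kind, match.group(1).strip()))
    return hits

# Sample input taken from the quoted excerpt.
sample = [
    "9/6 12:37:41 BAD EVENT: job (4952.0.0) ended, submit count < 1 (0)",
    "9/6 12:37:41 BAD EVENT is warning only",
    '9/6 12:37:41 ERROR "Assertion ERROR on (job->_queuedNodeJobProcs >=',
]
hits = find_suspect_lines(sample)
for lineno, kind, detail in hits:
    print(f"line {lineno}: {kind}: {detail}")
```

To run this against a real log, pass an open file object instead of the sample list, for example find_suspect_lines(open("dagman.out")). The job ID reported for a bad event can then be matched against the node job logs.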
Kent Wenger
Condor Team