[Condor-users] aborting DAG because of bad event
- Date: Fri, 29 Jan 2010 11:56:59 -0500
- From: Peter Doherty <doherty@xxxxxxxxxxxxxxxxxxx>
- Subject: [Condor-users] aborting DAG because of bad event
This doesn't make much sense to me. I had a large DAG, and I noticed
that a lot of jobs had been put on hold, so I ran condor_release on them.
The end of my dagman.out file is below.
It says there was a bad event (I have no idea what the event was)
and that it's aborting the DAG, but it also says it's continuing with
the DAG.
What just happened?
Thanks
-Peter
01/29 09:05:33 Currently monitoring 1 Condor log file(s)
01/29 09:05:34 Currently monitoring 1 Condor log file(s)
01/29 09:05:34 BAD EVENT: job (377972.0.0) executing, total end count != 0 (1)
01/29 09:05:34 ERROR: aborting DAG because of bad event (BAD EVENT: job (377972.0.0) executing, total end count != 0 (1))
01/29 09:05:34 BAD EVENT: job (377972.0.0) ended, total end count != 1 (2)
01/29 09:05:34 Continuing with DAG in spite of bad event (BAD EVENT: job (377972.0.0) ended, total end count != 1 (2)) because of allow_events setting
01/29 09:05:34 BAD EVENT: job (376465.0.0) executing, total end count != 0 (1)
01/29 09:05:34 ERROR: aborting DAG because of bad event (BAD EVENT: job (376465.0.0) executing, total end count != 0 (1))
01/29 09:05:34 BAD EVENT: job (376465.0.0) ended, total end count != 1 (2)
01/29 09:05:34 Continuing with DAG in spite of bad event (BAD EVENT: job (376465.0.0) ended, total end count != 1 (2)) because of allow_events setting
01/29 09:05:34 Aborting DAG...
01/29 09:05:35 Writing Rescue DAG to ../.dag/2vpw-g10.dag.rescue001.rescue001...
01/29 09:05:35 Removing submitted jobs...
01/29 09:05:35 Removing any/all submitted Condor/Stork jobs...
01/29 09:05:36 Note: 663691422 total job deferrals because of -MaxJobs limit (4000)
01/29 09:05:36 Note: 0 total job deferrals because of -MaxIdle limit (0)
01/29 09:05:36 Note: 0 total job deferrals because of node category throttles
01/29 09:05:36 Note: 0 total PRE script deferrals because of -MaxPre limit (0)
01/29 09:05:36 Note: 0 total POST script deferrals because of -MaxPost limit (0)
01/29 09:05:36 Warning: ReadMultipleUserLogs destructor called, but still monitoring 1 log(s)!
01/29 09:05:36 **** condor_scheduniv_exec.376286.0 (condor_DAGMAN) pid 10470 EXITING WITH STATUS 1
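
To see which jobs triggered the bad events in a log like the one above, the BAD EVENT lines can be pulled out and grouped by job ID. This is only a rough sketch in Python: the function name collect_bad_events and the regular expression are illustrative assumptions based solely on the line format visible in this excerpt, not anything provided by Condor, and it expects one log message per line (as in the dagman.out file itself, not as wrapped by a mail client).

```python
import re
from collections import defaultdict

# Assumed line shape, inferred only from the excerpt above:
#   MM/DD HH:MM:SS BAD EVENT: job (cluster.proc.subproc) <description>
# Lines that merely quote a bad event ("ERROR: aborting DAG ...",
# "Continuing with DAG ...") do not match, because "BAD EVENT:" must
# follow the timestamp directly.
BAD_EVENT_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) BAD EVENT: "
    r"job \((?P<job>\d+\.\d+\.\d+)\) (?P<detail>.*)$"
)


def collect_bad_events(lines):
    """Group BAD EVENT messages by job ID, keeping them in log order.

    `lines` is any iterable of log lines, e.g. an open dagman.out file.
    Returns a dict mapping job ID to a list of (timestamp, detail) pairs.
    """
    events = defaultdict(list)
    for line in lines:
        match = BAD_EVENT_RE.match(line.rstrip("\n"))
        if match:
            events[match["job"]].append((match["stamp"], match["detail"]))
    return dict(events)
```

Calling collect_bad_events(open("dagman.out")) on the full file would give, for each job ID, the sequence of bad-event descriptions logged for it, which makes it easier to look up those job IDs in the corresponding user log.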