[Condor-users] DAGMan Hangs Near End
- Date: Fri, 28 Sep 2012 13:27:10 -0500
- From: Oren Livne <livne@xxxxxxxxxxxx>
- Subject: [Condor-users] DAGMan Hangs Near End
Dear All,
I have a DAGMan pipeline that starts fine but never completes, because
the last few jobs are queued but never run. A scaled-down version of it
works, so I doubt that it's a programming error. There are many
available nodes, so why won't those jobs run? How can I diagnose the
individual node in the DAG that DAGMan reports as "Queued"?
Thank you so much,
Oren
-- Submitter: ibicluster.uchicago.cc : <172.16.0.149:42470> : ibicluster.uchicago.cc
 ID      OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
 904.0   livne    9/28 13:09   0+00:15:40 R  0   7.3  condor_dagman -f -
1 jobs; 0 idle, 1 running, 0 held
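The condor_q listing contains only the DAGMan job itself (904.0), and its footer reports 0 idle jobs, whereas the dagman.out excerpt below reports "Number of idle job procs: 1". A minimal Python sketch for extracting those counts so the two views can be compared; it assumes exactly the footer format shown here, and parse_condor_q_summary is a hypothetical helper, not part of Condor.

```python
import re

# Footer line copied verbatim from the condor_q output above.
SUMMARY = "1 jobs; 0 idle, 1 running, 0 held"


def parse_condor_q_summary(line: str) -> dict:
    """Turn '1 jobs; 0 idle, 1 running, 0 held' into a {state: count} dict.

    Assumes the '<count> <state>' layout shown above; other Condor
    versions may print additional states."""
    counts = {}
    for number, state in re.findall(r"(\d+)\s+([a-z]+)", line):
        counts[state] = int(number)
    return counts


counts = parse_condor_q_summary(SUMMARY)
print(counts)
if counts.get("idle", 0) == 0:
    print("condor_q lists no idle jobs for this submitter")
```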
===================================================================================
              Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX    728   108       0       620       0          0        0
       Total    728   108       0       620       0          0        0
===================================================================================
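The condor_status summary supports the point about free capacity: 620 of the 728 X86_64/LINUX slots are Unclaimed. A small Python sketch that turns the header and Total row into a dictionary; the strings are copied from the output above, the helper name is hypothetical, and the column set is assumed to match this Condor version (other releases may print different columns).

```python
# Header and "Total" row copied from the condor_status summary above.
HEADER = "Total Owner Claimed Unclaimed Matched Preempting Backfill"
TOTAL_ROW = "Total 728 108 0 620 0 0 0"


def parse_status_totals(header: str, row: str) -> dict:
    """Pair each column name with its slot count.

    The first field of the row ("Total", or e.g. "X86_64/LINUX") is a row
    label, so it is skipped before pairing."""
    names = header.split()
    values = [int(v) for v in row.split()[1:]]
    return dict(zip(names, values))


totals = parse_status_totals(HEADER, TOTAL_ROW)
# In this output the per-state columns add up to the Total column.
assert sum(v for k, v in totals.items() if k != "Total") == totals["Total"]
print(f"{totals['Unclaimed']} of {totals['Total']} slots are unclaimed")
```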
9/28 13:23:33 Event: ULOG_EXECUTE for Condor Node D_chr10 (1009.0)
9/28 13:23:33 Number of idle job procs: 1
9/28 13:23:43 Event: ULOG_JOB_TERMINATED for Condor Node D_chr10 (1009.0)
9/28 13:23:43 Node D_chr10 job proc (1009.0) completed successfully.
9/28 13:23:43 Node D_chr10 job completed
9/28 13:23:43 Number of idle job procs: 1
9/28 13:23:43 Of 107 nodes total:
9/28 13:23:43     Done      Pre   Queued     Post    Ready Un-Ready   Failed
9/28 13:23:43      ===      ===      ===      ===      ===      ===      ===
9/28 13:23:43      104        0        1        0        0        2        0
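The status table accounts for all 107 nodes (104 Done + 1 Queued + 2 Un-Ready). Un-Ready nodes are still waiting on a parent, so the single Queued node appears to be what is holding up the end of the DAG. To name that node, here is a rough Python sketch that pulls the latest status summary and the last event recorded for each node out of dagman.out. The regular expressions are assumptions based only on the lines quoted here, and the helper names are hypothetical. Applied to this excerpt alone it finds nothing pending, because the queued node was submitted earlier in the log; run it over the complete file.

```python
import re

# The dagman.out lines quoted above (single-spaced). On a real run, read the
# DAG's full .dagman.out file instead, since the stuck node's submit event
# happened before this excerpt.
LOG = """\
9/28 13:23:33 Event: ULOG_EXECUTE for Condor Node D_chr10 (1009.0)
9/28 13:23:33 Number of idle job procs: 1
9/28 13:23:43 Event: ULOG_JOB_TERMINATED for Condor Node D_chr10 (1009.0)
9/28 13:23:43 Node D_chr10 job proc (1009.0) completed successfully.
9/28 13:23:43 Node D_chr10 job completed
9/28 13:23:43 Number of idle job procs: 1
9/28 13:23:43 Of 107 nodes total:
9/28 13:23:43 Done Pre Queued Post Ready Un-Ready Failed
9/28 13:23:43 === === === === === === ===
9/28 13:23:43 104 0 1 0 0 2 0
"""

# Assumed line shapes, taken from the excerpt rather than from Condor docs.
EVENT_RE = re.compile(r"Event: (ULOG_\w+) for Condor Node (\S+) \((\d+\.\d+)\)")
TOTAL_RE = re.compile(r"Of (\d+) nodes total:")


def last_event_per_node(text):
    """Map each node name to (last ULOG event seen, cluster.proc)."""
    last = {}
    for m in EVENT_RE.finditer(text):
        event, node, job_id = m.groups()
        last[node] = (event, job_id)
    return last


def latest_status_table(text):
    """Return (total nodes, {column: count}) for the last status summary."""
    body = []
    for ln in text.splitlines():
        parts = ln.split(maxsplit=2)  # drop the "M/D HH:MM:SS" prefix
        body.append(parts[2] if len(parts) == 3 else "")
    total, table = None, None
    for i, ln in enumerate(body):
        m = TOTAL_RE.match(ln)
        if m:
            total = int(m.group(1))
        if ln.startswith("Done ") and i + 2 < len(body):
            names = ln.split()
            counts = [int(v) for v in body[i + 2].split()]  # skip the === row
            table = dict(zip(names, counts))
    return total, table


total, table = latest_status_table(LOG)
assert sum(table.values()) == total  # 104 + 1 + 2 == 107
print("status:", table)

# Any node whose most recent event is not a termination is still in flight
# (an aborted or held job would also land here).
pending = {node: ev for node, ev in last_event_per_node(LOG).items()
           if ev[0] != "ULOG_JOB_TERMINATED"}
print("nodes without a termination event:", pending)
```

Once the full log yields a pending node and its cluster.proc ID, that ID can be passed to `condor_q -analyze`, or to `condor_history` if the job no longer appears in the queue.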
--
A person is just about as big as the things that make him angry.