This is my first week of using DAGMan. I am trying to understand the
output. foo.dagman.out contains the nice summary:

12/17 16:49:14 Number of idle job procs: 6
12/17 16:49:14 Of 40 nodes total:
12/17 16:49:14  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
12/17 16:49:14   ===     ===      ===     ===     ===        ===      ===
12/17 16:49:14    28       0       12       0       0          0        0

However condor_q seems to paint a different picture:

[ijstokes@abitibi dag6]$ condor_q -constraint DAGManJobId==`cat .lastjobid` | grep -c " R "
7
[ijstokes@abitibi dag6]$ condor_q -constraint DAGManJobId==`cat .lastjobid` | grep -c " I "
4
[ijstokes@abitibi dag6]$ condor_q -constraint DAGManJobId==`cat .lastjobid` | grep -c " H "
3

From the first, I'm told 6 are Idle (condor_q indicates 4, but this
could be asynchrony in the updates), but then how do I distinguish
between jobs in the RUNNING state and jobs in the HELD state? The nice
summary doesn't (directly) seem to distinguish between RUNNING, HELD,
and QUEUED, which seems odd. The condor_q output shows that 3 are HELD,
which in a Condor-G world effectively means they've failed and need to
be retried.

Thanks in advance for help understanding this.

Ian

--
Ian Stokes-Rees                   W: http://sbgrid.org
ijstokes@xxxxxxxxxxxxxxxxxxx      T: +1 617 432-5608 x75
SBGrid, Harvard Medical School    F: +1 617 432-5600