I recently ran a batch of jobs, just shy of 4000 in total. When it was done I got this:
OWNER    BATCH_NAME    SUBMITTED   DONE   RUN   IDLE   HOLD   TOTAL  JOB_IDS
jfisher  CMD: ngspice  6/7 22:30   1787     _      _      9    1800  261.0 ... 262.4

9 jobs; 0 completed, 0 removed, 0 idle, 0 running, 9 held, 0 suspended
Running condor_release restarts the jobs, but then something crashes and they go back to being held.
I then ran:
condor_q -hold
ID     OWNER    HELD_SINCE  HOLD_REASON
261.0  jfisher  6/14 14:03  Error from slot1_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
261.1  jfisher  6/14 14:03  Error from slot2_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
261.2  jfisher  6/14 14:03  Error from slot3_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
261.3  jfisher  6/14 14:03  Error from slot4_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
262.0  jfisher  6/14 14:03  Error from slot5_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
262.1  jfisher  6/14 14:03  Error from slot6_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
262.2  jfisher  6/14 14:03  Error from slot1_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
262.3  jfisher  6/14 14:03  Error from slot2_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
262.4  jfisher  6/14 14:03  Error from slot3_1@xxxxxxxxxxxxx: SHADOW at 192.168.1.206 failed to send fi
Alas, the output is truncated right where I suspect the information I need is going to be.
Any ideas as to how to find out what those jobs are?
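
One approach that might get around the truncation is to ask condor_q for the raw ClassAd attributes instead of the summary table. This is only a sketch: the helper name show_held_jobs is made up for illustration, and it assumes the installed condor_q supports -constraint and -af (autoformat) and that the job ads carry the HoldReason, Cmd and Args attributes. An attribute that is not set in the ad would typically print as "undefined".

```shell
# Hypothetical helper (name is an assumption): list held jobs with the
# full, untruncated hold reason plus the command each job was running.
show_held_jobs() {
    # Bail out early if HTCondor's condor_q is not installed on this machine.
    if ! command -v condor_q >/dev/null 2>&1; then
        echo "condor_q not found on PATH" >&2
        return 1
    fi
    # JobStatus == 5 selects jobs in the held state.
    # -af prints the named ClassAd attributes, one job per line.
    condor_q -constraint 'JobStatus == 5' \
        -af ClusterId ProcId HoldReason Cmd Args
}
```

Calling show_held_jobs on the submit machine should print one line per held job. For a single job, condor_q -l 261.0 dumps the whole ClassAd, which can then be searched for HoldReason.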
--
Kind regards,
Justin Fisher.