On Thu, 5 Sep 2013, 钱晓明 wrote:
I found that running condor_suspend on a DAGMan job can make it crash and go into RECOVERY mode. This is the output from DAGMan when I issue the suspend command:
It doesn't surprise me that that happens. I did some testing with 8.0, and I'm not seeing the exact same behavior. But I'm not sure what condor_suspend is supposed to do to a scheduler universe job (which DAGMan is, unless you've changed the normal .condor.sub file generated by condor_submit_dag).
You're probably better off doing condor_hold/condor_release on the DAGMan job instead of condor_suspend/condor_continue.
Note that if you do condor_hold/condor_release on a DAGMan job, it *will* go into recovery mode when it is released, but that's the correct behavior.
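
For anyone who wants to script the hold/release approach, here is a minimal sketch in Python. It only wraps the condor_hold and condor_release commands mentioned above; the function names, the script name, and the job ID in the usage line are made up for illustration, so substitute the cluster ID of your own DAGMan job (as shown by condor_q).

```python
import subprocess
import sys
from typing import List


def hold_command(job_id: str) -> List[str]:
    """Build the command line that puts the DAGMan job on hold."""
    return ["condor_hold", job_id]


def release_command(job_id: str) -> List[str]:
    """Build the command line that releases the held DAGMan job."""
    return ["condor_release", job_id]


# Hypothetical action names for this sketch, mapped to the builders above.
ACTIONS = {"hold": hold_command, "release": release_command}


def main(argv: List[str]) -> int:
    # Expect exactly: <script> hold|release <job id>
    if len(argv) != 3 or argv[1] not in ACTIONS:
        print("usage: dag_pause.py hold|release <cluster-id>")
        return 2
    cmd = ACTIONS[argv[1]](argv[2])
    # Run the HTCondor command and pass its exit status back to the caller.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Usage would be something like "python dag_pause.py hold 1234" to pause the DAG and "python dag_pause.py release 1234" to resume it, where 1234 is a placeholder ID. After the release, expect DAGMan to start up in recovery mode, as noted above.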
Kent Wenger
CHTC Team