
Re: [HTCondor-users] condor_dagman crashed when suspend it in 7.8.2 version



On Thu, 5 Sep 2013, 钱晓明 wrote:

> I found that running condor_suspend on a DAGMan job can make it crash and
> go into RECOVERY mode. This is the output from DAGMan when the suspend
> command is issued:

It doesn't surprise me that that happens.  I did some testing with 8.0, 
and I'm not seeing the exact same behavior.  But I'm not sure what 
condor_suspend is supposed to do to a scheduler universe job (which DAGMan 
is, unless you've changed the normal .condor.sub file generated by 
condor_submit_dag).

You're probably better off doing condor_hold/condor_release on the DAGMan 
job instead of condor_suspend/condor_continue.

Note that if you do condor_hold/condor_release on a DAGMan, it *will* go 
into recovery mode, but that's the correct behavior.
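
As a rough sketch of the hold/release approach, something like the following could drive the two commands from a script. The cluster ID 1234 and the helper names hold_release_commands and run are placeholders for illustration, not anything HTCondor provides; the only HTCondor pieces assumed are the condor_hold and condor_release commands taking a cluster ID. By default it only prints the commands rather than running them.

```python
import subprocess


def hold_release_commands(dag_cluster_id):
    """Return the (hold, release) command lines for a DAGMan job's cluster ID."""
    cluster = str(dag_cluster_id)
    return ["condor_hold", cluster], ["condor_release", cluster]


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        # Requires the HTCondor command-line tools to be on PATH.
        subprocess.run(command, check=True)


# 1234 is a placeholder; use the cluster ID of your own DAGMan job.
hold_cmd, release_cmd = hold_release_commands(1234)
run(hold_cmd)     # pause the DAG
run(release_cmd)  # resume it; expect DAGMan to restart in recovery mode
```

Pass dry_run=False to actually issue the commands once the printed command lines look right.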

Kent Wenger
CHTC Team