In the documentation at http://research.cs.wisc.edu/htcondor/manual/v8.0/2_10DAGMan_Applications.html#sec:DAGFinalNode, $DAG_STATUS is defined, along with meanings for values 0 to 6.

Question: is this value available anywhere else, e.g. in the dagman log or jobstate_log? If so, I cannot find it. Does this mean I have to add a dummy FINAL node to a job, purely to capture the $DAG_STATUS?

I have done a few experiments.

* The exit code of condor_dagman for any DAG failure appears to be always 1, unless you use ABORT-DAG-ON, in which case it's the exit status of the failed node, or the replacement value from the RETURN clause.

* Here is a simple example where the $DAG_STATUS is 2, but I cannot find this value of 2 anywhere in the log files.

    ==> dag1.dag <==
    JOB J1 nonexistent.sub
    JOBSTATE_LOG jobstate1.log

    $ condor_submit_dag -f dag1.dag
    $ condor_wait dag1.dag.dagman.log
    $ cat jobstate1.log
    1401275749 INTERNAL *** DAGMAN_STARTED 3655962.0 ***
    1401275762 J1 SUBMIT_FAILURE - - - 1
    1401275767 J1 SUBMIT_FAILURE - - - 1
    1401275772 J1 SUBMIT_FAILURE - - - 1
    1401275777 J1 SUBMIT_FAILURE - - - 1
    1401275788 J1 SUBMIT_FAILURE - - - 1
    1401275805 J1 SUBMIT_FAILURE - - - 1
    1401275805 INTERNAL *** DAGMAN_FINISHED 1 ***
    $ cat dag1.dag.dagman.out
    ... can't find anything relevant ...
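For reference, the only status-like value I can pull out of the jobstate log is the number on the DAGMAN_FINISHED line. Here is a minimal sketch of how that can be extracted, assuming the line layout shown in the log above (timestamp, INTERNAL, ***, DAGMAN_FINISHED, value, ***). The function name dagman_finished_value is my own, not part of HTCondor.

```shell
#!/bin/sh
# Hypothetical helper (not an HTCondor tool): print the value recorded on the
# last DAGMAN_FINISHED line of a jobstate log.
# Assumed line layout: "<timestamp> INTERNAL *** DAGMAN_FINISHED <value> ***"
dagman_finished_value() {
    awk '$2 == "INTERNAL" && $4 == "DAGMAN_FINISHED" { value = $5 }
         END { if (value != "") print value }' "$1"
}
```

Calling `dagman_finished_value jobstate1.log` on the log above prints 1, which matches the condor_dagman exit code rather than the $DAG_STATUS of 2.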
Adding a FINAL node shows that $DAG_STATUS is indeed 2:

    ==> dag2.dag <==
    JOB J1 nonexistent.sub
    JOBSTATE_LOG jobstate2.log
    FINAL final /dev/null NOOP
    SCRIPT POST final final.sh $DAG_STATUS $FAILED_COUNT

    ==> final.sh <==
    #!/bin/sh
    echo "exiting with code $1" >/tmp/final.log
    exit $1

    $ condor_submit_dag dag2.dag
    $ condor_wait dag2.dag.dagman.log
    $ cat /tmp/final.log
    exiting with code 2
    $ cat jobstate2.log
    1401275842 INTERNAL *** DAGMAN_STARTED 3655965.0 ***
    1401275855 J1 SUBMIT_FAILURE - - - 1
    1401275860 J1 SUBMIT_FAILURE - - - 1
    1401275865 J1 SUBMIT_FAILURE - - - 1
    1401275870 J1 SUBMIT_FAILURE - - - 1
    1401275881 J1 SUBMIT_FAILURE - - - 1
    1401275898 J1 SUBMIT_FAILURE - - - 1
    1401275903 final SUBMIT 0.2147483647 - - 2
    1401275903 final JOB_TERMINATED 0.2147483647 - - 2
    1401275903 final JOB_SUCCESS 0 - - 2
    1401275903 final POST_SCRIPT_STARTED 0.2147483647 - - 2
    1401275903 final POST_SCRIPT_TERMINATED 0.2147483647 - - 2
    1401275903 final POST_SCRIPT_FAILURE 0.2147483647 - - 2
    1401275908 INTERNAL *** DAGMAN_FINISHED 1 ***

The value '2' here is, I believe, the sequence number, not the $DAG_STATUS or the exit code from the POST script.

So I'd like to retrieve the $DAG_STATUS value to report on why a DAG failed, but is there any way to get it other than in a FINAL script?

Thanks,

Brian.