Brian Candler wrote:
> On 03/04/2017 19:02, Dimitri Maziuk wrote:
>> I wonder: in what scenario a post script that starts with
>>
>> #!/bin/sh
>> if [ $1 -ne 0 ] ; then exit $1 ; fi
>>
>> would cause problems?
> Only when you forget to do it.
>
> We recently had a problem when a broken dataset ended up getting deployed. It was controlled by a top-level dag with subdags. After grubbing through various condor log files, it turns out it was due to one of the inner dags failing, but the top level DAG had POST scripts to notify progress, and they weren't handling $RETURN properly.
>
> So I was just wondering if it was possible to idiot-proof this.

I'm liking the idea of dealing with this with one line in your DAG file, for example:
RUN_POST_ON_JOB_FAIL ALL_NODES false
(On the other hand, doing it in configuration rather than with a DAG command would make it easier to do across splices and sub-DAGs, but you'd have no way to do it on a per-node basis then.)
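
Until something like that exists, the guard has to live in each POST script. Here is a minimal sketch of one way to write it, assuming the DAG file passes the node's exit status as the first argument (for example "SCRIPT POST NodeA notify.sh $RETURN", where NodeA and notify.sh are made-up names). The function names are hypothetical too. Unlike the one-liner quoted above, this version also fails when the argument is missing, and it returns 1 for any failure rather than passing the original status through.

```shell
#!/bin/sh
# Sketch of a POST script guard. Assumes the DAG file passes the node's
# exit status as $1, e.g. (hypothetical names):
#   SCRIPT POST NodeA notify.sh $RETURN

# node_succeeded STATUS
# True only when STATUS is exactly "0". A missing or empty argument
# counts as failure, so forgetting $RETURN in the DAG file is caught.
node_succeeded() {
    [ "${1:-}" = "0" ]
}

# run_post STATUS
# Does the POST work only if the node succeeded; otherwise prints a
# message on stderr and returns 1 so the node is still marked as failed.
run_post() {
    if ! node_succeeded "${1:-}"; then
        echo "node failed with status ${1:-unknown}; skipping notification" >&2
        return 1
    fi
    # Placeholder for the real work (progress notification, deployment, ...).
    echo "node succeeded; sending progress notification"
}
```

In an actual POST script the last line would be something like: run_post "$1"; exit $?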
Kent