
Re: [HTCondor-users] Tracking DAGMan jobs



> Newsgroups: gmane.comp.distributed.condor.user
> From: Brian Candler <b.candler-e+AXbWqSrlAAvxtiuMwx3w@xxxxxxxxxxxxxxxx>
> Subject: Re: Tracking DAGMan jobs
> Date: Mon, 30 Dec 2013 14:20:01 +0000
> 
> > Wrap it in a nested dag and this should be pretty easy: The toplevel DAG will handle all the messy details.
> >
> > subdag external mydag the-original-dag.dag
> > script post mydag post.script
> > script pre mydag pre.script
> >
> > Nathan Panike
> 
> I was going to wrap it that way anyway as a clean way to add a FINAL node, but the issue of finding the clusterID still remains.
> 
> According to the documentation (and confirmed by experiment), $JOBID can be used as an argument to a POST script, but not to a PRE script.
> 
> Let me try to be a bit clearer about my requirements.
> 
> * I want to insert a record in a database at DAG start, and update it at DAG end

Write a wrapper around condor_submit_dag, along the lines of the following pseudocode:

condor_submit_dag -no_submit <dagfile>                  # Generates <dagfile>.condor.sub without submitting it
condor_submit -append 'hold=true' <dagfile>.condor.sub  # Submits the DAG on hold; the schedd assigns it a cluster ID
<You push the cluster ID into your database>
condor_release <ID>                                     # Releases the held DAGMan job so the DAG starts

Then update the row at DAG end with the do_final.sh approach you describe below. There is no need to wrap the DAG in a subdag any more.
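
The pseudocode above could be fleshed out roughly as follows. This is only a sketch under some assumptions: condor_submit is assumed to print its usual "N job(s) submitted to cluster M." line, and record_dag_start is a hypothetical placeholder for whatever inserts the row into your database. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of a condor_submit_dag wrapper that learns the DAGMan cluster ID
# before the DAG starts running.

# Extract the cluster ID from condor_submit output.
# Assumption: the output contains a line like
#   "1 job(s) submitted to cluster 42."
parse_cluster_id() {
    printf '%s\n' "$1" |
        sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Hypothetical placeholder: replace the echo with your database insert.
record_dag_start() {
    echo "DAG start: cluster=$1 dagfile=$2"
}

# Submit the DAG on hold, record its cluster ID, then release it.
submit_tracked_dag() {
    dagfile=$1

    # Generate <dagfile>.condor.sub without submitting it.
    condor_submit_dag -no_submit "$dagfile" || return 1

    # Submit the DAGMan job on hold so that nothing runs yet.
    out=$(condor_submit -append 'hold=true' "$dagfile.condor.sub") || return 1

    cluster=$(parse_cluster_id "$out")
    [ -n "$cluster" ] || return 1

    # If recording fails, the job is left on hold rather than released.
    record_dag_start "$cluster" "$dagfile" || return 1

    condor_release "$cluster"
}
```

Calling `submit_tracked_dag test.dag` would then submit the DAG, record the ID, and release the job. Leaving the job held when the database step fails is a design choice of this sketch; you may prefer to condor_rm it instead.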


> * I want to include the clusterID of the dagman process in the database row, so that for example someone can manually "condor_rm" it or otherwise examine its status.
> * I would prefer to use the clusterID as the key when updating the row, to avoid having to allocate some additional unique ID and pass it to the SCRIPT POST.
> 
> So if I insert the database row as part of SCRIPT PRE, it still needs some way to find its own clusterID.
> 
> Now, testing with
> 
> $ cat testwrap.dag
> SUBDAG EXTERNAL mydag test.dag
> SCRIPT PRE mydag do_final.sh 0 $JOB
> SCRIPT POST mydag do_final.sh $RETURN $JOB $JOBID
> 
> it looks like the SCRIPT PRE/POST do both have CONDOR_ID in the environment. So I guess I can use that (undocumented) feature.
> 
> One slightly messy thing I noticed about wrapping the DAG as a SUBDAG EXTERNAL is that if it fails, we get two rescue DAGs: one for the subdag and one for the outer dag.
> SPLICE doesn't have this issue, but you can't use PRE/POST with a SPLICE. However you can with a FINAL node:
> 
> SPLICE mydag test.dag
> FINAL final_node /dev/null NOOP
> SCRIPT PRE final_node do_final.sh $DAG_STATUS $FAILED_COUNT
> 
> Regards,
> 
> Brian.
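
For the FINAL-node variant quoted above, the body of do_final.sh might look something like this. It is a sketch only: it relies on CONDOR_ID being present in the script's environment, which the quoted message observed but which is undocumented, and it assumes the value may be either a bare cluster ID or cluster.proc. The function name final_summary is hypothetical, and the echo stands in for the real database update.

```shell
#!/bin/sh
# Sketch of do_final.sh logic for:
#   SCRIPT PRE final_node do_final.sh $DAG_STATUS $FAILED_COUNT

final_summary() {
    dag_status=$1     # value of $DAG_STATUS; 0 is treated as success here
    failed_count=$2   # value of $FAILED_COUNT

    # Assumption: CONDOR_ID is "<cluster>" or "<cluster>.<proc>";
    # keep only the part before the first dot.
    cluster=${CONDOR_ID:-}
    cluster=${cluster%%.*}

    if [ "$dag_status" -eq 0 ]; then
        state=succeeded
    else
        state=failed
    fi

    # Placeholder for the database update, keyed on the cluster ID.
    echo "cluster=$cluster state=$state failed_nodes=$failed_count"
}
```

Inside do_final.sh the function would be invoked as `final_summary "$1" "$2"`.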