
Re: [HTCondor-users] Halting a Dagman



Thanks Todd and Cole,

Indeed, I had a vague memory of having read about <DAG file>.halt,
but could not find it in the docs today, so I asked.
I have to find a way to create that file remotely, since the action
will be initiated on a machine other than the AP. I guess I can no
longer "spool" files there.
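For reference, the halt file is just the DAG input file name with ".halt" appended, in the same directory as the DAG file. While it exists, DAGMan submits no new node jobs; removing it lets DAGMan resume. A minimal sketch, assuming the script runs on the AP (or on a file system DAGMan sees) and is given the DAG file path; halt_dag, resume_dag and halt_file_for are hypothetical helper names. It does not solve the "remotely" part:

```python
from pathlib import Path


def halt_file_for(dag_file: str) -> Path:
    """Hypothetical helper: <DAG file>.halt, next to the DAG input file."""
    dag = Path(dag_file)
    return dag.with_name(dag.name + ".halt")


def halt_dag(dag_file: str) -> Path:
    """Create an empty halt file so DAGMan stops submitting new node jobs."""
    halt = halt_file_for(dag_file)
    halt.touch(exist_ok=True)  # contents do not matter, only existence
    return halt


def resume_dag(dag_file: str) -> None:
    """Remove the halt file so DAGMan resumes; no error if it is already gone."""
    halt_file_for(dag_file).unlink(missing_ok=True)  # Python 3.8+
```

Whatever remote mechanism ends up being used only has to run the equivalent of halt_dag/resume_dag on the AP side.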

Our use case is the usual "very old thing that has not been touched
since Brian wrote it 10+ years ago, maybe because at the time it
was the only/best way, and we haven't tried to improve it".

We remotely submit a Condor job in the scheduler universe on the AP.
This job does some needed initialization (unpacking a tarball, preparing
directories, reporting things back to the submitter and doing some
environment configuration) and then launches condor_dagman as its last line.
So, in the end, DAGMan does run on the AP!
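The wrapper pattern described above could look roughly like this. It is a minimal sketch, not our actual script: the tarball/workdir arguments and the prepare_workdir and dagman_command helpers are hypothetical, and whatever condor_dagman arguments the real script uses are simply passed through unchanged. Using exec means the scheduler-universe job process becomes condor_dagman, so the job stays alive exactly as long as DAGMan runs:

```python
import os
import sys
import tarfile
from pathlib import Path


def prepare_workdir(tarball: str, workdir: str) -> None:
    """Hypothetical initialization step: unpack the input tarball into workdir."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as tar:
        tar.extractall(workdir)


def dagman_command(dagman_args: list) -> list:
    """Build the condor_dagman argv; arguments are passed through unchanged."""
    return ["condor_dagman", *dagman_args]


def main(argv: list) -> None:
    # Hypothetical calling convention: <tarball> <workdir> [condor_dagman args...]
    tarball, workdir, *dagman_args = argv
    prepare_workdir(tarball, workdir)
    os.chdir(workdir)
    cmd = dagman_command(dagman_args)
    # execvp replaces this process, so the job keeps running as condor_dagman
    # instead of spawning a child and exiting.
    os.execvp(cmd[0], cmd)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running condor_dagman as a child process and waiting for it would keep the same "job lives as long as DAGMan" property; exec just avoids an extra parent process.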

I do not know whether we could use condor_submit_dagman instead at that
point: can a job submit another job? The fact that the initial job does not
exit, but keeps running as condor_dagman, helps bookkeeping and monitoring.
Touching that would make the changes grow too much and too fast for the
current "it works, don't fix it" situation.

Thanks

Stefano