Re: [Condor-users] Defining an exit script for condor jobs
- Date: Thu, 6 Oct 2005 15:51:40 -0500
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Defining an exit script for condor jobs
On Oct 6, 2005, at 2:13 PM, Terrence Martin wrote:
I asked this question a couple months ago but I wanted to put it out
again because I did not follow up on the one response I got.
My question was whether it is possible to have a script run on job exit
that can go beyond what the normal condor exit does in terms of cleaning
up areas. This is important in the current Open Science Grid clusters I
am working with, since user files are often stored in a temporary area
that condor does not necessarily know about. It would be nice to have
this area cleared on exit.
The answer I got was to use either a wrapper or DAGMan.
The first solution does not work if I follow the rules for
USER_JOB_WRAPPER in the condor documentation, which say the wrapper
should not fork a child and should only call exec. I could fork anyway,
but it is not clear I should. It would be nice if, in addition to
USER_JOB_WRAPPER, there were a USER_JOB_EXIT_SCRIPT that could define a
script to perform certain cleanup steps on job exit.
As for DAGMan, I am not sure how that would help. According to the
condor documentation, DAGMan is a meta-scheduler that submits to condor.
That sounds like it works on the outside, between the user and condor.
The grid software I work with is already thick with schedulers that
submit to condor, and I cannot enforce what users make use of on that
side. All I can control is my condor queue and my worker nodes.
Admittedly, my knowledge of DAGMan extends only to what I read at
http://www.cs.wisc.edu/condor/dagman/
but it does not sound like what I am looking for.
I guess I have another option, which is to try to be clever. Just before
my user wrapper drops to the actual job, I could start a monitoring
process that watches for the job to exit and then tries to clean up.
The monitor would have to end up as an orphan process, since the parent
calls exec right after it spawns the monitor. It would be simpler and
probably less error prone if condor could just trigger a cleanup process.
I see a few options available, none ideal:
1) Have the USER_JOB_WRAPPER clean up the files of the previous job.
2) If you have any control on the submit side, you can set a post-
script in the job ad that will be run after the job. You can use
SUBMIT_EXPRS to add it automatically to all jobs.
3) Have the USER_JOB_WRAPPER fork the job instead of exec'ing it. We
don't mention this in the manual because it can be tricky to get
right. The script has to not exit before the job, exit with the same
status as the job, and catch SIGTERM and forward it to the job. If
you run any standard universe jobs, there are several more signals
the script has to catch and forward to the job. There may be some
other details, but those are the ones I can think of.
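
For illustration, option 3 could be sketched roughly as below. This is
only a minimal sketch in Python, not anything shipped with Condor: the
function name run_and_cleanup and the scratch_dir argument are made up
for the example, it forwards only SIGTERM (so it does not cover the
extra signals standard universe jobs need), and it reports a job killed
by a signal using the shell convention of 128 + signal number, which is
an approximation of "the same status".

```python
import shutil
import signal
import subprocess
import sys


def run_and_cleanup(argv, scratch_dir=None):
    """Run argv as a child process, forward SIGTERM to it, then clean up.

    Returns the exit code the wrapper itself should exit with.
    run_and_cleanup and scratch_dir are hypothetical names for this sketch.
    """
    child = None

    def forward(signum, _frame):
        # Pass the signal on to the job instead of dying ourselves.
        if child is not None:
            child.send_signal(signum)

    # Install the handler before starting the job so an early SIGTERM
    # does not kill the wrapper and leave the job behind.
    previous = signal.signal(signal.SIGTERM, forward)
    try:
        child = subprocess.Popen(argv)
        status = child.wait()  # the wrapper must not exit before the job
    finally:
        signal.signal(signal.SIGTERM, previous)
        if scratch_dir:
            # Cleanup step: remove the temporary area the job used.
            shutil.rmtree(scratch_dir, ignore_errors=True)

    if status < 0:
        # Popen reports death by signal N as -N; map it to 128 + N
        # (shell convention, an assumption of this sketch).
        return 128 - status
    return status
```

A wrapper script built on this would finish with something like
sys.exit(run_and_cleanup(sys.argv[1:], scratch)), where scratch is
whatever temporary area your site knows the job used.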
+----------------------------------+---------------------------------+
| Jaime Frey | Public Split on Whether |
| jfrey@xxxxxxxxxxx | Bush Is a Divider |
| http://www.cs.wisc.edu/~jfrey/ | -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+