Re: [Condor-users] How to stop condor from removing working directory (fwd)
- Date: Wed, 24 Dec 2008 00:33:39 -0500
- From: Ian Stokes-Rees <ijstokes@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] How to stop condor from removing working directory (fwd)
Hi Steve,
Steven Timm wrote:
> There are several "working directories" that you could be
> referring to. Are you talking about the remote (server) side or the
> client side? And on the server side, are you talking about the
> directory in which the job runs, or the Globus temporary directory?

Yes. In fact, I mean all of them on the server side. I wasn't aware of
any temporary directories created per-job on the client side when using
Condor-G (via OSG and "grid" universe jobs). Are there any?
There are four on the server side that I know of:

1. Globus GASS cache dir: ~/.globus/.gass_cache/md5/??/hash1/md5/??/hash2
   (contains the hard-linked executable, which with OSG is always
   (annoyingly) named "data")

2. Logs: ~/.globus/job/FQDN/####.######### (contains stdout/err, the
   local classad, the proxy cert, and the I/O URL)

3. IWD: ~/gram_scratch_randomstring (the actual working directory of
   the job -- not sure what happens on systems that use local rather
   than shared directories for jobs)

4. GRAM log: $VDT_LOCATION/globus/tmp/gram_job_state (contains files
   named gram_condor_log.####.######## which match the log numbers
   above -- in fact, I've just noticed these don't seem to be cleaned
   up, and I have 2300 log and lock files from as far back as July; a
   sketch for finding them follows below)
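In case it's useful, here's roughly how I'd hunt down the stale state
files -- a sketch using plain find(1), assuming the lock files share
the gram_condor_log prefix, with an arbitrary 30-day cutoff; list and
verify before deleting anything:

    # list gram_condor_log files (logs and locks) untouched for 30+ days
    find $VDT_LOCATION/globus/tmp/gram_job_state -name 'gram_condor_log.*' -mtime +30

    # once the listing looks right, GNU find can remove them in the same pass:
    # find $VDT_LOCATION/globus/tmp/gram_job_state -name 'gram_condor_log.*' -mtime +30 -delete
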
> There is a configuration option for the globus
> gatekeeper/globus-job-manager to not delete the temporary files on a
> successful job, or ones it thinks are successful such as "globus
> error 155".

This is exactly the error I am getting (globus error 155: cannot
transfer output files). Can you shed any light on it? We have NAT, but
no firewall, and the problem is intermittent.
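Meanwhile, for digging into the 155s on our side: if memory serves,
the saved job-manager logs land in $HOME as gram_job_mgr_<pid>.log
(worth verifying on your install), so something like this should turn
up candidate jobs:

    # crude: grep for the bare error code -- it will also match pids and
    # timestamps, so treat hits as candidates, not proof
    grep -l 155 ~/gram_job_mgr_*.log
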
> The file is $VDT_LOCATION/globus/etc/globus-job-manager.conf
> and the option is -save-logfile. The default is on_error; I believe
> the other option is "always", to save everything, but you have to be
> careful because stuff will fill up fast.

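For the archives: if I've understood the file's syntax correctly -- it
seems to just hold job-manager command-line arguments, which is an
assumption on my part, so verify before relying on it -- switching from
the default

    -save-logfile on_error

to

    -save-logfile always

should keep every log, with the disk-usage caveat above.
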
Thanks.
Ian
--
Ian Stokes-Rees, Research Associate
SBGrid, Harvard Medical School
http://sbgrid.org