Re: [HTCondor-users] Cannot use singularity cache directory
- Date: Fri, 7 Apr 2023 13:20:49 -0500
- From: Greg Thain <gthain@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Cannot use singularity cache directory
On 4/7/23 04:16, Florent Couzinié-Devy wrote:
I set up a local singularity registry for our small cluster and wanted
to profit from the image caching built into singularity. We often use
images bigger than 1GB, and transferring the image to the executor for
each job seems wasteful (and slow in our case). Apptainer/singularity
has a nice feature where, if you load the images from a registry, it
checks the hash to see if the image exists in the cache folder and only
downloads it if it is not already available locally.
The documentation [1] suggests that the singularity cache directory is
configurable, with the sentence "There you will find parameters to
customize things such as [...], cache directory," but it actually is
not. The environment variable used to configure it, "APPTAINER_CACHEDIR",
is overwritten by htcondor and set to the "execute_dir", which is
temporary and not suited to caching. I checked the code and this is
done in `condor_starter-V6.1/singularity.cpp:l403-411`. The comments
suggest this is done for images rebuilt from docker images (which might
not be cached; I don't have much experience with the "docker://"
handle). It would be nice if APPTAINER_CACHEDIR were only set to
"execute_dir" when it is not already defined by the user or the condor
configuration.
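The fallback requested above can be sketched in a few lines of Python.
This is only an illustration of the idea, not the HTCondor starter
code: the helper name `resolve_cache_dir`, the `configured` argument
(standing in for some admin-level setting) and the precedence between
the admin setting and the job environment are all assumptions.

```python
import os
from typing import Mapping, Optional


def resolve_cache_dir(env: Mapping[str, str],
                      execute_dir: str,
                      configured: Optional[str] = None) -> str:
    """Pick the Apptainer cache directory with a fallback order.

    Hypothetical helper, not HTCondor code. Assumed precedence:
    1. a path set by the admin configuration (`configured`),
    2. APPTAINER_CACHEDIR already present in the job environment,
    3. the per-job execute directory (the current behaviour).
    """
    if configured:
        return configured
    existing = env.get("APPTAINER_CACHEDIR")
    if existing:
        return existing
    return execute_dir


if __name__ == "__main__":
    # Example: build the job environment without clobbering a value
    # that was already defined. The execute directory is a made-up path.
    job_env = dict(os.environ)
    job_env["APPTAINER_CACHEDIR"] = resolve_cache_dir(
        job_env, execute_dir="/tmp/execute/dir_1234")
    print(job_env["APPTAINER_CACHEDIR"])
```

With this order the execute directory stays the default, so jobs that
define nothing would still have their cache removed with the job.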
Hi Florent:
I'm sorry you are having problems with this. A couple of complications
we should keep in mind -- one of the reasons HTCondor sets
APPTAINER_CACHEDIR to the execute directory is that we can guarantee we
clean up and remove the cached files. Maybe sooner than you'd like in
this case, but otherwise we'd keep them around forever, and fill up the
disk. Another complication is that, by default, apptainer puts the
cache under the home directory. But in some HTCondor setups, we run
with "slot users", and several different submitting users may share the
same Unix uid and home directory on the worker node. In many cases,
admins set up the home directories to not be writable by the slot user.
For now, I am thinking of "solving" this problem by using a DAG job,
with a first job that pulls the image only if necessary and a second
that uses the local image, but it complicates the workflow
unnecessarily.
Does anyone see a better solution for reusing a singularity image by
storing it on the local storage of an executor?
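The "pull only if necessary" first job described above might look
roughly like the following. It is a minimal sketch under assumptions:
the function name `ensure_image`, the destination path and the
registry URI are hypothetical, and it relies only on the
`apptainer pull <output file> <URI>` form of the CLI. The `runner`
parameter exists so the logic can be exercised without apptainer
installed.

```python
import os
import subprocess
import sys


def ensure_image(uri: str, dest: str, runner=subprocess.run) -> bool:
    """Pull `uri` to `dest` unless `dest` already exists.

    Returns True if a pull happened, False if the local copy was reused.
    """
    if os.path.exists(dest):
        return False
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    # Pull to a per-process temporary name so that a half-written file
    # is never mistaken for a complete image by another job.
    partial = f"{dest}.{os.getpid()}.partial"
    try:
        runner(["apptainer", "pull", partial, uri], check=True)
        os.replace(partial, dest)  # atomic rename on the same filesystem
    finally:
        # Clean up if the pull failed or was interrupted.
        if os.path.exists(partial):
            os.remove(partial)
    return True


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: ensure_image.py <registry URI> <local .sif path>
    pulled = ensure_image(sys.argv[1], sys.argv[2])
    print("pulled" if pulled else "reused local copy")
```

Note that this only checks whether the file exists, not the image
hash, so a changed image under the same name would not be picked up,
and nothing here ever removes old images from the local disk.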
When would you want the local copy to be removed? Do you have a small,
fixed number of images you are interested in running?
greg