Re: [HTCondor-users] "incremental" (singularity) jobs
- Date: Sun, 19 Aug 2018 10:04:19 -0700
- From: Philippe Grassia <pgrassia@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] "incremental" (singularity) jobs
Hi
I do not have an all-encompassing solution, but I have given this type of
problem some thought already.
The whole point of a container solution (including, but not limited to,
singularity) is to isolate the processes in the container from the rest of
the world (the host and other containers), so from the container host (in
this case the condor execute node) this is an inter-process communication
problem. The options then are:
- Pipe from data management to singularity. The submit file could look like:
executable = sh
arguments = -c "process_on_the_execute_node fetch data | singularity exec $MY_SINGULARITY_EXEC_OPTIONS my_script_inside_the_container | post_processing_on_the_execute_node"
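A fuller (untested) submit description along those lines might look like the sketch below; fetch_data, push_results, my_image.simg and /opt/my_script are hypothetical placeholders for your own data management tools, container image and entry point (the quoting follows HTCondor's newer argument syntax: outer double quotes, single quotes to group the shell command):
universe     = vanilla
executable   = /bin/sh
arguments    = "-c 'fetch_data | singularity exec my_image.simg /opt/my_script | push_results'"
transfer_input_files    = my_image.simg
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue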
- Use a socket or a FIFO declared on the host and bound into the singularity
image: data management does its thing and writes to the socket or FIFO, and
the processes inside the container just read from there, oblivious to the
fact that it is also handled from outside the container. Since
data_management and the container processes run in parallel, this could
probably be a (dynamic?) DAG. A minimal shell sketch is below.
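Here is a minimal (untested) shell sketch of the FIFO variant; data_fetcher, my_image.simg and /opt/analysis are hypothetical names:
#!/bin/sh
# scratch directory and FIFO on the execute node
WORKDIR=$(mktemp -d)
mkfifo "$WORKDIR/input.fifo"
# host-side data management writes into the FIFO in the background
data_fetcher > "$WORKDIR/input.fifo" &
# the containerised process reads the same FIFO through a bind mount
singularity exec --bind "$WORKDIR":/data my_image.simg /opt/analysis /data/input.fifo
wait
rm -rf "$WORKDIR"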
- Bind a common directory from the host into the container and read and
write files there (this will lead to concurrency concerns between
data_management and the container); see the sketch below.
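An (untested) illustration of that bind-mount approach, using the per-job scratch directory HTCondor creates on the execute node; data_fetcher, data_pusher, my_image.simg and /opt/analysis are again placeholders:
#!/bin/sh
# per-job scratch directory provided by HTCondor on the execute node
SCRATCH=${_CONDOR_SCRATCH_DIR:-$PWD}
# host-side: pull the data
data_fetcher "$SCRATCH/input"
# the container sees the scratch directory as /work
singularity exec --bind "$SCRATCH":/work my_image.simg /opt/analysis /work/input /work/output
# host-side: push the results back
data_pusher "$SCRATCH/output"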
- Shared memory. I believe it is possible, but I think the configuration on
the host would be far too convoluted to be useful at the scale of a condor
cluster (using /dev/shm is actually the previous scenario from a functional
standpoint).
That being said, do not forget that it is possible to subclass singularity
images for your own benefit by using recipes:
http://singularity.lbl.gov/docs-recipes
https://www.sylabs.io/guides/2.5.1/user-guide/container_recipes.html
If your data management client is not too convoluted, that is the route I
would personally investigate, with a series of recipes looking like:
Bootstrap: shub
From: my_3rd_party_image

%help
    adding "data management" to my_3rd_party_image

%post
    # %post runs inside the container at build time
    add_my_data_management_repository
    apt update
    apt install -y my_data_management

%files
    # copies files from the host into the image
    # (in legacy Singularity < 2.3, copying files was done in %setup instead)
    copy_configuration_to_make_my_data_management_client_useful
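Assuming a Singularity version that has the build command (2.4 or later), the derived image would then be built with something like:
sudo singularity build my_extended_image.simg my_recipe.def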
Depending on the number of 3rd-party images and the complexity of the
installation of the data_management client, this may or may not be a
viable option.
HTH
Philippe
On 08/19/2018 12:18 AM, Michael Hanke wrote:
Hi,
I cannot find a straightforward solution for the following problem, and
I would be glad if someone could put me on the right track on how to do
it, or how to reframe the problem.
We have jobs to process that cover a wide range of data processing. They
all have in common that specific code/applications come in singularity
images that are provided by 3rd-parties. To perform the computations,
data need to be pulled from a data management system at the beginning
and results need to be put back into it at the end. The execute nodes do
not have the required data management software, though. Given that the
core processing is done via singularity, it would be easy to provide the
data management software via such an image as well. However, it would be
very difficult to fold it into all the individual singularity images
provided by 3rd-parties.
Q: Is it possible to bind three singularity jobs (each with its own
singularity image) together, such that they can run on any machine, but all
on the exact same one, and such that they all share a common temporary work
dir (the execute nodes have no shared filesystem)? The shared work dir is
important, as the size of the dataset is substantial (>x*100GB), and moving
the job results between the prep, computation and finalize stages would put
substantial stress on the network, while the final results tend to be rather
small.
I'd be happy for any suggestions. Thanks!
Michael