[HTCondor-users] HTCondor and Docker
- Date: Tue, 07 Apr 2015 16:02:12 +0100
- From: Brian Candler <b.candler@xxxxxxxxx>
- Subject: [HTCondor-users] HTCondor and Docker
I'd like to know the current state of HTCondor integration with Docker. I see
some notes at the end of
https://indico.cern.ch/event/320819/session/3/contribution/56/material/slides/0.pptx
but as far as I can tell these may just be a "wish list".
There are three different things I'm thinking of.
(1) Running an HTCondor worker node as a Docker container.
This should be straightforward. All the jobs would run within the same
container and therefore have an enforced limit on total resource usage.
This would be a quick way to add HTCondor execution capability to an
existing Docker-aware server, just by
"docker run -d htcondor-worker"
or somesuch.
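For example, something like this (a sketch only - "htcondor-worker" is a
made-up image name, and CONDOR_HOST assumes an entrypoint that writes it
into the condor config):

    # -m caps the container's total memory; -c sets its CPU-shares weight.
    docker run -d -m 8g -c 512 \
        -e CONDOR_HOST=central-manager.example.com \
        --name htcondor-worker htcondor-worker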
(2) A "docker universe" where each job instance launches within a new
Docker container, from a chosen container template.
When the job starts, a container is created, and when the job terminates
the container is destroyed (except perhaps on failure, in which case we
could keep it around for a post-mortem?)
condor_exec would need to fire off "docker run" (preferably via the
docker API) and track it until the container terminated. Plumbing for
stdin/stdout and file transfer would also be required. Hence maybe part
of condor_exec itself should run within the container?
Note: in principle it should be possible to combine (1) and (2)
https://blog.docker.com/2013/09/docker-can-now-run-within-docker/
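To sketch what the plumbing in (2) might look like from the execute side
(rough shell sketch; "job-image" and "./job.sh" are placeholders, and
$_CONDOR_SCRATCH_DIR is the per-job scratch directory HTCondor already
exports):

    #!/bin/sh
    # Run the job in a fresh container with the scratch dir as its workdir.
    CID=$(docker run -d -v "$_CONDOR_SCRATCH_DIR":/work -w /work job-image ./job.sh)
    RC=$(docker wait "$CID")        # blocks until the container exits
    # Without a tty, "docker logs" keeps stdout and stderr separate.
    docker logs "$CID" > job.out 2> job.err
    # Destroy the container on success; keep it for post-mortem on failure.
    [ "$RC" -eq 0 ] && docker rm "$CID"
    exit "$RC"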
(3) Docker containers on the submit host
A Docker container would be a convenient abstraction to use on the
submission host. Normally when you start an HTCondor DAG you need to
create an empty working directory, run a script to create the DAG and/or
SUB files, run condor_submit_dag, monitor progress to wait for
completion, check the exit status to see whether all DAG nodes completed
successfully, fix and restart if necessary, and then tidy up the work
directory.
Docker on the submission host could handle this lifecycle: the container
would be the work directory, it would run the scripts you want, submit
the DAG, and remain visible as a running container until the DAG has
completed; the container itself would then have an exit status, visible
under "docker ps -a", showing whether the DAG completed successfully or not.
https://docs.docker.com/reference/commandline/cli/#filtering_2
When you are finished with the results, you would destroy the container.
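For example, using the "exited" filter described on that page (assuming
one container per DAG):

    # Workflows that completed successfully...
    docker ps -a --filter 'exited=0'
    # ...or that failed with a particular exit code (one code per filter):
    docker ps -a --filter 'exited=1'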
This one might be a bit tricky to implement, as I don't see any way to
have condor_submit_dag or condor_submit run in the foreground. I think
it would be necessary to run "condor_dagman -f" directly as the process
within the container.
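One possible trick (untested): condor_submit_dag has a -no_submit option
that writes the .condor.sub file without queueing anything, and the
dagman arguments recorded there could then be replayed in the foreground,
roughly:

    condor_submit_dag -no_submit workflow.dag
    # Argument list cribbed from the generated workflow.dag.condor.sub;
    # the exact flags vary between HTCondor versions.
    condor_dagman -f -l . -Lockfile workflow.dag.lock -Dag workflow.dag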
The container also needs to communicate with the condor schedd, and I'm
not sure whether it needs access to bits of the filesystem as well (e.g.
condor_config). If necessary, /etc/condor/ can be bind-mounted as a
volume within the container.
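e.g. something like:

    # Share the host's condor config read-only; host networking is one way
    # to let the container talk to the schedd directly.
    docker run --net=host -v /etc/condor:/etc/condor:ro ... my-dag-image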
The user-provided scripts need to be available (e.g. by including them
inside a custom docker image from "docker build") and they need to have
parameters passed to them - this could be via environment variables
(docker run -e). If the container is restarted, it should re-submit the
existing DAG, not run the scripts to create a new DAG.
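Putting the pieces together, the entrypoint might look roughly like this
(everything here - the make_dag.sh script, the DAG_ARGS variable and the
file names - is made up, and the restart/recovery behaviour would need
testing):

    #!/bin/sh
    cd /work
    # Generate the DAG only on first start; on a restart the DAG file from
    # the previous run is already present and must not be regenerated.
    if [ ! -f workflow.dag ]; then
        make_dag.sh "$DAG_ARGS"    # user script; parameters via "docker run -e"
    fi
    condor_submit_dag -no_submit workflow.dag
    # exec so dagman is PID 1 and its exit status becomes the container's;
    # -AutoRescue lets a restart pick up any rescue DAG from a failed run.
    exec condor_dagman -f -l . -AutoRescue 1 \
        -Lockfile workflow.dag.lock -Dag workflow.dag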
If anyone has done any of the above, I'd be very interested to hear
about your experiences.
Regards,
Brian.