Re: [HTCondor-users] HTCondor and Docker
- Date: Fri, 10 Apr 2015 18:41:20 +0100
- From: Brian Candler <b.candler@xxxxxxxxx>
- Subject: Re: [HTCondor-users] HTCondor and Docker
On 07/04/2015 16:21, Greg Thain wrote:
> On 04/07/2015 10:02 AM, Brian Candler wrote:
>> There are three different things I'm thinking of.
>>
>> (1) Running an HTCondor worker node as a Docker container.
>>
>> This should be straightforward. All the jobs would run within the
>> same container and therefore have an enforced limit on total resource
>> usage.
>>
>> This would be a quick way to add HTCondor execution capability to an
>> existing Docker-aware server, just by
>>     "docker run -d htcondor-worker"
>> or somesuch.
> We've looked at this, and it is a bit more work than you might think,
> for the htcondor-worker would need to be configured to point to the
> central manager and be compatible with the rest of the pool.
>
> Generally, docker containers run within NATs, and worker nodes need
> inbound connections, so CCB needs to be set up on the central manager
> as well. You might want to volume-mount the execute directory;
> otherwise docker has a 10 GB limit on container growth out of the box,
> though that limit can be increased.
>
> Also, depending on your security posture, you probably don't want to
> run the worker node as root within the container, which may or may not
> be a problem for your HTCondor usage.
Well, on a normal system condor_master runs as root and drops privileges
to the submitting user when running jobs. Under docker, it would probably
make more sense to run all jobs as a dedicated condor user, which I know
condor can be configured to do.

Re configuration: I guess this could be provided at container start
time, but in practice I'd be quite happy to build my own Dockerfile
which layers on top of a base htcondor container. That is, the
Dockerfile would add a customised condor_config[.local].
Re networking: I hadn't considered that, but CCB looks like a good solution.
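
For illustration, the sort of layering I have in mind (just a sketch: the
base image name and the config values below are made up, not anything that
exists today):

    # Dockerfile: add pool-specific configuration to a base worker image
    FROM htcondor/execute-node            # hypothetical base image name
    COPY condor_config.local /etc/condor/condor_config.local

with a condor_config.local along the lines of

    # point the worker at the pool's central manager
    CONDOR_HOST = cm.example.org
    # the container sits behind NAT, so connect out via CCB on the collector
    CCB_ADDRESS = $(COLLECTOR_HOST)
    # keep job sandboxes on a mounted volume rather than in the container
    EXECUTE = /scratch/execute

and then something like

    docker run -d -v /scratch/execute:/scratch/execute htcondor-worker

to start it.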
>> (3) Docker containers on the submit host
>>
>> A docker container would be a convenient abstraction to use on the
>> submission host. Normally when you start an HTCondor DAG you need to
>> create an empty working directory, run a script to create the DAG
>> and/or SUB files, run condor_submit_dag, monitor progress to wait for
>> completion, check the exit status to see if all DAG nodes completed
>> successfully, fix/restart if necessary, then tidy up the work directory.
>>
>> Docker on the submission host could handle this lifecycle: the
>> container would be the work directory, it would run the scripts you
>> want, submit the DAG and be visible as a running container until it
>> has completed, and the container itself would have an exit status
>> showing whether the DAG completed successfully or not, under
>> "docker ps".
>> https://docs.docker.com/reference/commandline/cli/#filtering_2
>>
>> When you are finished with the results you would destroy the
>> container.
>>
>> This one might be a bit tricky to implement, as I don't see any way
>> to have condor_submit_dag or condor_submit run in the foreground. I
>> think it would be necessary to run "condor_dagman -f" directly as the
>> process within the container.
>>
>> The container also needs to communicate with the condor schedd, and
>> I'm not sure if it needs access to bits of the filesystem as well
>> (e.g. condor_config). If necessary, /etc/condor/ can be
>> bind-mounted as a volume within the container.
> This is a use case we haven't considered, but dagman really works best
> now when it is a job managed by the schedd.
I understand that's how dagman is designed to run, in the scheduler
universe. This means the user needs to poll condor_q, the dagman log, the
jobstate.log, or the node.status file to work out when the job has
finished and whether it was successful - or add a FINAL node.
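
Part of the attraction here is that the container's exit status would carry
exactly that information. Very roughly, and with made-up names (the
condor_dagman arguments are abbreviated - in practice they would be
whatever condor_submit_dag writes into the generated *.condor.sub file):

    # run dagman in the foreground as the container's main process,
    # sharing the host's condor config and the DAG's working directory
    docker run -d --name my-workflow \
        -v /etc/condor:/etc/condor:ro \
        -v /var/spool/htcondor/current/<uuid>:/work -w /work \
        htcondor-submit-tools \
        condor_dagman -f -Dag workflow.dag -Lockfile workflow.dag.lock

    # wait for it to finish; the exit code tells you whether the DAG succeeded
    docker wait my-workflow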
Anyway, I only mention this use case because I have had to start
wrapping condor for use in automated batch jobs triggered by other
systems. This includes:
1. creating a working directory (I'm using
/var/spool/htcondor/current/<uuid>)
2. running a script to create the DAG, using parameters from the request
3. submitting the DAG
4. polling the status
5. sending a response when the DAG completes successfully or fails
(right now I'm adding an empty FINAL node with a POST script for this)
6. resubmitting the DAG if a retry is required
7. removing the working directory when it is no longer needed
- and it's just starting to look very much like a Docker container
lifecycle!
1-3 = docker run
4-5 = docker ps
6 = docker start
7 = docker rm
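
i.e. something along these lines, assuming a (hypothetical) image whose
entrypoint generates the DAG and runs condor_dagman in the foreground:

    docker run -d --name dag-<uuid> dag-runner    # 1-3: workdir, DAG, submit
    docker ps -a --filter exited=0                # 4-5: which DAGs finished OK
    docker start dag-<uuid>                       # 6: retry
    docker rm dag-<uuid>                          # 7: tidy up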
Hence a docker_scheduler universe would be attractive to me.
Regards,
Brian.