Re: [HTCondor-users] condor and docker - advise for a newbie
- Date: Mon, 11 Dec 2017 14:17:39 -0700
- From: Ian McEwen <mian@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] condor and docker - advise for a newbie
On Mon, Dec 11, 2017 at 12:40:56PM -0500, Larry Martell wrote:
> Just getting started with condor and I am looking for some guidance.
>
> Currently I have 2 docker containers that are linked to each other.
> One has a crontab that runs many jobs throughout the day. Many of
> these jobs are multithreaded and/or fork off other processes. The jobs
> require services from both its own and the other container. My goal is
> to use HTCondor to distribute these jobs, threads, and forked
> processes across multiple machines. From reading the docs I think I
> need the docker universe for this. Is that correct? But how can I have
> condor start up both containers? Is it possible to already have the
> containers running on the remote hosts and have condor invoke the jobs
> inside them?
>
Hello!
I believe that the docker universe is probably unsuitable for this use
case, but it should be possible to do what you want by way of vanilla
universe jobs -- with the caveat that HTCondor's resource tracking will
probably not work as you expect. It may also be possible to run HTCondor
startds within your existing containers as a way of scheduling jobs to
them.
First, re: the docker universe. By design, it does not expose every
potential feature of Docker; it's designed to be a way of specifying an
environment to run a job in, and a way to isolate that job from the
surrounding host, and not really more. Notably for your use case, it
does not (as far as I'm aware) support docker's links or networking
features, nor would it allow running jobs inside an already-running
container. Basically, it's a good way to specify that you want the job
to run on Debian with X, Y, and Z packages installed, but not to specify
connected network resources, other processes, etc.
On to the parts which might help solve your case:
* Use the vanilla universe, but sacrifice HTCondor's resource tracking:
You can run a vanilla universe job and write a script that calls out
to 'docker run', 'docker exec', etc., so long as the user the job will
run as is allowed to run docker. If you wanted to have the job start
up the prerequisite containers, it could do so in the script, or you
could set up your nodes to have the containers already running and
then use 'docker exec' to run things within the containers. However,
only the actual 'docker run' or 'docker exec' process (and thus not
the containers themselves or the processes being run within them) will
fall within HTCondor's jurisdiction, due to how Docker works: the
processes inside a container are children of the Docker daemon, not of
the job's wrapper script. (A minimal sketch of such a wrapper setup
appears after this list.) There are ways to change this, but they
probably aren't advisable unless you're really attached to having
HTCondor's resource tracking work as expected. Specifically, if anyone
needs to go down this road: 'docker run' lets you pass a cgroup parent,
so with HTCondor's cgroup-based tracking you can determine the wrapper
script's cgroup (the HTCondor-created one) and pass it as the parent
for the docker container. However, you also need to pass down the
resource constraints, probably slightly smaller than the slot's -- if
not, the wrapper script will get killed off but the container will
persist, based on the testing of this approach I've done.
* Run a startd inside the container:
Instead of using a script from outside the container to run things
within the container, you could instead run HTCondor itself inside a
container where the environment you want is available, and have your
jobs be routed there. To do so, you'd need to construct an appropriate
configuration file -- most likely, you would turn on the shared port
daemon, expose its port to the outside world when running the docker
container, and use TCP_FORWARDING_HOST to specify the surrounding
host's IP as the appropriate place to connect to. If you're running
more than these jobs in your HTCondor cluster, you'll probably want to
add a custom machine attribute via STARTD_ATTRS
(http://research.cs.wisc.edu/htcondor/manual/current/3_5Configuration_Macros.html#22879)
that identifies these special slots as being inside the docker
container, add that attribute as a requirement on your job, and set the
START expression of these slots to refuse jobs that don't explicitly
request them. (A configuration sketch along these lines also appears
after this list.)
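For the first option, here is a minimal sketch of what the submit file
and wrapper script might look like. Everything specific in it is a
placeholder I've made up for illustration (the container name
'analysis', the command path, the resource requests); it assumes the
container is already running on the execute node and that the user the
job runs as is allowed to run docker. The submit file:

    # sketch.submit -- vanilla universe job that drives docker via a wrapper
    universe   = vanilla
    executable = run_in_container.sh
    arguments  = "/opt/jobs/nightly_report.sh"
    output     = job.$(Cluster).$(Process).out
    error      = job.$(Cluster).$(Process).err
    log        = job.$(Cluster).$(Process).log
    request_cpus   = 2
    request_memory = 4096
    queue

and the wrapper it runs:

    #!/bin/sh
    # run_in_container.sh -- run the supplied command inside the
    # already-running container named 'analysis' (placeholder name).
    # Only this 'docker exec' client process is visible to HTCondor;
    # the processes inside the container are children of the Docker
    # daemon, so request_cpus/request_memory are not enforced on them.
    #
    # If you start the container from here instead, you could experiment
    # with 'docker run --cgroup-parent=... --cpus=... --memory=...' as
    # described above, but that's the finicky road.
    exec docker exec analysis "$@"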
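For the second option, here is a rough sketch of the extra
configuration you might give the startd inside the container, plus the
matching job-side lines. The attribute names (IsDockerEnvSlot,
WantDockerEnv), the central manager name, the host IP, and the port
number are all placeholders of mine, not anything HTCondor defines:

    # condor_config.local inside the container (sketch)
    CONDOR_HOST = central-manager.example.com
    DAEMON_LIST = MASTER, STARTD

    # Shared port daemon on a single known port; publish that port when
    # starting the container, e.g. 'docker run -p 9618:9618 ...'
    USE_SHARED_PORT = TRUE
    SHARED_PORT_PORT = 9618

    # Advertise the surrounding host's address, since the container's
    # own IP isn't reachable from the rest of the pool
    TCP_FORWARDING_HOST = 192.0.2.10

    # Mark these slots and only accept jobs that explicitly ask for them
    IsDockerEnvSlot = True
    STARTD_ATTRS = $(STARTD_ATTRS) IsDockerEnvSlot
    START = (TARGET.WantDockerEnv =?= True)

and on the job side:

    # In the job's submit file
    +WantDockerEnv = True
    requirements = (TARGET.IsDockerEnvSlot =?= True)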
Hopefully what I'm saying makes sense. The first option is most likely
easier to implement, and the second is arguably cleaner but more finicky
to set up.
> Thanks!