Re: [HTCondor-users] condor and docker - advise for a newbie
- Date: Mon, 11 Dec 2017 18:44:31 -0500
- From: Larry Martell <larry.martell@xxxxxxxxx>
- Subject: Re: [HTCondor-users] condor and docker - advise for a newbie
On Mon, Dec 11, 2017 at 4:17 PM, Ian McEwen <mian@xxxxxxxxxxx> wrote:
> On Mon, Dec 11, 2017 at 12:40:56PM -0500, Larry Martell wrote:
>> Just getting started with condor and I am looking for some guidance.
>>
>> Currently I have 2 docker containers that are linked to each other.
>> One has a crontab that runs many jobs throughout the day. Many of
>> these jobs are multithreaded and/or fork off other processes. The jobs
>> require services from both their own and the other container. My goal is
>> to use HTCondor to distribute these jobs, threads, and forked
>> processes across multiple machines. From reading the docs I think I
>> need the docker universe for this. Is that correct? But how can I have
>> condor start up both containers? Is it possible to already have the
>> containers running on the remote hosts and have condor invoke the jobs
>> inside them?
>>
>
> Hello!
>
> I believe that the docker universe is probably unsuitable for this use
> case, but it should be possible to do what you want by way of vanilla
> universe jobs -- with the caveat that HTCondor's resource tracking will
> probably not work as you expect. It may also be possible to run HTCondor
> startds within your existing containers as a way of scheduling jobs to
> them.
>
> First, re: the docker universe. By design, it does not expose every
> potential feature of Docker; it's designed to be a way of specifying an
> environment to run a job in, and a way to isolate that job from the
> surrounding host, and not really more. Notably for your use case, it
> does not (as far as I'm aware) support docker's links or networking
> features, nor would it allow running jobs inside an already-running
> container. Basically, it's a good way to specify that you want the job
> to run on Debian with X, Y, and Z packages installed, but not to specify
> connected network resources, other processes, etc.
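>
> For illustration, a minimal docker universe submit file looks
> something like this (the image name and script here are
> placeholders, not anything from your setup):
>
>     universe     = docker
>     docker_image = debian:stable
>     executable   = my_job.sh
>     should_transfer_files   = YES
>     when_to_transfer_output = ON_EXIT
>     output       = job.out
>     error        = job.err
>     log          = job.log
>     request_cpus = 1
>     queue
>
> Note there is nowhere in that to express "this job needs a second,
> linked container running alongside it".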
>
> On to the parts which might help solve your case:
>
> * Use the vanilla universe, but sacrifice HTCondor's resource tracking:
> You can run a vanilla universe job and write a script that calls out
> to 'docker run', 'docker exec', etc., so long as the user the job will
> run as is allowed to run docker. If you wanted to have the job start
> up the prerequisite containers, it could do so in the script, or you
> could set up your nodes to have the containers already running and
> then use 'docker exec' to run things within the containers. However,
> only the actual 'docker run' or 'docker exec' process (and thus not
> the containers themselves or the processes being run within them) will
> fall within HTCondor's jurisdiction, due to how Docker works. (A
> rough sketch of this approach follows this list.) There are some
> funny potential ways to change this which probably aren't
> advisable unless you're really attached to having HTCondor's
> resource tracking work as expected. (Specifically, if anyone needs
> to go down this road: with 'docker run' you can pass a cgroup
> parent, so with HTCondor's cgroup-based tracking you can determine
> the parent script's cgroup (the HTCondor-created one) and pass it
> as the parent to the docker container. However, you also need to
> pass down the resource constraints, probably slightly smaller than
> the slot -- from the testing I've done of this approach, if you
> don't, the wrapper script will get killed off but the container
> will persist.)
> * Run a startd inside the container:
> Instead of using a script from outside the container to run things
> within the container, you could instead run HTCondor itself inside a
> container where the environment you want is available, and have your
> jobs be routed there. To do so, you'd need to construct an appropriate
> configuration file -- most likely, you would turn on the shared port
> daemon, expose its port to the outside world when running the docker
> container, and use TCP_FORWARDING_HOST to specify the surrounding
> host's IP as the appropriate place to connect to. If you're running
> more than these jobs in your HTCondor cluster, you'll probably want to
> add a custom attribute via STARTD_ATTRS
> (http://research.cs.wisc.edu/htcondor/manual/current/3_5Configuration_Macros.html#22879)
> which identifies these special slots as being inside the docker
> container, add that as a requirement on your job, and set up the START
> expression of these slots to refuse jobs which don't explicitly
> request them. (A config sketch follows at the end of this message.)
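>
> To make the first option concrete, here is a rough sketch. The
> container name, paths, and inner command are hypothetical, and it
> assumes the containers are already running on each node and that
> the user the job runs as is allowed to run docker:
>
>     # submit file (sketch)
>     universe   = vanilla
>     executable = run_in_container.sh
>     output     = job.out
>     error      = job.err
>     log        = job.log
>     queue
>
> and the wrapper, run_in_container.sh:
>
>     #!/bin/sh
>     # Hypothetical wrapper: 'my_app_container' must already be
>     # running on the node. Only this docker exec process falls
>     # under HTCondor's control, not the container or the work
>     # inside it. (The cgroup trick above would use
>     # 'docker run --cgroup-parent=...' instead.)
>     exec docker exec my_app_container /opt/jobs/do_work.sh "$@"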
>
> Hopefully what I'm saying makes sense. The first option is most likely
> easier to implement, and the second is arguably cleaner but more finicky
> to set up.
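>
> For the second option, the config inside the container might look
> roughly like this (the port, attribute names, and IP here are
> placeholders; check the manual for the exact knobs in your
> version):
>
>     # condor_config fragment inside the container (sketch)
>     DAEMON_LIST           = MASTER, STARTD
>     USE_SHARED_PORT       = True
>     SHARED_PORT_PORT      = 9618
>     # advertise the surrounding host's address, not the container's
>     TCP_FORWARDING_HOST   = 192.0.2.10
>     # mark these slots, and only accept jobs that ask for them
>     InsideDockerContainer = True
>     STARTD_ATTRS          = $(STARTD_ATTRS) InsideDockerContainer
>     START                 = ($(START)) && (TARGET.WantDockerSlot =?= True)
>
> with the container started along the lines of
> 'docker run -p 9618:9618 ...', and the job adding
> '+WantDockerSlot = True' plus a matching requirement such as
> 'requirements = (InsideDockerContainer =?= True)'.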
Thanks so much, Ian, for the very detailed reply. My central manager
machine has 24 processors, but the 2 machines I want to distribute
jobs across have 176 each. I want to take advantage of all this CPU
power and run as many threads and forked processes as possible. Given
that, what configuration would you recommend?