Hi,

I have successfully managed to get a Docker container running the image htcondor/mini to run Docker Universe jobs:

Dockerfile:

    FROM htcondor/mini:10.0.2-el8

    # Install the docker client
    RUN yum install -y wget nano
    RUN wget https://download.docker.com/linux/centos/8/x86_64/stable/Packages/docker-ce-cli-19.03.13-3.el8.x86_64.rpm
    RUN yum install -y docker-ce-cli-19.03.13-3.el8.x86_64.rpm

    […]

    # Give to user condor the necessary rights to use docker
    RUN groupadd docker || :
    RUN usermod -aG docker condor

    # Start HTCondor
    CMD ["condor_master", "-f"]

Command:

    docker run -it --network host --name condor --rm \
        -v /var/run/docker.sock:/var/run/docker.sock \
        my_mini_condor:version

I wanted to go a step further and create a full cluster using the htcondor/cm, htcondor/submit and htcondor/execute images.

Dockerfile (for execute nodes):

    FROM htcondor/execute:10.0.2-el8

    […]

    # Give to user condor the necessary rights to use docker
    RUN groupadd docker || :
    RUN usermod -aG docker condor

    # Start HTCondor
    CMD ["/bin/bash", "-x", "/start.sh"]

Command:

    docker run -it --name condor-node2 --rm --network=condor-network \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -e CONDOR_HOST=condor.test.cm \
        -h condor.test.node2 \
        -e USE_POOL_PASSWORD=YES \
        -v /home/gage/tests/condor/password:/etc/condor/passwords-orig.d \
        deregistry.terma.com/termade/mmepi/mmepi-sw/mmepi_condor_node:3.0.99

My little cluster works and I can run Vanilla Universe jobs just fine, but the Docker jobs stay idle forever.

I can run docker commands from within the execute containers just fine, and I don’t understand why condor does not pick up docker this time. condor_status -l | grep -i docker returns nothing, and running things like condor_restart, condor_master -f or condor_reconfig do not resolve the issue.

Here is what condor_q -better-analyze returns:

Thanks,
Gaëtan