I'm seeing the same thing and was wondering whether you ever solved this. I suspect it has to do with the account that condor_starter runs under, as opposed to
running docker as root with this in the local config file:
DOCKER = sudo /usr/bin/docker
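One way to test the account theory without resorting to sudo is to check whether the account in question belongs to the group that is allowed to talk to the Docker daemon. The following is only a minimal sketch: the account name "condor" and the group name "docker" are assumptions about a typical installation, not something confirmed in this thread, and the helper names are made up for illustration.

```python
import grp
import pwd


def account_in_group(user, group, primary_gid, groups):
    """Return True if `user` belongs to `group`.

    `groups` maps a group name to a (gid, member_names) tuple.
    Membership holds either through the member list or through
    the account's primary group id.
    """
    if group not in groups:
        return False
    gid, members = groups[group]
    return primary_gid == gid or user in members


def check_local_account(user="condor", group="docker"):
    """Query this machine's account and group databases (Unix only).

    "condor" and "docker" are assumed defaults; adjust them to
    whatever account and group the execute node really uses.
    """
    try:
        primary_gid = pwd.getpwnam(user).pw_gid
    except KeyError:
        # The account does not exist on this machine.
        return False
    groups = {g.gr_name: (g.gr_gid, list(g.gr_mem)) for g in grp.getgrall()}
    return account_in_group(user, group, primary_gid, groups)


if __name__ == "__main__":
    print("condor in docker group:", check_local_account())
```

If this prints False on the execute node, the account probably cannot reach the Docker daemon on its own, which would be consistent with the sudo workaround making a difference.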
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx]
On Behalf Of Matthias Schnepf
Sent: Friday, December 11, 2015 2:48 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Universe Docker: Cannot start container
Hello everybody,
I want to start a job in the docker universe:
###### submit file ######
universe = docker
docker_image = debian
executable = /bin/cat
arguments = /etc/hosts
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output = out.$(Process)
error = err.$(Process)
log = log.$(Process)
request_memory = 100M
requirements = (Machine == "dockermachine")
queue 1
################
The log.0 file says:
000 (033.000.000) 12/11 11:13:18 Job submitted from host: <192.168.0.1:9618?addrs=192.168.0.1-9618&noUDP&sock=83868_5d5e_5>
...
001 (033.000.000) 12/11 11:13:27 Job executing on host: <192.168.0.2:9615?CCBID=192.168.0.1:9618%3faddrs%3d192.168.0.1-9618%26noUDP%26sock%3dcollector#1&addrs=192.168.0.2-9615&noUDP&sock=1933991_db28_5>
...
007 (033.000.000) 12/11 11:13:49 Shadow exception!
Error from slot1@dockermachine: Cannot start container: invalid image name: debian
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
...
012 (033.000.000) 12/11 11:13:49 Job was held.
Error from slot1@dockermachine: Cannot start container: invalid image name: debian
Code 35 Subcode 0
And the logfile from the host says:
##### /var/log/condor/StarterLog.slot1 #####
....
12/11/15 11:13:30 (pid:2173165) Starting a VANILLA universe job with ID: 33.0
12/11/15 11:13:30 (pid:2173165) Output file: /var/lib/condor/execute/dir_2173165/_condor_stdout
12/11/15 11:13:30 (pid:2173165) Error file: /var/lib/condor/execute/dir_2173165/_condor_stderr
12/11/15 11:13:30 (pid:2173165) lock_file returning ERROR, errno=9 (Bad file descriptor)
12/11/15 11:13:30 (pid:2173165) FileLock::obtain(1) failed - errno 9 (Bad file descriptor)
12/11/15 11:13:30 (pid:2173165) Found 2 entries in docker image cache.
12/11/15 11:13:30 (pid:2173165) lock_file returning ERROR, errno=9 (Bad file descriptor)
12/11/15 11:13:30 (pid:2173165) FileLock::obtain(2) failed - errno 9 (Bad file descriptor)
12/11/15 11:13:30 (pid:2173165) Process exited, pid=2173169, status=1
12/11/15 11:13:30 (pid:2173165) DockerProc::JobReaper()
12/11/15 11:13:30 (pid:2173165) Failed to create classad from Docker output (0). Printing up to the first 9 (nonblank) lines.
12/11/15 11:13:30 (pid:2173165) Error: No such image or container: HTCJob33_0_slot1_PID2173165
12/11/15 11:13:31 (pid:2173165) Failed to create classad from Docker output (0). Printing up to the first 9 (nonblank) lines.
......
#####
The Docker image debian has already been pulled on the host system. The directory /var/lib/condor is empty and is owned by condor.
Does anyone have an idea how to fix this problem?
Best regards
Matthias
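Since the hold reason is "invalid image name: debian" even though the image was pulled, it may be worth confirming that the image is visible to the same docker command the starter uses. Below is a rough sketch under stated assumptions: it relies on `docker images --format` with a Go template, and the function names and the `docker_cmd` parameter are hypothetical helpers, not part of HTCondor.

```python
import subprocess


def split_image(name):
    """Split 'repo:tag' into (repo, tag); tag is '' when absent.

    A colon followed by a '/' belongs to a registry host:port,
    not to a tag, so it is left in the repository part.
    """
    head, sep, tail = name.rpartition(":")
    if sep and "/" not in tail:
        return head, tail
    return name, ""


def image_listed(listing, image):
    """Return True if `image` appears in a listing of 'repo:tag' lines.

    If `image` carries no tag, any tag of that repository matches.
    """
    repo, tag = split_image(image)
    for line in listing.splitlines():
        line_repo, line_tag = split_image(line.strip())
        if line_repo == repo and (tag == "" or line_tag == tag):
            return True
    return False


def list_images(docker_cmd=("docker",)):
    """Run `docker images` and return one 'repo:tag' entry per line.

    Pass docker_cmd=("sudo", "/usr/bin/docker") to mirror the
    DOCKER setting quoted above (an assumption about the setup).
    """
    result = subprocess.run(
        [*docker_cmd, "images", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Running `image_listed(list_images(), "debian")` under the account the starter uses, and again with the sudo variant, would show whether the two see the same set of images.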