Hi Todd,
We currently see various kinds of jobs whose processes finish
successfully, but the Docker container around them is still there:
  0.00 B/s 18710 condor  20   0 68168  6784  5420 S  0.0  0.0  0:05.06 │ │ ├─ condor_starter -f -a slot1_7 schedd1
  0.00 B/s 18714 condor  20   0  857M  9588  6052 S  0.0  0.0  0:01.28 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_PID1
  0.00 B/s 19037 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 19036 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.03 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18995 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18994 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.03 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18982 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18981 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.03 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18953 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.03 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18952 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.02 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18941 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18940 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18759 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18758 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18757 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.03 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18727 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18726 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18725 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18724 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18723 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18722 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18721 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.03 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18720 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.03 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18719 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.04 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18718 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.05 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18717 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.03 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18716 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.00 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18715 condor  20   0  857M  9588  6052 S  0.0  0.0  0:00.03 │ │ │ ├─ /usr/bin/docker run --cpu-shares=10 --memory=3072m --cap-drop=all --hostname mschnepf-15764.0-worker1 --name HTCJob15764_0_slot1_7_P
  0.00 B/s 18614 condor  20   0 68168  6724  5420 S  0.0  0.0  0:05.22 │ │ ├─ condor_starter -f -a slot1_6 schedd1
This happens to all jobs on an affected machine.
We use HTCondor 8.6.5 (Aug 05 2017, BuildID: 412177) and Docker
17.05.0-ce, build 89658be. All machines run CentOS 7 with kernel
3.10.0-693.11.6.el7.x86_64. On one machine I installed the
mainline kernel 4.18.7-1.el7.elrepo.x86_64, but the problem
occurs there as well.
When this happens, commands such as docker ps hang. After
restarting the Docker daemon, everything works again for a while.
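For anyone who wants to check a node for the same symptom, here is
a minimal sketch of a probe that runs docker ps with a timeout
instead of waiting forever. The function names probe and
check_docker and the 10 second timeout are my own assumptions, not
part of HTCondor or Docker; only the docker ps command itself comes
from our setup.

```python
import subprocess
import sys


def probe(cmd, timeout_s=10.0):
    """Run cmd and classify the outcome.

    Returns "ok" (exit code 0), "failed" (non-zero exit code),
    "hung" (no exit within timeout_s seconds) or "missing"
    (the executable could not be found).
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this.
        return "hung"
    except FileNotFoundError:
        return "missing"
    return "ok" if result.returncode == 0 else "failed"


def check_docker(timeout_s=10.0):
    """Probe the Docker daemon via `docker ps` and report problems on stderr."""
    status = probe(["docker", "ps"], timeout_s)
    if status != "ok":
        print(f"docker ps: {status}", file=sys.stderr)
    return status
```

Calling check_docker() periodically, for example from cron, would
at least show when a node enters the state in which docker ps hangs.
It only detects the symptom and does not fix it.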
Has anyone seen the same or a similar problem and found a solution?
Cheers and thanks,
Matthias