[HTCondor-users] Endless running docker jobs
- Date: Thu, 11 Jan 2018 11:38:28 +0100
- From: Matthias Schnepf <matthias.schnepf@xxxxxxx>
- Subject: [HTCondor-users] Endless running docker jobs
Hi all,
We are successfully running the docker universe on many of our resources.
Sometimes the job in the docker container finishes but HTCondor does not
recognize this. Sometimes HTCondor loses the information about the PID and
changes the executable (program) from docker:./condor_exec.exe to the job
ID. This results in an endlessly running docker job.
condor_who job running:
OWNER  CLIENT      SLOT  JOB        RUNTIME     PID    PROGRAM
user   submitnode  1_13  2925686.0  0+02:30:11  11868  docker:./condor_exec.exe
condor_who job finished:
OWNER  CLIENT      SLOT  JOB        RUNTIME     PID    PROGRAM
user   submitnode  1_13  2925686.0  0+02:30:11         2925686.0
As expected, docker ps -a shows no docker container when the job is
finished.
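To spot affected slots automatically, the symptom could be checked with a small script. The following is only a minimal sketch: it assumes the seven-column condor_who layout quoted in this mail (OWNER, CLIENT, SLOT, JOB, RUNTIME, PID, PROGRAM) with values that contain no spaces, and the function name find_stuck_jobs is hypothetical, not part of HTCondor.

```python
def find_stuck_jobs(condor_who_output):
    """Return job IDs of rows with an empty PID whose PROGRAM equals the job ID.

    Assumption: whitespace-separated columns in the order
    OWNER CLIENT SLOT JOB RUNTIME PID PROGRAM, as in the output quoted above.
    """
    stuck = []
    for line in condor_who_output.splitlines():
        fields = line.split()
        # A healthy row (and the header) has 7 fields. When the PID is blank,
        # only 6 fields remain after splitting on whitespace.
        if len(fields) != 6:
            continue
        owner, client, slot, job, runtime, program = fields
        # Symptom described above: PROGRAM has been replaced by the job ID.
        if program == job:
            stuck.append(job)
    return stuck


# The two rows quoted in this mail, used as sample input.
sample = """\
OWNER CLIENT SLOT JOB RUNTIME PID PROGRAM
user submitnode 1_13 2925686.0 0+02:30:11 11868 docker:./condor_exec.exe
user submitnode 1_13 2925686.0 0+02:30:11 2925686.0
"""

if __name__ == "__main__":
    print(find_stuck_jobs(sample))
```

The function only parses text, so it can be fed the captured output of condor_who on an execute node. What to do with a flagged job is left open here.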
We run HTCondor 8.6.5 and docker 17.05.0-ce.
Is this a known issue and is there any solution?
Cheers,
Matthias