Subject: [HTCondor-users] HTCondor 8.6.8, 8.6.9 and 8.7.5 Job Run Error: Create_Process failed to register the job with the ProcD
Hi,
I'm building a testbed with Docker and HTCondor. I set up 2 nodes, 1 Master/Submit and 1 Execute. The whole installation is run as the root user (inside the container), and a submit user is created afterwards. No errors are shown during the installation or by condor_submit, but when the jobs start executing, I get this error in the job's log file:
Error from slot1_1@xxxxxxxxxxxx: Create_Process failed to register the job with the ProcD
0  -  Run Bytes Sent By Job
1037713  -  Run Bytes Received By Job
What's odd is that with HTCondor releases 8.4.8 and 8.4.12 everything works fine: no errors, and jobs run and finish. But I tried the releases from 8.6.8 up to 8.7.5, and all of them return exactly the same error.
The pool's config is:
Base Docker Container: Ubuntu 16.04
1 Master/Submit node in Docker container IP: 172.17.0.2
1 Execute node in Docker container IP: 172.17.0.3
Both containers share this /etc/hosts file for name resolution:
02/22/18 18:30:33 (1.0) (4832): File transfer completed successfully.
02/22/18 18:30:34 (1.0) (4832): ERROR "Error from slot1_1@xxxxxxxxxxxx: Create_Process failed to register the job with the ProcD" at line 608 in file /slots/01/dir_1624282/sources/src/condor_shadow.V6.1/pseudo_ops.cpp
02/22/18 18:30:34 (pid:4819) Renice expr "0" evaluated to 0
02/22/18 18:30:34 (pid:4819) About to exec /var/lib/condor/execute/dir_4819/condor_exec.exe test.bash 61
02/22/18 18:30:34 (pid:4819) Running job as user same uid as parent: personal condor
02/22/18 18:30:34 (pid:4823) Result of "track_family_via_cgroup" operation from ProcD: ERROR: No cgroup available for tracking
02/22/18 18:30:34 (pid:4823) Create_Process: error tracking family with root 4823 via cgroup htcondor/condor_var_lib_condor_execute_slot1_1@xxxxxxxxxxxx
02/22/18 18:30:34 (pid:4819) Create_Process(/var/lib/condor/execute/dir_4819/condor_exec.exe): child failed because it failed to register itself with the ProcD
02/22/18 18:30:34 (pid:4819) ERROR "Create_Process failed to register the job with the ProcD" at line 632 in file /slots/01/dir_1624282/sources/src/condor_starter.V6.1/os_proc.cpp
02/22/18 18:30:34 (pid:4819) ShutdownFast all jobs.
02/22/18 18:30:34 (pid:4819) condor_read() failed: recv(fd=11) returned -1, errno = 104 Connection reset by peer, reading 5 bytes from <172.17.0.2:34841>.
02/22/18 18:30:34 (pid:4819) IO: Failed to read packet header
02/22/18 18:30:34 (pid:4819) Lost connection to shadow, waiting 2400 secs for reconnect
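To make the relevant lines easier to spot, here is a minimal Python sketch that pulls the ProcD-related entries out of a log excerpt like the one above. It assumes the "(pid:NNNN)" prefix seen in these lines; the function name find_procd_lines is made up for illustration and is not part of HTCondor.

```python
import re

# Matches the "(pid:NNNN)" field carried by the starter-side log lines above.
PID_RE = re.compile(r"\(pid:(\d+)\)")


def find_procd_lines(log_text):
    """Return (pid, message) pairs for log lines that mention the ProcD.

    Hypothetical helper for illustration only; not an HTCondor API.
    """
    hits = []
    for line in log_text.splitlines():
        if "ProcD" not in line:
            continue
        match = PID_RE.search(line)
        if match is None:
            # Lines without a "(pid:...)" field (e.g. the "(1.0) (4832)"
            # style prefix) are skipped by this sketch.
            continue
        pid = int(match.group(1))
        message = line[match.end():].strip()
        hits.append((pid, message))
    return hits


# Three lines copied from the excerpt above.
SAMPLE = """\
02/22/18 18:30:34 (pid:4819) Renice expr "0" evaluated to 0
02/22/18 18:30:34 (pid:4823) Result of "track_family_via_cgroup" operation from ProcD: ERROR: No cgroup available for tracking
02/22/18 18:30:34 (pid:4819) ShutdownFast all jobs.
"""

if __name__ == "__main__":
    for pid, message in find_procd_lines(SAMPLE):
        print(pid, message)
```

Run against the full excerpt, this should leave only the lines where the ProcD is named, with the "No cgroup available for tracking" entry from pid 4823 first.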
This is the condor_config.local file for MasterNode:
##### VALUES ADDED BY htconfig_v2.py on: 13/02/2018 18:38:58 #####