
Re: [HTCondor-users] htcondor/execute container cannot connect to central manager



Hi again,

I attached two logs to my previous email. The second is the log from running the execute node as a normal service. That run connects to the central manager without any problem, and condor_status shows the worker. The first is the log from running the htcondor/execute container on the same machine, which cannot connect to the central manager. The red lines are the difference between the two runs. The docker run uses exactly the same condor configuration and also uses the host network, which should rule out networking problems. My question is what is causing this problem and how to debug it.
You are right, the container run does spawn the processes, but it gets stuck somewhere at runtime and does not progress.
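You can isolate "the red lines" without relying on email colouring. Strip the timestamp prefix from both StartLogs and diff what remains. The sketch below is a minimal shell helper built on that idea. The function names `strip_ts` and `logdiff` are hypothetical. The regex assumes the `MM/DD/YY HH:MM:SS ` prefix seen in these logs, and it accepts a one- or two-digit month.

```shell
# Hypothetical helpers for comparing two HTCondor daemon logs.
# Assumes each line starts with a "MM/DD/YY HH:MM:SS " timestamp, as in the
# StartLogs in this thread. That prefix is removed so only messages are compared.

strip_ts() {
    # Print file $1 with the leading date and time fields removed.
    sed -E 's#^[0-9]{1,2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ##' "$1"
}

logdiff() {
    # $1 = log from the working service, $2 = log from the container run.
    # Lines marked "<" appear only in $1, lines marked ">" only in $2.
    ld_a=$(mktemp) && ld_b=$(mktemp) || return 2
    strip_ts "$1" > "$ld_a"
    strip_ts "$2" > "$ld_b"
    diff "$ld_a" "$ld_b" && ld_rc=0 || ld_rc=$?
    rm -f "$ld_a" "$ld_b"
    return "$ld_rc"
}
```

For example, run `logdiff StartLog.service StartLog.container` (hypothetical file names). The version, PID, platform and /etc/issue lines will differ as well, because the container runs 10.1.1 on Ubuntu 20.04 and the service runs 10.7.0 on AlmaLinux 9. The interesting part is what is missing at the end.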

Kind regards,
Reza

On 27.09.2023 at 17:49, John M Knoeller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:

You say
 
> container-based execution does not spawn the required processes
 
Does that mean that the HTCondor daemons aren't running at all? 
 
But then you post log snippets that seem to show that the daemons are in fact running. 
 
09/19/23 11:14:11 (D_ALWAYS) DaemonCore: command socket at <192.168.56.101:33407?addrs=192.168.56.101-33407&alias=wor>

The IP address 192.168.56.101 makes me a bit concerned, since I know that is one of the private IP ranges.   Is your Central Manager also on the 192.168.56 subnet?
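Both StartLogs in this thread show the startd updating the collector at 192.168.56.1:9618, which bears on this question. A quick way to check whether two addresses share a subnet is to mask both with the prefix length. The sketch below assumes a /24 netmask, which is not stated anywhere in the thread. `ip_to_int` and `same_subnet` are hypothetical helper names, and octets with leading zeros are not handled.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    # Split "a.b.c.d" on dots into positional parameters.
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeed (exit 0) if two IPv4 addresses are in the same
# network. $1, $2 = addresses, $3 = prefix length (e.g. 24 for a /24).
same_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}
```

With the addresses from the logs, `same_subnet 192.168.56.101 192.168.56.1 24 && echo same` prints `same`. The worker's other interface, 10.0.2.15, would not match.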
 
You also show some messages from the hibernation daemon in red, but I don't see anything that suggests a failure of any kind.
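You can check from the host whether the daemons are actually up, without relying on the logs. This is a minimal sketch. It assumes `ps` is available inside the htcondor/execute image and that the container is named `condor`, as in the `docker run` command. `condor_daemons` is a hypothetical helper that filters `ps -eo pid,args` output down to HTCondor daemons.

```shell
# Hypothetical filter: reads `ps -eo pid,args` output on stdin and prints the
# PID and executable name of every process whose program name starts with condor_.
condor_daemons() {
    awk 'NR > 1 {
        n = split($2, part, "/")          # basename of the program path
        if (part[n] ~ /^condor_/) print $1, part[n]
    }'
}
```

For example, run `docker exec condor ps -eo pid,args | condor_daemons`. On an execute node you would expect at least condor_master and condor_startd in the list.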
 
-tj
 
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Salkhordehhaghighi, Reza
Sent: Friday, September 22, 2023 5:02 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] htcondor/execute container cannot connect to central manager
 

Hi all,

 

I have been trying to use the htcondor/execute container to connect to a central manager with a minimal config. After many attempts, container-based execution still does not spawn the required processes. Running the execute node as a normal service works, but giving the same config to the htcondor/execute container does not. 


Here is the command I use. I gave the container exactly the same config as my working service. The examples in https://github.com/htcondor/htcondor/tree/master/build/docker/services don't work either. I have tried both the el8 and the ubuntu containers, and neither works.

 

docker run --rm --network host --env-file=env --name condor -v /etc/condor:/etc/condor htcondor/execute
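Because /etc/condor is bind-mounted into the container, one low-effort way to get more detail is to drop an extra file into config.d before starting it. The sketch below enables additional, standard HTCondor debug categories for the master and startd. The file name `99-debug.conf` and the helper `write_debug_config` are hypothetical. According to the log, /etc/condor/condor_config.local is read after config.d, so it could override these settings.

```shell
# Hypothetical helper: write an extra HTCondor config file that turns on more
# debug categories for the master and startd.
# $1 = target directory. Defaults to /etc/condor/config.d, the directory the
#      log lists under "Using local config sources".
write_debug_config() {
    wdc_dir=${1:-/etc/condor/config.d}
    mkdir -p "$wdc_dir" || return 1
    cat > "$wdc_dir/99-debug.conf" <<'EOF'
# Extra logging for troubleshooting; remove when done.
MASTER_DEBUG = D_FULLDEBUG D_COMMAND
STARTD_DEBUG = D_FULLDEBUG D_COMMAND D_SECURITY
EOF
}
```

Run `write_debug_config` on the host as root, then start the container again with the same `docker run` command. After that, read the logs inside the container, for example with `docker exec condor cat /var/log/condor/StartLog`.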

 

cat /etc/redhat-release 
AlmaLinux release 9.2 (Turquoise Kodkod)

 

 

Here is the StartLog from the container run:

 

root@worker0:/var/log/condor# cat StartLog 
09/19/23 10:07:25 (D_ALWAYS:2) Result of reading /etc/issue:  Ubuntu 20.04.4 LTS \n \l
 
09/19/23 10:07:25 (D_ALWAYS:2) Using IDs: 1 processors, 1 CPUs, 0 HTs
09/19/23 10:07:25 (D_ALWAYS:2) Reading condor configuration from '/etc/condor/condor_config'
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: lo 127.0.0.1 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: enp0s3 10.0.2.15 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: enp0s8 192.168.56.101 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: docker0 172.17.0.1 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: lo ::1 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: enp0s3 fe80::a00:27ff:fe5c:373e up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: enp0s8 fe80::4f26:c:cb9d:5de4 up
09/19/23 10:07:25 (D_ALWAYS:2) Enumerating interfaces: docker0 fe80::42:56ff:fe11:aeef up
09/19/23 10:07:25 (D_ALWAYS) ******************************************************
09/19/23 10:07:25 (D_ALWAYS) ** condor_startd (CONDOR_STARTD) STARTING UP
09/19/23 10:07:25 (D_ALWAYS) ** /usr/sbin/condor_startd
09/19/23 10:07:25 (D_ALWAYS) ** SubsystemInfo: name=STARTD type=STARTD(6) class=DAEMON(1)
09/19/23 10:07:25 (D_ALWAYS) ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
09/19/23 10:07:25 (D_ALWAYS) ** $CondorVersion: 10.1.1 2022-11-10 BuildID: 612938 PackageID: 10.1.1-1.1 RC $
09/19/23 10:07:25 (D_ALWAYS) ** $CondorPlatform: X86_64-Ubuntu_20.04 $
09/19/23 10:07:25 (D_ALWAYS) ** PID = 1
09/19/23 10:07:25 (D_ALWAYS) ** Log last touched time unavailable (No such file or directory)
09/19/23 10:07:25 (D_ALWAYS) ******************************************************
09/19/23 10:07:25 (D_ALWAYS) Using config source: /etc/condor/condor_config
09/19/23 10:07:25 (D_ALWAYS) Using local config sources: 
09/19/23 10:07:25 (D_ALWAYS)    /etc/condor/config.d/01-env.conf
09/19/23 10:07:25 (D_ALWAYS)    /etc/condor/config.d/02-execute.config
09/19/23 10:07:25 (D_ALWAYS)    /etc/condor/config.d/10-stash-plugin.conf
09/19/23 10:07:25 (D_ALWAYS)    /etc/condor/condor_config.local
09/19/23 10:07:25 (D_ALWAYS) config Macros = 71, Sorted = 71, StringBytes = 1912, TablesBytes = 2620
09/19/23 10:07:25 (D_ALWAYS) CLASSAD_CACHING is ENABLED
09/19/23 10:07:25 (D_ALWAYS) Daemon Log is logging: D_ALWAYS:2 D_ERROR D_STATUS
09/19/23 10:07:25 (D_ALWAYS:2) Not using shared port because USE_SHARED_PORT=false
09/19/23 10:07:25 (D_ALWAYS) Daemoncore: Listening at <0.0.0.0:44747> on TCP (ReliSock) and UDP (SafeSock).
09/19/23 10:07:25 (D_ALWAYS) DaemonCore: command socket at <192.168.56.101:44747?addrs=192.168.56.101-44747&alias=worker0>
09/19/23 10:07:25 (D_ALWAYS) DaemonCore: private command socket at <192.168.56.101:44747?addrs=192.168.56.101-44747&alias=worker0>
09/19/23 10:07:25 (D_ALWAYS:2) Setting maximum accepts per cycle 8.
09/19/23 10:07:25 (D_ALWAYS:2) Setting maximum UDP messages per cycle 100.
09/19/23 10:07:25 (D_ALWAYS:2) Will use TCP to update collector <192.168.56.1:9618>
09/19/23 10:07:25 (D_ALWAYS:2) Not using shared port because USE_SHARED_PORT=false
09/19/23 10:07:25 (D_ALWAYS:2) Memory: Detected 1024 megs RAM
09/19/23 10:07:25 (D_ALWAYS:2) Found interface enp0s8 that matches <192.168.56.101:0>
09/19/23 10:07:25 (D_ALWAYS:2) Found interface enp0s8 with ip 192.168.56.101
09/19/23 10:07:25 (D_ALWAYS:2) enp0s8 supports Wake-on: no (raw: 0x00)
09/19/23 10:07:25 (D_ALWAYS:2) enp0s8 enabled Wake-on: no (raw: 0x00)
09/19/23 10:07:25 (D_ALWAYS:2) Using network interface enp0s8 for hibernation
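The container log above stops after the hibernation setup. It does show that the startd plans to "use TCP to update collector <192.168.56.1:9618>". To confirm that `--network host` really takes networking out of the picture, you can try the same connection from the host and from inside the container. This is a minimal sketch using bash's `/dev/tcp` pseudo-device. `tcp_reachable` is a hypothetical name, and the sketch assumes `bash` and GNU `timeout` are installed.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened
# within TIMEOUT seconds (default 5). bash is invoked explicitly because
# /dev/tcp is a bash feature, not a real device file.
tcp_reachable() {
    tr_host=$1; tr_port=$2; tr_secs=${3:-5}
    timeout "$tr_secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$tr_host" "$tr_port" 2>/dev/null
}
```

For example, run `tcp_reachable 192.168.56.1 9618 && echo open` on the host. Inside the container, assuming the image ships bash, run `docker exec condor bash -c 'exec 3<>/dev/tcp/192.168.56.1/9618 && echo open'`. A successful connection only shows that the port accepts connections. It says nothing about authentication or authorization.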

====================================================================================================

And here is the log file from the standard service. The red lines (highlighted in the original message) do not appear in the container log above, so I suspect something is stuck at this stage.

 

09/19/23 11:14:11 (D_ALWAYS:2) Result of reading /etc/issue:  \S

09/19/23 11:14:11 (D_ALWAYS:2) Result of reading /etc/redhat-release:  AlmaLinux release 9.2 (Turquoise Kodkod)

09/19/23 11:14:11 (D_ALWAYS:2) Using IDs: 1 processors, 1 CPUs, 0 HTs
09/19/23 11:14:11 (D_ALWAYS:2) Reading condor configuration from '/etc/condor/condor_config'
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: lo 127.0.0.1 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: enp0s3 10.0.2.15 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: enp0s8 192.168.56.101 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: docker0 172.17.0.1 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: lo ::1 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: enp0s3 fe80::a00:27ff:fe5c:373e up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: enp0s8 fe80::4f26:c:cb9d:5de4 up
09/19/23 11:14:11 (D_ALWAYS:2) Enumerating interfaces: docker0 fe80::42:56ff:fe11:aeef up
09/19/23 11:14:11 (D_ALWAYS) ******************************************************
09/19/23 11:14:11 (D_ALWAYS) ** condor_startd (CONDOR_STARTD) STARTING UP
09/19/23 11:14:11 (D_ALWAYS) ** /usr/sbin/condor_startd
09/19/23 11:14:11 (D_ALWAYS) ** SubsystemInfo: name=STARTD type=STARTD(6) class=DAEMON(1)
09/19/23 11:14:11 (D_ALWAYS) ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
09/19/23 11:14:11 (D_ALWAYS) ** $CondorVersion: 10.7.0 2023-07-31 BuildID: 665155 PackageID: 10.7.0-1 $
09/19/23 11:14:11 (D_ALWAYS) ** $CondorPlatform: x86_64_AlmaLinux9 $
09/19/23 11:14:11 (D_ALWAYS) ** PID = 334599
09/19/23 11:14:11 (D_ALWAYS) ** Log last touched time unavailable (No such file or directory)
09/19/23 11:14:11 (D_ALWAYS) ******************************************************
09/19/23 11:14:11 (D_ALWAYS) Using config source: /etc/condor/condor_config
09/19/23 11:14:11 (D_ALWAYS) Using local config sources:
09/19/23 11:14:11 (D_ALWAYS)    /etc/condor/config.d/01-env.conf
09/19/23 11:14:11 (D_ALWAYS)    /etc/condor/config.d/02-execute.config
09/19/23 11:14:11 (D_ALWAYS)    /etc/condor/config.d/10-stash-plugin.conf
09/19/23 11:14:11 (D_ALWAYS)    /etc/condor/condor_config.local
09/19/23 11:14:11 (D_ALWAYS) config Macros = 73, Sorted = 73, StringBytes = 2019, TablesBytes = 2692
09/19/23 11:14:11 (D_ALWAYS) CLASSAD_CACHING is ENABLED
09/19/23 11:14:11 (D_ALWAYS) Daemon Log is logging: D_ALWAYS:2 D_ERROR D_STATUS
09/19/23 11:14:11 (D_ALWAYS:2) Internal pipe for signals resized to 4096 from 65536
09/19/23 11:14:11 (D_ALWAYS:2) Not using shared port because USE_SHARED_PORT=false
09/19/23 11:14:11 (D_ALWAYS) Daemoncore: Listening at <0.0.0.0:33407> on TCP (ReliSock) and UDP (SafeSock).
09/19/23 11:14:11 (D_ALWAYS) DaemonCore: command socket at <192.168.56.101:33407?addrs=192.168.56.101-33407&alias=wor>
09/19/23 11:14:11 (D_ALWAYS) DaemonCore: private command socket at <192.168.56.101:33407?addrs=192.168.56.101-33407&a>
09/19/23 11:14:11 (D_ALWAYS:2) Setting maximum accepts per cycle 8.
09/19/23 11:14:11 (D_ALWAYS:2) Setting maximum UDP messages per cycle 100.
09/19/23 11:14:11 (D_ALWAYS:2) Will use TCP to update collector <192.168.56.1:9618>
09/19/23 11:14:11 (D_ALWAYS:2) Not using shared port because USE_SHARED_PORT=false
09/19/23 11:14:11 (D_ALWAYS:2) Memory: Detected 1024 megs RAM
09/19/23 11:14:11 (D_ALWAYS:2) Found interface enp0s8 that matches <192.168.56.101:0>
09/19/23 11:14:11 (D_ALWAYS:2) Found interface enp0s8 with ip 192.168.56.101
09/19/23 11:14:11 (D_ALWAYS:2) enp0s8 supports Wake-on: yes (raw: 0x2e)
09/19/23 11:14:11 (D_ALWAYS:2) enp0s8 enabled Wake-on: no (raw: 0x00)
09/19/23 11:14:11 (D_ALWAYS:2) Using network interface enp0s8 for hibernation
09/19/23 11:14:11 (D_ALWAYS:2) Initially invoking hibernation plugin '/usr/libexec/condor/condor_power_state ad'
09/19/23 11:14:11 (D_ALWAYS:2) Detected hibernation states: S3,S4,S5
09/19/23 11:14:18 (D_ALWAYS) VM universe will be tested to check if it is available
09/19/23 11:14:18 (D_ALWAYS) History file rotation is enabled.
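In the container, the last line written is "Using network interface enp0s8 for hibernation". The next step in the working service is "Initially invoking hibernation plugin '/usr/libexec/condor/condor_power_state ad'". To test the suspicion that this step hangs, run the plugin by hand with a time limit. The sketch below uses a hypothetical helper, `probe_hang`. It relies on GNU `timeout`, which exits with status 124 when the limit is reached.

```shell
# Hypothetical helper: run a command with a time limit and report whether it
# finished or appeared to hang. $1 = seconds, remaining args = the command.
probe_hang() {
    ph_secs=$1; shift
    timeout "$ph_secs" "$@" >/dev/null 2>&1 && ph_rc=0 || ph_rc=$?
    if [ "$ph_rc" -eq 124 ]; then
        echo "HUNG after ${ph_secs}s: $*"
    else
        echo "finished with status $ph_rc: $*"
    fi
}
```

To use it, open a shell in the container with `docker exec -it condor bash` and paste the function. Then run `probe_hang 30 "$(condor_config_val LIBEXEC)/condor_power_state" ad`. Using `condor_config_val LIBEXEC` avoids assuming that the Ubuntu image uses the same path as the AlmaLinux host. A HUNG result would point at the hibernation plugin rather than at networking.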

Kind regards,
Reza
 

 

 

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/