[HTCondor-users] Intermittent problems in a new pool installation spanning two subnets
- Date: Mon, 04 Dec 2023 09:19:17 +0100
- From: Valerio Bellizzomi <valerio@xxxxxxxxxx>
- Subject: [HTCondor-users] Intermittent problems in a new pool installation spanning two subnets
Hi,
I have been able to install 23.2.0 in a new pool spanning two subnets,
but there are intermittent communication problems. I attach the logs
below.
The ep1ext host is on a different subnet from the central manager.
I have reviewed the firewall rules to allow this traffic in both
directions, but the errors still occur intermittently.
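To check whether new TCP connections to the collector are themselves being dropped somewhere between the two subnets, a minimal probe could be run from ep1ext. This is a sketch, assuming Python 3 is available on that host; the address comes from the logs below, while the function name `probe` and its parameters are illustrative only.

```python
import socket
import time

# Collector address as it appears in the MasterLog lines below.
COLLECTOR = ("10.10.0.30", 9618)


def probe(addr, attempts=10, interval=1.0, timeout=3.0):
    """Try to open (and immediately close) a TCP connection `attempts` times.

    Returns a list of (attempt_index, error) pairs, where error is None on
    success or a short description of the OSError that was raised.
    """
    results = []
    for i in range(attempts):
        try:
            # create_connection performs the full TCP handshake.
            with socket.create_connection(addr, timeout=timeout):
                results.append((i, None))
        except OSError as exc:
            # Covers timeouts, refused connections and resets during connect.
            results.append((i, f"{type(exc).__name__}: {exc}"))
        if i + 1 < attempts:
            time.sleep(interval)
    return results
```

For example, `probe(COLLECTOR, attempts=60, interval=10)` samples the path every ten seconds for about ten minutes. A clean run only shows that new connections get through; the `errno=104 Connection reset by peer` messages come from connections reset after they were established, which a connect-only test will not reproduce.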
VB
# tail -30 /var/log/condor/MasterLog
12/04/23 08:19:09 ERROR: SECMAN:2007:Failed to read resume session response classad from server.
12/04/23 08:19:09 Failed to start non-blocking update to <10.10.0.30:9618>.
12/04/23 08:19:16 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 14959
12/04/23 08:19:16 condor_read(): Socket closed abnormally when trying to read 5 bytes from collector htcondor.sel in non-blocking mode, errno=104 Connection reset by peer
12/04/23 08:19:16 SECMAN: Failed to read resume session response classad from server.
12/04/23 08:19:16 ERROR: SECMAN:2007:Failed to read resume session response classad from server.
12/04/23 08:19:16 Failed to start non-blocking update to <10.10.0.30:9618>.
12/04/23 08:19:19 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 14960
12/04/23 08:27:16 The VIEW_SERVER (pid 14959) exited with status 4
12/04/23 08:27:16 Sending obituary for "/usr/sbin/condor_collector"
12/04/23 08:27:16 my_popenv: Failed to exec /usr/bin/mail, errno=2 (No such file or directory)
12/04/23 08:27:16 Failed to launch mailer process: /usr/bin/mail
12/04/23 08:27:16 restarting /usr/sbin/condor_collector in 10 seconds
12/04/23 08:27:19 The COLLECTOR (pid 14960) exited with status 4
12/04/23 08:27:19 Sending obituary for "/usr/sbin/condor_collector"
12/04/23 08:27:19 my_popenv: Failed to exec /usr/bin/mail, errno=2 (No such file or directory)
12/04/23 08:27:19 Failed to launch mailer process: /usr/bin/mail
12/04/23 08:27:19 restarting /usr/sbin/condor_collector in 10 seconds
12/04/23 08:27:19 condor_write(): Socket closed when trying to write 2417 bytes to collector htcondor.sel, fd is 10
12/04/23 08:27:19 Buf::write(): condor_write() failed
12/04/23 08:27:19 condor_read(): Socket closed abnormally when trying to read 5 bytes from collector htcondor.sel in non-blocking mode, errno=104 Connection reset by peer
12/04/23 08:27:19 SECMAN: Failed to read resume session response classad from server.
12/04/23 08:27:19 ERROR: SECMAN:2007:Failed to read resume session response classad from server.
12/04/23 08:27:19 Failed to start non-blocking update to <10.10.0.30:9618>.
12/04/23 08:27:26 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 14990
12/04/23 08:27:26 condor_read(): Socket closed abnormally when trying to read 5 bytes from collector htcondor.sel in non-blocking mode, errno=104 Connection reset by peer
12/04/23 08:27:26 SECMAN: Failed to read resume session response classad from server.
12/04/23 08:27:26 ERROR: SECMAN:2007:Failed to read resume session response classad from server.
12/04/23 08:27:26 Failed to start non-blocking update to <10.10.0.30:9618>.
12/04/23 08:27:29 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 14991
# tail -30 MasterLog
12/04/23 07:41:00 ERROR: SECMAN:2004:Server rejected our session id
12/04/23 07:41:00 Failed to start non-blocking update to <10.10.0.30:9618>.
12/04/23 07:51:00 condor_write(): Socket closed when trying to write 2178 bytes to collector htcondor.sel, fd is 10
12/04/23 07:51:00 Buf::write(): condor_write() failed
12/04/23 07:51:00 SECMAN: Server rejected our session id
12/04/23 07:51:00 SECMAN: Invalidating negotiated session rejected by peer
12/04/23 07:51:00 ERROR: SECMAN:2004:Server rejected our session id
12/04/23 07:51:00 Failed to start non-blocking update to <10.10.0.30:9618>.
12/04/23 08:06:00 condor_write(): Socket closed when trying to write 2160 bytes to collector htcondor.sel, fd is 10
12/04/23 08:06:00 Buf::write(): condor_write() failed
12/04/23 08:06:00 SECMAN: Server rejected our session id
12/04/23 08:06:00 SECMAN: Invalidating negotiated session rejected by peer
12/04/23 08:06:00 ERROR: SECMAN:2004:Server rejected our session id
12/04/23 08:06:00 Failed to start non-blocking update to <10.10.0.30:9618>.
12/04/23 08:11:00 condor_read(): Socket closed abnormally when trying to read 5 bytes from collector htcondor.sel in non-blocking mode, errno=104 Connection reset by peer
12/04/23 08:11:00 SECMAN: no classad from server, failing
12/04/23 08:11:00 ERROR: SECMAN:2007:Failed to end classad message.
12/04/23 08:11:00 Failed to start non-blocking update to <10.10.0.30:9618>.
12/04/23 08:21:00 condor_write(): Socket closed when trying to write 2178 bytes to collector htcondor.sel, fd is 10
12/04/23 08:21:00 Buf::write(): condor_write() failed
12/04/23 08:21:00 SECMAN: Server rejected our session id
12/04/23 08:21:00 SECMAN: Invalidating negotiated session rejected by peer
12/04/23 08:21:00 ERROR: SECMAN:2004:Server rejected our session id
12/04/23 08:21:00 Failed to start non-blocking update to <10.10.0.30:9618>.
12/04/23 08:31:00 condor_write(): Socket closed when trying to write 2178 bytes to collector htcondor.sel, fd is 10
12/04/23 08:31:00 Buf::write(): condor_write() failed
12/04/23 08:31:00 SECMAN: Server rejected our session id
12/04/23 08:31:00 SECMAN: Invalidating negotiated session rejected by peer
12/04/23 08:31:00 ERROR: SECMAN:2004:Server rejected our session id
12/04/23 08:31:00 Failed to start non-blocking update to <10.10.0.30:9618>.
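The two logs show two distinct failure signatures: SECMAN:2004 (the server rejected the cached session id) and SECMAN:2007 (no response classad was read back). To see how often each one occurs and when, a small helper could tally the `ERROR: SECMAN` lines. This is a sketch assuming the MasterLog line format shown above; the name `summarize` is illustrative and not part of HTCondor.

```python
import re
from collections import Counter

# Matches lines such as:
# 12/04/23 08:21:00 ERROR: SECMAN:2004:Server rejected our session id
SECMAN_ERR = re.compile(
    r"^(?P<date>\d\d/\d\d/\d\d) (?P<time>\d\d:\d\d:\d\d) "
    r"ERROR: SECMAN:(?P<code>\d+):(?P<msg>.*)$"
)


def summarize(lines):
    """Count ERROR: SECMAN lines by (code, message) and record their timestamps."""
    counts = Counter()
    when = {}
    for line in lines:
        m = SECMAN_ERR.match(line.strip())
        if m is None:
            continue  # not a SECMAN error line
        key = (m["code"], m["msg"])
        counts[key] += 1
        when.setdefault(key, []).append(f"{m['date']} {m['time']}")
    return counts, when
```

Passing an open file object for `/var/log/condor/MasterLog` to `summarize` returns both the per-error counts and the timestamps of each occurrence, which makes it easier to line the failures up against firewall or collector restart events.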
# condor_status
Name              OpSys  Arch    State      Activity  LoadAv  Mem    ActvtyTime
slot1@xxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle       0.000  32130  0+11:09:39

              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Drain  Backfill  BkIdle
X86_64/LINUX      1      0        0          1        0           0      0         0       0
Total             1      0        0          1        0           0      0         0       0
# condor_status -master
Name          Version         Cpus  Memory   Uptime
ep1ext.sel    23.2.0.Package    12  31.4 GB  0+11:10:06
htcondor.sel  23.2.0.Package    24  31.4 GB  0+11:09:47