On 11/03/19 15:45, Brian Lin wrote:
> That's curious, do you see any errors in /etc/condor/CollectorLog on
> htc-2.cr.cnaf.infn.it?
Yes, see below.
> What's `condor_config_val COLLECTOR_HOST` return on the CE?
[root@htc-2 condor]# condor_config_val COLLECTOR_HOST
htc-2.cr.cnaf.infn.it
> How about `condor_status -schedd` on the central manager?
At this very moment the cluster is in quite a bad state and the CM does
not start (CEDAR:6001:Failed to connect to <131.154.195.32:9618>).
I downgraded and upgraded again, neutralizing the configurations coming
from the puppet classes.
> Thanks,
> Brian
I raised the log verbosity; my understanding (see logs below) is that
the JobRouter at ce02-htc fails to authenticate with the CM at htc-2
because it attempts the FS method, which fails because the two hosts
have no common filesystem.
The SEC_*_AUTHENTICATION_METHODS settings (and most other settings)
seem to be equivalent to those on the other cluster.
I tried adding the PASSWORD method: SEC_*_AUTHENTICATION_METHODS =
..., PASSWORD
but it didn't work; maybe I missed the right combination, though.
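To check the "equivalent settings" claim mechanically rather than by eye, something like the sketch below could compare the authentication knobs of the two clusters. It assumes each configuration has been saved as plain `NAME = value` lines (for example from a `condor_config_val -dump` run on each host; the exact dump format is an assumption here). The function names and the two sample strings are hypothetical and only illustrate the idea.

```python
import re

# Matches knobs such as SEC_DEFAULT_AUTHENTICATION_METHODS or
# SEC_CLIENT_AUTHENTICATION_METHODS.
AUTH_KNOB = re.compile(r"^SEC_\w*AUTHENTICATION_METHODS$", re.IGNORECASE)


def parse_auth_methods(dump_text):
    """Return {KNOB: [METHOD, ...]} from 'NAME = value' lines.

    Assumption: one setting per line; comment lines start with '#'.
    """
    knobs = {}
    for line in dump_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        name = name.strip()
        if AUTH_KNOB.match(name):
            # Keep the order: it is the order in which methods are listed.
            knobs[name.upper()] = [
                m.strip().upper() for m in value.split(",") if m.strip()
            ]
    return knobs


def diff_auth_methods(dump_a, dump_b):
    """Return {KNOB: (value_in_a, value_in_b)} for knobs that differ."""
    a = parse_auth_methods(dump_a)
    b = parse_auth_methods(dump_b)
    return {
        knob: (a.get(knob), b.get(knob))
        for knob in sorted(set(a) | set(b))
        if a.get(knob) != b.get(knob)
    }


# Hypothetical sample input, not taken from either cluster.
working_cluster = "SEC_DEFAULT_AUTHENTICATION_METHODS = FS, PASSWORD\nCOLLECTOR_HOST = a.example\n"
broken_cluster = "SEC_DEFAULT_AUTHENTICATION_METHODS = FS\nCOLLECTOR_HOST = b.example\n"
print(diff_auth_methods(working_cluster, broken_cluster))
```

An empty result would support the impression that the two clusters really are configured alike; any entry it returns is a knob worth looking at on both the CE and the CM.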
The IPs in the logs are:
(131.154.195.32 == htc-2.cr.cnaf.infn.it)
(131.154.192.41 == ce02-htc.cr.cnaf.infn.it)
From JobRouterLog at ce02-htc:
03/11/19 07:13:28 (D_ALWAYS:2) Will use TCP to update collector
htc-2.cr.cnaf.infn.it <131.154.195.32:9618>
03/11/19 07:13:28 (D_ALWAYS:2) Trying to query collector
<131.154.195.32:9618>
03/11/19 07:13:28 (D_ALWAYS) SECMAN: required authentication with
collector at <131.154.195.32:9618> failed, so aborting command
QUERY_SCHEDD_ADS.
03/11/19 07:13:28 (D_ALWAYS) ERROR: AUTHENTICATE:1003:Failed to
authenticate with any method|AUTHENTICATE:1004:Failed to authenticate
using FS
03/11/19 07:13:28 (D_ALWAYS) ERROR (pool htc-2.cr.cnaf.infn.it:9618)
Can't find address of schedd
03/11/19 07:13:28 (D_ALWAYS) JobRouter failure
(src=320.0,route=condor_pool_cms): failed to submit job
CollectorLog at htc-2.cr.cnaf.infn.it:
03/11/19 07:13:39 SECMAN: new session, doing initial authentication.
03/11/19 07:13:39 Returning to DC while we wait for socket to
authenticate.
03/11/19 07:13:39 AUTHENTICATE: setting timeout for (unknown) to 20.
03/11/19 07:13:39 HANDSHAKE: in handshake(my_methods = 'FS')
03/11/19 07:13:39 HANDSHAKE: handshake() - i am the server
03/11/19 07:13:39 HANDSHAKE: client sent (methods == 4)
03/11/19 07:13:39 HANDSHAKE: i picked (method == 4)
03/11/19 07:13:39 HANDSHAKE: client received (method == 4)
03/11/19 07:13:39 FS: client template is /tmp/FS_XXXXXXXXX
03/11/19 07:13:39 FS: client filename is /tmp/FS_XXXU3AGXf
03/11/19 07:13:39 Will return to DC because authentication is incomplete.
03/11/19 07:13:39 AUTHENTICATE_FS: used dir /tmp/FS_XXXU3AGXf, status: 0
03/11/19 07:13:39 AUTHENTICATE: method -1 (FS) failed.
03/11/19 07:13:39 HANDSHAKE: in handshake(my_methods = 'FS')
03/11/19 07:13:39 AUTHENTICATE: handshake would block
03/11/19 07:13:39 Will return to DC to continue authentication..
03/11/19 07:13:39 HANDSHAKE: handshake() - i am the server
03/11/19 07:13:39 HANDSHAKE: client sent (methods == 0)
03/11/19 07:13:39 HANDSHAKE: i picked (method == 0)
03/11/19 07:13:39 HANDSHAKE: client received (method == 0)
03/11/19 07:13:39 DC_AUTHENTICATE: required authentication of
131.154.192.41 failed: AUTHENTICATE:1003:Failed to authenticate with
any method|AUTHENTICATE:1004:Failed to authenticate using
FS|FS:1004:Unable to lstat(/tmp/FS_XXXU3AGXf)
03/11/19 07:13:39 DC_AUTHENTICATE: received DC_AUTHENTICATE from
<131.154.192.41:12036>
03/11/19 07:13:39 DC_AUTHENTICATE: generating BLOWFISH key for session
htc-2:13943:1552284819:2284...
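To make the handshake in the CollectorLog excerpt above easier to read, the lines can be condensed into which methods the server advertises, which bitmasks the client sent, and which methods failed. This is only a rough sketch: the regular expressions are written against the log lines quoted in this message, the comma separator inside `my_methods` is an assumption, and `summarize_handshake` is a hypothetical helper name.

```python
import re

# Patterns written against the CollectorLog lines quoted in this message.
MY_METHODS = re.compile(r"handshake\(my_methods = '([^']*)'\)")
CLIENT_SENT = re.compile(r"client sent \(methods == (\d+)\)")
METHOD_FAILED = re.compile(r"AUTHENTICATE: method -?\d+ \((\w+)\) failed")


def _add_unique(items, value):
    """Append value to items unless it is already present."""
    if value not in items:
        items.append(value)


def summarize_handshake(log_text):
    """Collect server methods, client bitmasks and failed methods."""
    summary = {"server_methods": [], "client_bitmasks": [], "failed_methods": []}
    for line in log_text.splitlines():
        match = MY_METHODS.search(line)
        if match:
            # Assumption: several methods would be comma-separated.
            for method in match.group(1).split(","):
                if method.strip():
                    _add_unique(summary["server_methods"], method.strip())
        match = CLIENT_SENT.search(line)
        if match:
            summary["client_bitmasks"].append(int(match.group(1)))
        match = METHOD_FAILED.search(line)
        if match:
            _add_unique(summary["failed_methods"], match.group(1))
    return summary


# A few lines copied from the CollectorLog excerpt.
SAMPLE_LOG = """\
03/11/19 07:13:39 HANDSHAKE: in handshake(my_methods = 'FS')
03/11/19 07:13:39 HANDSHAKE: client sent (methods == 4)
03/11/19 07:13:39 AUTHENTICATE: method -1 (FS) failed.
03/11/19 07:13:39 HANDSHAKE: in handshake(my_methods = 'FS')
03/11/19 07:13:39 HANDSHAKE: client sent (methods == 0)
"""
print(summarize_handshake(SAMPLE_LOG))
```

In this excerpt the collector only lists FS, and the client bitmask 4 appears to correspond to FS as well (the server picks 4 and then tries FS). If that reading is right, the PASSWORD method I added was not in effect on either side of this connection.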
Thanks for your help
Stefano