[Condor-users] 6.7.18 problem: Kerberos authentication issues post-upgrade
- Date: Wed, 29 Mar 2006 17:04:23 +0100
- From: David McBride <dwm@xxxxxxxxxxxx>
Hi,
I have just upgraded my local Condor pool to 6.7.18 (from 6.7.16) and
I'm running into what look like some Kerberos authentication issues.
Scenario:
========
Every machine uses the same global configuration file:
http://www.doc.ic.ac.uk/condor/doc-config/condor_config.global
(Locally retrieved from an NFS volume.)
Note the strong-authentication section at the tail of the file: all
Condor daemons are required to authenticate using the local host keytab
stored in /etc/krb5.keytab, and all WRITE operations must be
authenticated with Kerberos credentials.
Two machines of note:
skimmer.doc.ic.ac.uk acts as Condor master.
lightyear.doc.ic.ac.uk acts as a submit-only node.
Both machines are running a distribution derived from Mandrake 10.2 on a
locally-built 2.6.13 kernel; the local Kerberos packages are derived
from MIT Kerberos 1.4.2:
# rpm -qa|grep krb
libkrb53-devel-1.4.2-0.1.102mdk
libkrbafs0-1.2.2-4mdk
libkrb53-1.4.2-0.1.102mdk
krb5-workstation-1.4.2-0.1.102mdk
libkrbafs0-devel-1.2.2-4mdk
ftp-client-krb5-1.4.2-0.1.102mdk
pam_krb5-2.1.8-1doc
telnet-client-krb5-1.4.2-0.1.102mdk
Failure case:
=============
User 'mwj' tries to submit a set of Condor jobs to the local schedd on
lightyear. This is successful, as they have a local Kerberos TGT.
The jobs, however, never start. Indeed, when running `condor_q -global`
they do not appear at all, whereas they _are_ listed when queried using
`condor_q` on lightyear itself. This suggests a communications issue of
some kind.
Reviewing the MasterLog on lightyear, I found the following errors:
==> MasterLog <==
3/29 12:57:19 AUTHENTICATE: no available authentication methods
succeeded, failing!
3/29 12:57:19 DC_AUTHENTICATE: authenticate failed:
AUTHENTICATE:1003:Failed to authenticate with any
method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS
3/29 12:57:23 AUTH_ERROR: Internal credentials cache error
3/29 12:57:23 AUTHENTICATE: no available authentication methods
succeeded, failing!
3/29 12:57:23 ERROR: SECMAN:2004:Failed to start a session with
TCP|AUTHENTICATE:1003:Failed to authenticate with any
method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS
3/29 12:58:23 getpeername failed so connect must have failed
3/29 12:58:43 Connect failed for 20 seconds; returning FALSE
3/29 12:58:43 ERROR: SECMAN:2003:TCP connection to <146.169.1.113:9618>
failed
3/29 12:59:43 getpeername failed so connect must have failed
3/29 13:00:03 Connect failed for 20 seconds; returning FALSE
3/29 13:00:03 ERROR: SECMAN:2003:TCP connection to <146.169.1.113:9618>
failed
The "Internal credentials cache error" appears to be the significant
issue here; it looks like the Master daemon on lightyear is unable to
mutually authenticate with the daemons on skimmer as a result of this
cache problem, resulting in the observed communications breakdown.
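For anyone trying to pick the failure chain out of a longer MasterLog, here is a minimal Python sketch that rejoins the mail-wrapped lines and splits the `SUBSYSTEM:code:message|...` error stacks into their parts. The function names (`join_wrapped`, `parse_error_stack`) are my own, and the line format is inferred purely from the excerpt above, not from any Condor specification.

```python
import re

# Log entries in the excerpt start with "M/D HH:MM:SS "; anything else is
# treated as a continuation of the previous entry (the mail client wrapped it).
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2}) (.*)$")

# One element of an error stack, e.g. "AUTHENTICATE:1004:Failed to ...".
STACK_RE = re.compile(r"([A-Z_]+):(\d+):([^|]+)")


def join_wrapped(lines):
    """Return (timestamp, message) tuples with wrapped lines rejoined."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("==>"):
            continue  # skip blanks and "==> MasterLog <==" banners
        match = LINE_RE.match(line)
        if match:
            entries.append([match.group(1), match.group(2)])
        elif entries:
            entries[-1][1] += " " + line
    return [tuple(entry) for entry in entries]


def parse_error_stack(message):
    """Split an error stack into (subsystem, code, text) tuples."""
    return [(name, int(code), text.strip())
            for name, code, text in STACK_RE.findall(message)]


if __name__ == "__main__":
    sample = [
        "3/29 12:57:23 ERROR: SECMAN:2004:Failed to start a session with",
        "TCP|AUTHENTICATE:1003:Failed to authenticate with any",
        "method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS",
    ]
    for stamp, message in join_wrapped(sample):
        for subsystem, code, text in parse_error_stack(message):
            print(stamp, subsystem, code, text)
```

Reading the stack left to right goes from the outermost failure (SECMAN could not start a session) down to the innermost one (the KERBEROS method failed).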
After reconfiguring the logging to add D_SECURITY, the following fuller
output appears on lightyear:
==> MasterLog <==
3/29 16:45:40 STARTCOMMAND: starting 2 to <146.169.1.113:9618> on UDP
port 47686.
3/29 16:45:40 SECMAN: command 2 to <146.169.1.113:9618> on UDP port 47686.
3/29 16:45:40 SECMAN: command 60010 to <146.169.1.113:9618> on TCP port
43363.
3/29 16:45:40 SECMAN: new session, doing initial authentication.
3/29 16:45:40 SECMAN: Auth methods: KERBEROS
3/29 16:45:40 HANDSHAKE: in handshake(my_methods = 'KERBEROS')
3/29 16:45:40 HANDSHAKE: handshake() - i am the client
3/29 16:45:40 HANDSHAKE: sending (methods == 64) to server
3/29 16:45:40 HANDSHAKE: server replied (method = 64)
3/29 16:45:40 KERBEROS: krb5_unparse_name:
host/skimmer.doc.ic.ac.uk@xxxxxxxxxxxx
3/29 16:45:40 KERBEROS: no user yet determined, will grab up to slash
3/29 16:45:40 KERBEROS: picked user: host
3/29 16:45:40 KERBEROS: remapping 'host' to 'condor'
3/29 16:45:40 unable to open map file (null), errno 14
3/29 16:45:40 Client is condor@(null)
3/29 16:45:40 KERBEROS: Server principal is
host/skimmer.doc.ic.ac.uk@xxxxxxxxxxxx
3/29 16:45:40 init_daemon: client principal is
'host/lightyear.doc.ic.ac.uk@xxxxxxxxxxxx'
3/29 16:45:40 init_daemon: Using default keytab FILE:/etc/krb5.keytab
3/29 16:45:40 AUTH_ERROR: Internal credentials cache error
3/29 16:45:40 AUTHENTICATE: method 64 (KERBEROS) failed.
3/29 16:45:40 HANDSHAKE: in handshake(my_methods = '')
3/29 16:45:40 HANDSHAKE: handshake() - i am the client
3/29 16:45:40 HANDSHAKE: sending (methods == 0) to server
3/29 16:45:40 HANDSHAKE: server replied (method = 0)
3/29 16:45:40 AUTHENTICATE: no available authentication methods
succeeded, failing!
3/29 16:45:40 SECMAN: unable to start session via TCP, failing.
3/29 16:45:40 ERROR: SECMAN:2004:Failed to start a session with
TCP|AUTHENTICATE:1003:Failed to authenticate with any
method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS
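To make the identity-mapping lines in the log above easier to follow, here is a small Python paraphrase of what they appear to describe: take the principal name up to the first slash, remap 'host' to 'condor', then look the realm up in a map. This is only my reading of the log messages, not Condor's actual code; `map_principal` and its parameters are hypothetical, and `EXAMPLE.ORG` stands in for the realm, which the archive has redacted.

```python
def map_principal(principal, user_remap=None, realm_map=None):
    """Paraphrase of the mapping steps visible in the D_SECURITY output.

    principal  -- e.g. "host/skimmer.doc.ic.ac.uk@EXAMPLE.ORG" (placeholder realm)
    user_remap -- hypothetical dict of user renames, e.g. {"host": "condor"}
    realm_map  -- hypothetical dict from Kerberos realm to domain; None
                  models the "unable to open map file (null)" case
    """
    name, _, realm = principal.partition("@")
    # "no user yet determined, will grab up to slash"
    user = name.split("/", 1)[0]
    # "remapping 'host' to 'condor'"
    user = (user_remap or {}).get(user, user)
    # With no map available the domain stays unset, printed as "(null)".
    domain = realm_map.get(realm) if realm_map else None
    return "{}@{}".format(user, domain if domain is not None else "(null)")


if __name__ == "__main__":
    principal = "host/skimmer.doc.ic.ac.uk@EXAMPLE.ORG"
    print(map_principal(principal, {"host": "condor"}))
    print(map_principal(principal, {"host": "condor"},
                        {"EXAMPLE.ORG": "example.org"}))
```

Under this reading, "Client is condor@(null)" follows directly from the missing map file. Whether that is related to the later credentials cache error is a separate question.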
It looks like it either cannot determine its local identity properly
(note the "Client is condor@(null)" entry) or it is unable to process
the local /etc/krb5.keytab file properly -- perhaps it is attempting to
do so as the local 'condor' user, and not as root?
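One way to test the "condor user, not root" hypothesis is to check whether the keytab's permission bits would let a given uid read it. The sketch below is a simplified model using only the classic owner/group/other bits. It ignores ACLs and directory search permissions, and the helper names are my own.

```python
import os
import stat


def can_read(st_mode, st_uid, st_gid, uid, gids):
    """Classic Unix read check for a file with the given mode and ownership.

    uid, gids -- the identity being tested (gids is a set of group ids).
    Simplification: ACLs, capabilities and parent directories are ignored.
    """
    if uid == 0:
        return True  # root bypasses the permission bits for reads
    if uid == st_uid:
        return bool(st_mode & stat.S_IRUSR)
    if st_gid in gids:
        return bool(st_mode & stat.S_IRGRP)
    return bool(st_mode & stat.S_IROTH)


def keytab_readable(path, uid, gids):
    """Apply can_read() to a real file such as /etc/krb5.keytab."""
    st = os.stat(path)
    return can_read(st.st_mode, st.st_uid, st.st_gid, uid, gids)


if __name__ == "__main__":
    # A root-owned 0600 file: readable by root, not by an unprivileged uid
    # (500 is an arbitrary example uid, not taken from the pool above).
    print(can_read(0o100600, 0, 0, 0, {0}))
    print(can_read(0o100600, 0, 0, 500, {500}))
```

If /etc/krb5.keytab is owned by root with mode 0600, as is common, a daemon that has already dropped to the 'condor' uid would fail this check, which would be consistent with the guess above but does not prove it.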
Any assistance with this issue would be greatly appreciated.
Cheers,
David
--
David McBride <dwm@xxxxxxxxxxxx>
Department of Computing, Imperial College, London