[Condor-users] 6.7.18 problem: Kerberos authentication issues post-upgrade
- Date: Wed, 29 Mar 2006 17:04:23 +0100
- From: David McBride <dwm@xxxxxxxxxxxx>
- Subject: [Condor-users] 6.7.18 problem: Kerberos authentication issues post-upgrade
Hi,
I have just upgraded my local Condor pool to 6.7.18 (from 6.7.16) and 
I'm running into what look like some Kerberos authentication issues.
Scenario:
========
Every machine uses the same global configuration file:
http://www.doc.ic.ac.uk/condor/doc-config/condor_config.global
(Locally retrieved from an NFS volume.)
Note the strong-authentication section at the tail of the file: all 
Condor daemons are required to authenticate using the local host keytab 
stored in /etc/krb5.keytab, and all WRITE operations must be 
authenticated with Kerberos credentials.
Two machines of note:
skimmer.doc.ic.ac.uk acts as Condor master.
lightyear.doc.ic.ac.uk acts as a submit-only node.
Both machines are running a distribution derived from Mandrake 10.2 on a 
locally-built 2.6.13 kernel; the local Kerberos packages are derived 
from MIT Kerberos 1.4.2:
# rpm -qa|grep krb
libkrb53-devel-1.4.2-0.1.102mdk
libkrbafs0-1.2.2-4mdk
libkrb53-1.4.2-0.1.102mdk
krb5-workstation-1.4.2-0.1.102mdk
libkrbafs0-devel-1.2.2-4mdk
ftp-client-krb5-1.4.2-0.1.102mdk
pam_krb5-2.1.8-1doc
telnet-client-krb5-1.4.2-0.1.102mdk
Failure case:
=============
User 'mwj' tries to submit a set of Condor jobs to the local schedd on 
lightyear.  This succeeds, as the user holds a local Kerberos TGT.
The jobs, however, never start.  Indeed, when running `condor_q -global` 
they do not appear at all, whereas they _are_ listed when queried using 
`condor_q` on lightyear itself.  This suggests a communications issue of 
some kind.
Reviewing the MasterLog on lightyear shows the following errors:
==> MasterLog <==
3/29 12:57:19 AUTHENTICATE: no available authentication methods succeeded, failing!
3/29 12:57:19 DC_AUTHENTICATE: authenticate failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS
3/29 12:57:23 AUTH_ERROR: Internal credentials cache error
3/29 12:57:23 AUTHENTICATE: no available authentication methods succeeded, failing!
3/29 12:57:23 ERROR: SECMAN:2004:Failed to start a session with TCP|AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS
3/29 12:58:23 getpeername failed so connect must have failed
3/29 12:58:43 Connect failed for 20 seconds; returning FALSE
3/29 12:58:43 ERROR: SECMAN:2003:TCP connection to <146.169.1.113:9618> failed
3/29 12:59:43 getpeername failed so connect must have failed
3/29 13:00:03 Connect failed for 20 seconds; returning FALSE
3/29 13:00:03 ERROR: SECMAN:2003:TCP connection to <146.169.1.113:9618> failed
The "Internal credentials cache error" appears to be the significant 
issue here; it looks like the Master daemon on lightyear is unable to 
mutually authenticate with the daemons on skimmer because of this 
cache problem, which produces the observed communications breakdown.
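For triage, the pipe-separated error stacks in these log lines can be split mechanically. The following is a minimal Python sketch, assuming the SUBSYSTEM:CODE:message layout seen in the excerpts above holds in general; `parse_error_stack` and `StackEntry` are hypothetical names for illustration and are not part of Condor.

```python
from typing import List, NamedTuple


class StackEntry(NamedTuple):
    subsystem: str   # e.g. "SECMAN" or "AUTHENTICATE"
    code: int        # e.g. 2004
    message: str     # free text; may itself contain colons


def parse_error_stack(text: str) -> List[StackEntry]:
    """Split 'SUBSYS:CODE:msg|SUBSYS:CODE:msg' into structured entries.

    Assumption: entries are separated by '|' and each entry has exactly
    two leading colon-separated fields before the message.
    """
    entries = []
    for part in text.split("|"):
        # Split on the first two colons only, so that addresses such as
        # <146.169.1.113:9618> inside the message stay intact.
        subsystem, code, message = part.strip().split(":", 2)
        entries.append(StackEntry(subsystem, int(code), message.strip()))
    return entries


if __name__ == "__main__":
    stack = ("SECMAN:2004:Failed to start a session with TCP|"
             "AUTHENTICATE:1003:Failed to authenticate with any method|"
             "AUTHENTICATE:1004:Failed to authenticate using KERBEROS")
    for entry in parse_error_stack(stack):
        print(entry.subsystem, entry.code, entry.message)
```

Read this way, the stack runs from the outermost failure (SECMAN could not start a session) to the innermost one (the KERBEROS method itself failed).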
After reconfiguring the logging to add D_SECURITY, the following fuller 
output appears on lightyear:
==> MasterLog <==
3/29 16:45:40 STARTCOMMAND: starting 2 to <146.169.1.113:9618> on UDP port 47686.
3/29 16:45:40 SECMAN: command 2 to <146.169.1.113:9618> on UDP port 47686.
3/29 16:45:40 SECMAN: command 60010 to <146.169.1.113:9618> on TCP port 43363.
3/29 16:45:40 SECMAN: new session, doing initial authentication.
3/29 16:45:40 SECMAN: Auth methods: KERBEROS
3/29 16:45:40 HANDSHAKE: in handshake(my_methods = 'KERBEROS')
3/29 16:45:40 HANDSHAKE: handshake() - i am the client
3/29 16:45:40 HANDSHAKE: sending (methods == 64) to server
3/29 16:45:40 HANDSHAKE: server replied (method = 64)
3/29 16:45:40 KERBEROS: krb5_unparse_name: host/skimmer.doc.ic.ac.uk@xxxxxxxxxxxx
3/29 16:45:40 KERBEROS: no user yet determined, will grab up to slash
3/29 16:45:40 KERBEROS: picked user: host
3/29 16:45:40 KERBEROS: remapping 'host' to 'condor'
3/29 16:45:40 unable to open map file (null), errno 14
3/29 16:45:40 Client is condor@(null)
3/29 16:45:40 KERBEROS: Server principal is host/skimmer.doc.ic.ac.uk@xxxxxxxxxxxx
3/29 16:45:40 init_daemon: client principal is 'host/lightyear.doc.ic.ac.uk@xxxxxxxxxxxx'
3/29 16:45:40 init_daemon: Using default keytab FILE:/etc/krb5.keytab
3/29 16:45:40 AUTH_ERROR: Internal credentials cache error
3/29 16:45:40 AUTHENTICATE: method 64 (KERBEROS) failed.
3/29 16:45:40 HANDSHAKE: in handshake(my_methods = '')
3/29 16:45:40 HANDSHAKE: handshake() - i am the client
3/29 16:45:40 HANDSHAKE: sending (methods == 0) to server
3/29 16:45:40 HANDSHAKE: server replied (method = 0)
3/29 16:45:40 AUTHENTICATE: no available authentication methods succeeded, failing!
3/29 16:45:40 SECMAN: unable to start session via TCP, failing.
3/29 16:45:40 ERROR: SECMAN:2004:Failed to start a session with TCP|AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS
It looks like the daemon either cannot determine its local identity 
properly (note the "Client is condor@(null)" entry) or is unable to read 
the local /etc/krb5.keytab file -- perhaps it is attempting to do so as 
the local 'condor' user, and not as root?
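One way to test the second hypothesis is to check whether a given account can actually open the keytab. The following is a minimal Python sketch, not anything Condor provides: `describe_access` is a hypothetical helper, and it only reports ownership, mode, and whether the current effective user can read the file. It says nothing about whether the keytab contents are valid.

```python
import os
import stat


def describe_access(path: str) -> dict:
    """Report owner, mode, and readability of `path` for the current user."""
    report = {"path": path, "effective_uid": os.geteuid()}
    try:
        info = os.stat(path)
    except OSError as exc:
        # Missing file, or a parent directory that cannot be traversed.
        report["error"] = str(exc)
        report["readable"] = False
        return report
    report["owner_uid"] = info.st_uid
    report["mode"] = stat.filemode(info.st_mode)  # e.g. "-rw-------"
    try:
        # Opening the file is the most direct test: it is evaluated
        # against the effective uid/gid of this process.
        with open(path, "rb") as handle:
            handle.read(1)
        report["readable"] = True
    except OSError as exc:
        report["error"] = str(exc)
        report["readable"] = False
    return report


if __name__ == "__main__":
    print(describe_access("/etc/krb5.keytab"))
```

Running the script once as root and once as the 'condor' user (for example via `su`) and comparing the `readable` field would show whether a privilege drop alone could explain the failure.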
Any assistance with this issue would be greatly appreciated.
Cheers,
David
--
David McBride <dwm@xxxxxxxxxxxx>
Department of Computing, Imperial College, London