Subject: [HTCondor-users] condor_submit_dag fails with authentication failure
Dear all,
Submitting jobs on my cluster works fine, but submitted DAGs fail
with what looks like an authentication failure.
I've tested with the simplest possible DAG ("JOB test test.job") and
the simplest possible job ("executable = /bin/hostname"), submitted
by the same user on the same machine. The job runs fine, but the
DAG fails with this in the logs:
In test.dag.dagman.out:
04/07/22 03:14:20 Submitting HTCondor Node test job(s)...
04/07/22 03:14:20 Submitting node test from file test.job using direct job submission
04/07/22 03:14:20 AUTH_ERROR: Generic preauthentication failure
04/07/22 03:14:20 SECMAN: required authentication with local schedd failed, so aborting command QMGMT_WRITE_CMD.
04/07/22 03:14:20 Can't connect to queue manager: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS
And the SchedLog has:
04/07/22 03:14:20 (pid:1072) DC_AUTHENTICATE: authentication of <x.x.x.x:43467> did not result in a valid mapped user name, which is required for this command (1112 QMGMT_WRITE_CMD), so aborting.
04/07/22 03:14:20 (pid:1072) DC_AUTHENTICATE: reason for authentication failure: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS
All our authentication is Kerberos (FreeIPA), and it works well across
the cluster. x.x.x.x is the IP address of the local machine. The user's
credentials are OK: running 'condor_submit test.job' right after the
failure works fine.
With debug logging enabled, I see the following leading up to the
errors above:
04/07/22 03:46:03 (D_SECURITY) SECMAN: new session, doing initial authentication.
04/07/22 03:46:03 (fd:7) (pid:287909) (D_SECURITY) SECMAN: Auth methods: KERBEROS
04/07/22 03:46:03 (D_SECURITY) AUTHENTICATE: setting timeout for <x.x.x.x:9618?addrs=x.x.x.x-9618&alias=crick.my.domain&noUDP&sock=schedd_968_8bc4> to 20.
04/07/22 03:46:03 (D_SECURITY) HANDSHAKE: in handshake(my_methods = 'KERBEROS')
04/07/22 03:46:03 (D_SECURITY) HANDSHAKE: handshake() - i am the client
04/07/22 03:46:03 (D_SECURITY) HANDSHAKE: server replied (method = 64)
04/07/22 03:46:03 (D_SECURITY) KERBEROS: get remote server principal for "host/crick.my.domain"
04/07/22 03:46:03 (D_SECURITY) KERBEROS: krb5_unparse_name: host/crick.my.domain@xxxxxxxxx
04/07/22 03:46:03 (D_SECURITY) KERBEROS: no user yet determined, will grab up to slash
04/07/22 03:46:03 (D_SECURITY) KERBEROS: picked user: host
04/07/22 03:46:03 (D_SECURITY) KERBEROS: remapping 'host' to 'condor'
04/07/22 03:46:03 (D_SECURITY) unable to open map file (null), errno 22
04/07/22 03:46:03 (D_SECURITY) Client is condor@xxxxxxxxx
04/07/22 03:46:03 (D_SECURITY) init_daemon: client principal is 'host/crick.my.domain@xxxxxxxxx'
04/07/22 03:46:03 (D_SECURITY) init_daemon: Using default keytab FILE:/etc/krb5.keytab
04/07/22 03:46:03 (D_SECURITY) init_daemon: Trying to get tgt credential for service host/crick.my.domain@xxxxxxxxx
04/07/22 03:46:03 (D_PRIV) PRIV_CONDOR --> PRIV_ROOT at /var/lib/condor/execute/slot1/dir_93903/userdir/.tmpWTI97r/condor-9.7.0/src/condor_io/condor_auth_kerberos.cpp:632
04/07/22 03:46:03 (D_PRIV) PRIV_ROOT --> PRIV_CONDOR at /var/lib/condor/execute/slot1/dir_93903/userdir/.tmpWTI97r/condor-9.7.0/src/condor_io/condor_auth_kerberos.cpp:634
04/07/22 03:46:03 (D_ALWAYS) AUTH_ERROR: Generic preauthentication failure
04/07/22 03:46:03 (D_SECURITY) AUTHENTICATE: method 64 (KERBEROS) failed.
04/07/22 03:46:03 (D_SECURITY) HANDSHAKE: in handshake(my_methods = '')
04/07/22 03:46:03 (D_SECURITY) HANDSHAKE: handshake() - i am the client
04/07/22 03:46:03 (D_SECURITY) HANDSHAKE: sending (methods == 0) to server
04/07/22 03:46:03 (D_SECURITY) HANDSHAKE: server replied (method = 0)
04/07/22 03:46:03 (D_ALWAYS) SECMAN: required authentication with local schedd failed, so aborting command QMGMT_WRITE_CMD.
04/07/22 03:46:03 (fd:6) (pid:287909) (D_ALWAYS) WARNING: failed to connect to queue manager (AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using KERBEROS)
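For what it's worth, the "errno 22" on the map-file line can be decoded with Python's standard library; on Linux it is EINVAL ("Invalid argument"). Since the file name is printed as (null), my guess (an assumption, not verified) is simply that no Kerberos map file is configured, which may or may not be related to the failure:

```python
import errno
import os

# errno value printed by the "unable to open map file (null), errno 22" line.
code = 22
# Symbolic name and message for this errno on the current platform.
print(errno.errorcode[code], "-", os.strerror(code))
```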
I'm not sure what the mechanics are supposed to be, but it looks
like the local machine's credentials (rather than the user's) are
being used to authenticate with the local schedd, and that this
somehow fails. Could it be that this code runs as non-root, so
/etc/krb5.keytab is inaccessible?
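One quick way to test that hypothesis is a small sketch like the following, run as the submitting user (the same account that runs condor_submit_dag). It reports the effective UID/GID and whether that process can actually open the keytab named in the log. The function name keytab_readable is just illustrative, and I'm assuming /etc/krb5.keytab is, as is typical, readable by root only:

```python
import os

# Path taken from the "Using default keytab FILE:/etc/krb5.keytab" log line.
DEFAULT_KEYTAB = "/etc/krb5.keytab"


def keytab_readable(path: str = DEFAULT_KEYTAB) -> bool:
    """Return True if the current process can open `path` for reading.

    Opening the file directly (rather than calling os.access, which
    checks the real UID/GID) tests the effective IDs of this process.
    """
    try:
        with open(path, "rb"):
            return True
    except OSError:  # missing file, permission denied, etc.
        return False


if __name__ == "__main__":
    print(f"euid={os.geteuid()} egid={os.getegid()}")
    print(f"{DEFAULT_KEYTAB} readable: {keytab_readable()}")
```

If it prints "readable: False" for the submitting user, that would at least be consistent with the keytab being out of reach for whatever process is authenticating on the user's behalf.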