Dear HTCondor experts,

reading through the source of "kinit" itself:
https://github.com/krb5/krb5/blob/master/src/clients/kinit/kinit.c
I mainly see two major differences from the HTCondor code:
- They always call "krb5_cc_resolve" first to check whether a credential cache exists and, if it does, they use
  "krb5_get_init_creds_opt_set_in_ccache" to have the init command use it.
- They use "krb5_get_init_creds_opt_set_out_ccache" to enable storage of the fetched credential in the cache.

Maybe that is already sufficient? I am unsure about the parameters and implications, but maybe the HTCondor authentication expert can use this information.

For reference, the manual steps would be:
------------------------------------------
$ kinit -k host/condor-cm1.domain@REALM
$ kvno host/schedd1.domain@REALM
$ kvno host/condor-cm1.domain@REALM
------------------------------------------
to get the wanted behaviour (i.e. credentials go to the cache by default with kinit and kvno):
------------------------------------------
$ klist -Af
Ticket cache: KEYRING:persistent:0:0
Default principal: host/condor-cm1.domain@REALM

Valid starting     Expires            Service principal
05/23/19 02:54:29  05/24/19 02:52:58  host/condor-cm1.domain@REALM
        renew until 05/30/19 02:52:58, Flags: FRT
05/23/19 02:53:01  05/24/19 02:52:58  host/schedd1.domain@REALM
        renew until 05/30/19 02:52:58, Flags: FRT
05/23/19 02:52:58  05/24/19 02:52:58  krbtgt/REALM@REALM
        renew until 05/30/19 02:52:58, Flags: FRI
------------------------------------------

Hope this helps!

Cheers,
Oliver

On 23.05.19 at 02:06, Oliver Freyermuth wrote:
> Dear HTCondor experts,
>
> we've observed heavy bursts of AS-REQs (Kerberos Authentication Service Requests), at rates of up to several hundred requests per second,
> issued by the central manager node (running negotiator and collector) when a lot of jobs are started and daemons (using Kerberos auth) need to talk to each other.
>
> I can also reproduce that more easily by running "condor_q -all -global" as the "root" user, who does not have Kerberos credentials on our condor-cm (central manager)
> but can access the host principal (and hence use the service credentials to authenticate). A snippet from the debug logs of running condor_q confirms my observation:
>
> 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) KERBEROS: Server principal is host/schedd1.domain@REALM
> 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: client principal is 'host/condor-cm1.domain@REALM'
> 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: Using default keytab FILE:/etc/krb5.keytab
> 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: Trying to get tgt credential for service host/schedd1@REALM
> 05/23/19 01:48:15 (fd:4) (pid:2411) (D_PRIV) PRIV_UNKNOWN --> PRIV_ROOT at /slots/10/dir_2560730/userdir/.tmpV7H12D/BUILD/condor-8.8.2/src/condor_io/condor_auth_kerberos.cpp:632
> 05/23/19 01:48:15 (fd:4) (pid:2411) (D_PRIV) PRIV_ROOT --> PRIV_UNKNOWN at /slots/10/dir_2560730/userdir/.tmpV7H12D/BUILD/condor-8.8.2/src/condor_io/condor_auth_kerberos.cpp:634
> 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: gic_kt creds_->client is 'host/condor-cm1.domain@REALM'
> 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) init_daemon: gic_kt creds_->server is 'host/schedd1.domain@REALM'
> 05/23/19 01:48:15 (fd:4) (pid:2411) (D_SECURITY) Success..........................
>
> It seems that in daemon authentication, a fresh credential is fetched for every single daemon-to-daemon interaction. We noticed this because the KDC of our computing centre got DoSed by it
> and the service failed (twice so far).
> Fetching a credential means, in "Kerberos speak", issuing an AS-REQ and having the KDC generate an AS-REP. This is computationally pretty expensive on the KDC end.
>
> Our computing centre is trying to improve the situation on their end to withstand this heavy load better, but it is still best practice in Kerberos to cache AS-REPs.
>
> Could caching be added?
> Sadly, I do not have a straightforward suggestion as to what the implementation is missing to get that. For user credentials, the Kerberos library takes care of it automatically
> (by using credential caches in files or the persistent kernel keyring), but that does not seem to happen for host / service credentials with HTCondor. Maybe HTCondor purges them after use?
> But I did not find that explicitly in the code.
> However, issuing:
> kinit -k host/condor-cm1.domain@REALM
> successfully adds a TGT to the credential cache (in our case, the persistent kernel keyring), as I would expect. But that does not happen with HTCondor.
>
> Cheers,
> Oliver
>

--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--
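
To illustrate why a cache-first flow (look in the credential cache, contact the KDC only on a miss or after expiry, then store the result) would cut the AS-REQ rate, here is a toy model. It is only a sketch under loud assumptions: it does not call the krb5 library and is not HTCondor code, and every name in it (ToyKDC, ToyCredentialCache, get_credential, count_as_reqs) is hypothetical. It also keys the cache by (client, service) purely to mirror the log excerpt, where a credential is fetched per target service.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Ticket:
    client: str
    service: str
    expires_at: float  # seconds on an arbitrary clock


@dataclass
class ToyKDC:
    """Hypothetical stand-in for a KDC; it only counts the AS-REQs it receives."""
    lifetime: float = 24 * 3600.0
    as_req_count: int = 0

    def as_req(self, client: str, service: str, now: float) -> Ticket:
        self.as_req_count += 1
        return Ticket(client, service, now + self.lifetime)


@dataclass
class ToyCredentialCache:
    """Hypothetical stand-in for a credential cache (file or kernel keyring)."""
    tickets: Dict[Tuple[str, str], Ticket] = field(default_factory=dict)

    def lookup(self, client: str, service: str, now: float) -> Optional[Ticket]:
        ticket = self.tickets.get((client, service))
        # Only hand out tickets that have not expired yet.
        if ticket is not None and ticket.expires_at > now:
            return ticket
        return None

    def store(self, ticket: Ticket) -> None:
        self.tickets[(ticket.client, ticket.service)] = ticket


def get_credential(kdc: ToyKDC, cache: Optional[ToyCredentialCache],
                   client: str, service: str, now: float) -> Ticket:
    """Consult the cache first; contact the KDC only on a miss or expiry."""
    if cache is not None:
        cached = cache.lookup(client, service, now)
        if cached is not None:
            return cached
    ticket = kdc.as_req(client, service, now)
    if cache is not None:
        cache.store(ticket)  # keep the fresh credential for later connections
    return ticket


def count_as_reqs(n_connections: int, use_cache: bool, now: float = 0.0) -> int:
    """Count the AS-REQs that n_connections daemon-to-daemon contacts would cause."""
    kdc = ToyKDC()
    cache = ToyCredentialCache() if use_cache else None
    for _ in range(n_connections):
        get_credential(kdc, cache, "host/condor-cm1.domain@REALM",
                       "host/schedd1.domain@REALM", now)
    return kdc.as_req_count
```

In this model, count_as_reqs(300, use_cache=False) returns 300, while count_as_reqs(300, use_cache=True) returns 1, since every connection after the first is served from the cache until the ticket expires.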