Hi Luke,
The developer before me appears to have added GSI support to the Puppet module.
I don't have any of those set in Puppet or on the machine.
Yes, I meant the 10_security file. We have CERTIFICATE_MAPFILE set; the map file is generated by Puppet and converts the canonical host names to their function in the pool. The top line specifies the Kerberos regex.
Authentication works between the rest of the nodes, and it worked on the CE before the Puppet config. I was able to submit and run jobs, and to execute schedd actions.
Running a condor_status on the CE would still show the pool even with the bad config.
It seems to be something specific to the puppet module's schedd configuration as all the other nodes are configured via puppet as well (workers and cm).
Thanks,
Iain
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of L Kreczko [L.Kreczko@xxxxxxxxxxxxx]
Sent: 25 March 2015 10:14
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Configuring a CE/Schedd
Dear Iain,
It seems that the authentication between your nodes is not working. Condor attempts FS, then KERBEROS, then GSI authentication and fails on each of them.
Can you confirm that you've set
use_cert_map_file = true
use_kerberos_auth = true
use_password_auth = false
?
I also don't see how you can end up with GSI authentication using the puppet module, as it can't be currently configured (only FS, CLAIMTOBE, KERBEROS & PASSWORD are configurable).
When you say 'security config file' do you mean /etc/condor/conf.d/10_security.config or another file (trying to figure out if there are some leftovers from a manual configuration)?
That all said, I learnt at the last HTCondor week a useful command to debug such authentication problems between condor nodes:
condor_ping -addr <condor node>:9618 -table WRITE READ
Cheers,
Luke
On 24 March 2015 at 18:48, Iain Bradford Steers <iain.steers@xxxxxxx> wrote:
Hi,
I'm in the process of finalizing the CE/Schedd setup for our pool; we're using Puppet.
I had the CE working and acting as a scheduler with a manual config and decided to move it to the HEP-Puppet/htcondor module.
This is the output I get in SchedLog(*). I've removed the IP, but it's the machine's own IP in all instances.
After this it just proceeds to spam condor_write errors until it fills the log file and starts a new one.
The CE is in the certificate mapfile along with all the other hosts and, apart from the ordering of hostnames, a vimdiff shows no difference between the security config file for this machine and the one that the central manager uses.
Has anyone else experienced this issue?
Thanks, Iain
(*)
03/24/15 19:12:28 Address rewriting: Warning: attribute 'ScheddIpAddr' <MACHINE_IP:9618?noUDP&sock=17305_aee5_3> == <MACHINE_IP:9618?noUDP&sock=17305_aee5_3>, but old logic couldn't find the command port for outbound interface MACHINE_IP.
03/24/15 19:12:28 Address rewriting: Warning: attribute 'ScheddIpAddr' address in ad (<MACHINE_IP:9618?noUDP&sock=17305_aee5_3>) == command socket (<MACHINE_IP:9618?noUDP&sock=17305_aee5_3>), but old logic couldn't find that command socket in its list.
03/24/15 19:12:28 Address rewriting: Warning: attribute 'MyAddress' <MACHINE_IP:9618?noUDP&sock=17305_aee5_3> == <MACHINE_IP:9618?noUDP&sock=17305_aee5_3>, but old logic couldn't find the command port for outbound interface MACHINE_IP.
03/24/15 19:12:28 Address rewriting: Warning: attribute 'MyAddress' address in ad (<MACHINE_IP:9618?noUDP&sock=17305_aee5_3>) == command socket (<MACHINE_IP:9618?noUDP&sock=17305_aee5_3>), but old logic couldn't find that command socket in its list.
03/24/15 19:12:33 -------- Begin starting jobs --------
03/24/15 19:12:33 -------- Done starting jobs --------
03/24/15 19:13:14 Received a superuser command
03/24/15 19:13:14 This process has a valid certificate & key
03/24/15 19:13:14 Failed to read end of message from <MACHINE_IP:34711>; 1280 untouched bytes.
03/24/15 19:13:14 condor_write(): Socket closed when trying to write 13 bytes to <MACHINE_IP:34711>, fd is 15, errno=104 Connection reset by peer
03/24/15 19:13:14 Buf::write(): condor_write() failed
03/24/15 19:13:14 condor_read(): Socket closed when trying to read 5 bytes from <MACHINE_IP:34711> in non-blocking mode
03/24/15 19:13:14 IO: EOF reading packet header
03/24/15 19:13:14 condor_read(): Socket closed when trying to read 5 bytes from <MACHINE_IP:34711>
03/24/15 19:13:14 IO: EOF reading packet header
03/24/15 19:13:14 AUTHENTICATE: handshake failed!
03/24/15 19:13:14 DC_AUTHENTICATE: required authentication of 128.142.132.67 failed: AUTHENTICATE:1002:Failure performing handshake|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXWRRJqi)|AUTHENTICATE:1004:Failed to authenticate using FS|AUTHENTICATE:1004:Failed to authenticate using KERBEROS|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5002:Failed to authenticate because the remote (client) side was not able to acquire its credentials.
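The final DC_AUTHENTICATE entry in the log above packs several errors into one line, separated by '|', each apparently of the form SUBSYSTEM:CODE:message. As a reading aid only, here is a minimal Python sketch that splits that string and lists which authentication methods were attempted and failed. The helper names (parse_error_stack, failed_methods) are hypothetical and not part of HTCondor, and the SUBSYSTEM:CODE:message layout is inferred from this one log line rather than from any HTCondor documentation.

```python
import re

# Error stack copied from the SchedLog excerpt; fields are separated by '|'.
ERROR_STACK = (
    "AUTHENTICATE:1002:Failure performing handshake|"
    "AUTHENTICATE:1004:Failed to authenticate using FS|"
    "FS:1004:Unable to lstat(/tmp/FS_XXXWRRJqi)|"
    "AUTHENTICATE:1004:Failed to authenticate using FS|"
    "AUTHENTICATE:1004:Failed to authenticate using KERBEROS|"
    "AUTHENTICATE:1004:Failed to authenticate using GSI|"
    "GSI:5002:Failed to authenticate because the remote (client) side "
    "was not able to acquire its credentials."
)


def parse_error_stack(stack):
    """Split the stack into (subsystem, code, message) tuples.

    Assumes each '|'-separated field looks like SUBSYSTEM:CODE:message,
    which is inferred from the log line, not from a documented format.
    """
    entries = []
    for field in stack.split("|"):
        subsystem, code, message = field.split(":", 2)
        entries.append((subsystem, int(code), message))
    return entries


def failed_methods(entries):
    """Return the methods named in 'Failed to authenticate using X', in order."""
    methods = []
    for subsystem, _code, message in entries:
        match = re.match(r"Failed to authenticate using (\w+)$", message)
        if subsystem == "AUTHENTICATE" and match and match.group(1) not in methods:
            methods.append(match.group(1))
    return methods


entries = parse_error_stack(ERROR_STACK)
for subsystem, code, message in entries:
    print(f"{subsystem:<12} {code}  {message}")
print("Methods that failed, in order:", failed_methods(entries))
```

Read this way, the stack shows FS, then KERBEROS, then GSI each being tried and failing, with the FS and GSI entries carrying the method-specific reasons.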
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
--
*********************************************************
 Dr Lukasz Kreczko      +44 (0)117 928 8724
 CMS Group
 School of Physics
 University of Bristol
*********************************************************