I'll post detailed errors on request, but the short version is our condor master is a dual-homed host. There were no problems with this in condor 7.7; I even have another condor pool running 8.3.8 that has no problems either.
But with 8.8 the host-based security appears to get confused about which interface to use, regardless of which of the master's two interfaces I set CONDOR_HOST to. It also gets confused because a reverse DNS lookup doesn't give consistent results for our condor master.
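For anyone who wants to check the same thing on their own dual-homed host, here is a minimal Python sketch of the reverse-lookup comparison. The function names and the sample addresses in the usage note are my own placeholders, not anything from HTCondor, and this only shows what the system resolver returns; it is an assumption that HTCondor's own lookups see the same answers.

```python
import socket


def reverse_lookup_report(addresses, resolver=socket.gethostbyaddr):
    """Return {ip: hostname or None} using a reverse (PTR) lookup per address.

    `resolver` defaults to socket.gethostbyaddr; it is a parameter only so
    the logic can be exercised without touching the network.
    """
    report = {}
    for addr in addresses:
        try:
            hostname, _aliases, _ips = resolver(addr)
        except OSError:
            # socket.herror and socket.gaierror are both OSError subclasses.
            hostname = None
        report[addr] = hostname
    return report


def is_consistent(report):
    """True only if every address resolved, and all to the same hostname."""
    names = set(report.values())
    return len(names) == 1 and None not in names
```

Calling `reverse_lookup_report(["192.0.2.10", "198.51.100.10"])` with the master's two real interface addresses in place of these documentation-range placeholders, and then passing the result to `is_consistent`, shows whether both interfaces map back to one name.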
I want to turn HTCondor's security completely and utterly off. It's not necessary for our small site.
However, as I noted before and Greg confirmed through his suggestion, the configuration line

SEC_DEFAULT_AUTHENTICATION = NEVER

doesn't turn security completely off. There's another configuration line in the default condor_config:

use SECURITY : HOST_BASED

I've done web searches on the HTCondor documentation, but I can't find any alternatives to "HOST_BASED" documented anywhere. Commenting out the line doesn't change anything.
How do I completely turn off security?

On 5/23/19 9:13 PM, Hitchen, Greg (IM&T, Kensington WA) wrote:
Hi William

We run a Windows pool, well mainly windows execute nodes (some linux) and only windows submit nodes. Our Central managers are all linux. Going from 8.4 to 8.6 things looked OK until we tried to submit jobs. Similar authentication errors.

We needed the following:

SEC_DEFAULT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_NEGOTIATION = OPTIONAL
SEC_DEFAULT_ENCRYPTION = OPTIONAL
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = True

on ALL nodes, i.e. CM, execute and submit nodes. If not on all nodes then the CM will NOT be able to communicate.

Not sure if it will fix your problem but maybe worth a try.

Cheers

Greg

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of William Seligman
Sent: Friday, 24 May 2019 4:23 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Going from Condor 7.7 to HTCondor 8.8

Background: I'm the sysadmin of a small CentOS 6 computing farm. For years our small condor pool was running Condor 7.7; higher versions offered no new features we needed. Then the user required a new (unrelated) software installation for which the old CentOS 5 condor 7.7 libraries were incompatible and they requested I upgrade to HTCondor 8.8.

From that point until now, I have not been able to get HTCondor 8.8 to fully run on the farm. My debugging steps included erasing the condor_config* files and replacing them with those from the RPMs and completely wiping the contents of LOCAL_DIR.

Where I'm at now: Although the condor services start up properly, I can't submit any jobs. The error is:

# condor_submit myfile.cmd
Submitting job(s)
ERROR: Failed to connect to local queue manager
SECMAN:2007:Failed to end classad message.

The results of web searches on this error have not helped. For the record:

- I've followed the instructions at <https://lists.cs.wisc.edu/archive/htcondor-users/2008-March/msg00178.shtml> multiple times. Since I had started with a fresh LOCAL_DIR, the file LOCAL_DIR/spool/job_queue.log had no invalid entries, but I gave it a try anyway.

- At present, the users are not submitting any condor jobs, so schedd is not busy.

- Schedd is running:

# ps -elf | grep schedd
4 S condor 60019 59973 0 80 0 - 13065 poll_s May22 ? 00:00:07 condor_schedd -f

- The firewall is off. Neither iptables nor netfilter is running. (Our site has Cisco firewall that I've configured to block off port 9618 from the outside, so I'm concerned.)

- nmap tells me that port 9618 on the CONDOR_HOST is open.

- The only error in SchedLog is

DC_AUTHENTICATE: Unable to reconcile!

- I turned on debugging in condor_config.local:

TOOL_DEBUG = D_ALL
SUBMIT_DEBUG = D_ALL

and ran the job with

# condor_submit -debug myfile.cmd

I can post the results on request. I'm no expert, but the relevant lines appear to be:

05/23/19 15:57:02 (fd:5) (pid:863797) (D_SECURITY) SECMAN: command 1112 QMGMT_WTE_CMD to schedd at <129.236.252.84:9618> from TCP port 19038 (blocking).
05/23/19 15:57:02 (fd:5) (pid:863797) (D_SECURITY) SECMAN:: default CLIENT meths: FS,KERBEROS,GSI,CLAIMTOBE
05/23/19 15:57:02 (fd:5) (pid:863797) (D_NETWORK) condor_write(fd=4 schedd at <9.236.252.84:9618>,,size=416,timeout=0,flags=0,non_blocking=0)
05/23/19 15:57:02 (fd:5) (pid:863797) (D_NETWORK) condor_read(fd=4 schedd at <1.236.252.84:9618>,,size=5,timeout=0,flags=0,non_blocking=0)
05/23/19 15:57:02 (fd:5) (pid:863797) (D_NETWORK) Stream::get(int) failed to re padding
05/23/19 15:57:02 (fd:5) (pid:863797) (D_ALWAYS) SECMAN: no classad from serverfailing

- The only non-default lines in the condor_config file are:

BIND_ALL_INTERFACES = TRUE
SEC_DEFAULT_AUTHENTICATION = NEVER

Is there anything else I can do? Thanks!
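With D_ALL turned on, the debug output gets long, so picking out the relevant lines by hand is tedious. Below is a minimal Python sketch that keeps only selected categories. It assumes every line has the same layout as the excerpt quoted above (date, time, optional parenthesised fields such as fd and pid, then a parenthesised D_ category); the regular expression and function name are illustrative only, and other HTCondor versions or debug settings may format lines differently.

```python
import re

# Layout assumed from the quoted excerpt:
#   MM/DD/YY HH:MM:SS (fd:N) (pid:N) (D_CATEGORY) message
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?:\((?!D_)[^)]*\) )*"           # skip fields like (fd:5) (pid:863797)
    r"\((?P<category>D_[^)]*)\) "      # the debug category, e.g. (D_SECURITY)
    r"(?P<message>.*)$"
)


def filter_log(lines, categories=("D_SECURITY", "D_ALWAYS")):
    """Return (time, category, message) tuples for lines in the given categories.

    Lines that do not match the assumed layout are silently skipped.
    """
    kept = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("category") in categories:
            kept.append(
                (match.group("time"), match.group("category"), match.group("message"))
            )
    return kept
```

Saving the `condor_submit -debug` output to a file and passing the open file object to `filter_log` would leave just the SECMAN and D_ALWAYS lines, which is roughly the subset quoted above minus the D_NETWORK entries.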