Mailing List Archives
Re: [HTCondor-users] Going from Condor 7.7 to HTCondor 8.8
- Date: Fri, 24 May 2019 16:02:24 +0000
- From: Zach Miller <zmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Going from Condor 7.7 to HTCondor 8.8
Hi again William,
I should also mention that, since you are making such a big leap in version, the quickest way to get things working would be to completely blow away the old HTCondor installation and install the latest RPM. I know you said you moved the configuration over, but so much has changed over the years, including various directory layouts and permissions, that it will almost certainly be faster to start with a fresh install than to debug an upgrade over the existing one.
Once you have done that and have it working, let's discuss why you want to turn off authentication; if you have a compelling reason, we can certainly do that. But I'd suggest starting from a working default installation first, so we know there are no vestiges of the 7.7 release still around.
Let me know please if I can help out with any of that.
Cheers,
-zach
On 5/23/19, 3:23 PM, "HTCondor-users on behalf of William Seligman" <htcondor-users-bounces@xxxxxxxxxxx on behalf of seligman@xxxxxxxxxxxxxxxxxx> wrote:
Background: I'm the sysadmin of a small CentOS 6 computing farm. For years our
small condor pool was running Condor 7.7; higher versions offered no new
features we needed. Then the user required a new (unrelated) software
installation that was incompatible with the old CentOS 5 Condor 7.7 libraries,
and they requested that I upgrade to HTCondor 8.8.
From that point until now, I have not been able to get HTCondor 8.8 to fully
run on the farm. My debugging steps included erasing the condor_config* files
and replacing them with those from the RPMs and completely wiping the contents
of LOCAL_DIR.
Where I'm at now: Although the condor services start up properly, I can't submit
any jobs. The error is:
# condor_submit myfile.cmd
Submitting job(s)
ERROR: Failed to connect to local queue manager
SECMAN:2007:Failed to end classad message.
The results of web searches on this error have not helped. For the record:
- I've followed the instructions at
<https://lists.cs.wisc.edu/archive/htcondor-users/2008-March/msg00178.shtml>
multiple times. Since I had started with a fresh LOCAL_DIR, the file
LOCAL_DIR/spool/job_queue.log had no invalid entries, but I gave it a try anyway.
- At present, the users are not submitting any condor jobs, so schedd is not busy.
- Schedd is running:
# ps -elf | grep schedd
4 S condor 60019 59973 0 80 0 - 13065 poll_s May22 ? 00:00:07 condor_schedd -f
- The firewall is off. Neither iptables nor netfilter is running. (Our site has a
Cisco firewall that I've configured to block port 9618 from the outside, so
I'm concerned.)
- nmap tells me that port 9618 on the CONDOR_HOST is open.
- The only error in SchedLog is
DC_AUTHENTICATE: Unable to reconcile!
- I turned on debugging in condor_config.local:
TOOL_DEBUG = D_ALL
SUBMIT_DEBUG = D_ALL
and ran the job with
# condor_submit -debug myfile.cmd
I can post the results on request. I'm no expert, but the relevant lines appear
to be:
05/23/19 15:57:02 (fd:5) (pid:863797) (D_SECURITY) SECMAN: command 1112 QMGMT_WRITE_CMD to schedd at <129.236.252.84:9618> from TCP port 19038 (blocking).
05/23/19 15:57:02 (fd:5) (pid:863797) (D_SECURITY) SECMAN:: default CLIENT meths: FS,KERBEROS,GSI,CLAIMTOBE
05/23/19 15:57:02 (fd:5) (pid:863797) (D_NETWORK) condor_write(fd=4 schedd at <129.236.252.84:9618>,,size=416,timeout=0,flags=0,non_blocking=0)
05/23/19 15:57:02 (fd:5) (pid:863797) (D_NETWORK) condor_read(fd=4 schedd at <129.236.252.84:9618>,,size=5,timeout=0,flags=0,non_blocking=0)
05/23/19 15:57:02 (fd:5) (pid:863797) (D_NETWORK) Stream::get(int) failed to read padding
05/23/19 15:57:02 (fd:5) (pid:863797) (D_ALWAYS) SECMAN: no classad from server, failing
- The only non-default lines in the condor_config file are:
BIND_ALL_INTERFACES = TRUE
SEC_DEFAULT_AUTHENTICATION = NEVER
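For readers puzzling over "DC_AUTHENTICATE: Unable to reconcile!": as I understand the security negotiation rules in the HTCondor manual, each side of a connection holds a setting of NEVER, OPTIONAL, PREFERRED or REQUIRED for a feature such as authentication, and the two settings are combined through a fixed table in which only NEVER against REQUIRED cannot be resolved. The sketch below is a simplified model of that table, written for illustration only: the names Level, Outcome and negotiate are hypothetical, not part of HTCondor, and method selection (FS, KERBEROS, GSI, CLAIMTOBE) is ignored entirely. Under this model, SEC_DEFAULT_AUTHENTICATION = NEVER on both ends would simply switch authentication off, so a reconcile failure would suggest that one side still treats authentication as REQUIRED for the queue-management command. That reading is an assumption, not something confirmed in this thread.

```python
# Simplified model of HTCondor's client/server security negotiation.
# Assumption: mirrors the decision table in the HTCondor manual's security
# negotiation section as recalled here; it is not HTCondor source code, and
# Level, Outcome and negotiate are hypothetical names.
from enum import Enum


class Level(Enum):
    """Per-side setting, as in SEC_DEFAULT_AUTHENTICATION = NEVER."""
    NEVER = "NEVER"
    OPTIONAL = "OPTIONAL"
    PREFERRED = "PREFERRED"
    REQUIRED = "REQUIRED"


class Outcome(Enum):
    NO = "no"      # the feature (e.g. authentication) is not used
    YES = "yes"    # the feature is used
    FAIL = "fail"  # the two settings cannot be reconciled


def negotiate(client: Level, server: Level) -> Outcome:
    """Combine one client setting and one server setting (order-insensitive)."""
    pair = {client, server}
    # One side insists on the feature while the other forbids it.
    if pair == {Level.NEVER, Level.REQUIRED}:
        return Outcome.FAIL
    # Otherwise, a NEVER on either side switches the feature off.
    if Level.NEVER in pair:
        return Outcome.NO
    # Neither side asks for it: leave it off.
    if pair == {Level.OPTIONAL}:
        return Outcome.NO
    # One side prefers or requires it and the other permits it.
    return Outcome.YES


if __name__ == "__main__":
    # A side configured with NEVER meeting a side that requires authentication.
    print(negotiate(Level.NEVER, Level.REQUIRED).value)  # prints "fail"
```

Encoding the table as set comparisons keeps it symmetric between client and server, which matches the table as I recall it; if the manual for your HTCondor version documents different rules, trust the manual over this sketch.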
Is there anything else I can do?
Thanks!