
Re: [HTCondor-users] FYI: minicondor install problem (ERROR: Failed to connect to local queue manager)



I agree with your assessment. The installation instructions consider only a new installation and rely on the condor_master to create some of the necessary directories (e.g. spool and execute) as the condor user at startup. The condor_master will create some missing directories, but it won't change the owner of directories that already exist. So if you run the chown commands in the instructions when doing an upgrade, these directories end up owned by root instead of condor, which results in failures.
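
One way to spot this after an upgrade is to compare the owner of each directory against the expected one. The following is only a sketch: check_owner is a hypothetical helper, not an HTCondor tool, and the paths and the condor:condor owner are the ones mentioned in this thread, so adjust them for your install. It assumes user and group names contain no spaces.

```shell
# Hypothetical helper: report directories whose owner:group differs
# from the expected value.
# Usage: check_owner EXPECTED_OWNER:GROUP DIR...
check_owner() {
    expected=$1
    shift
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            # Not counted as a failure here; reported so you can decide.
            echo "MISSING  $dir"
            continue
        fi
        # ls -ld prints owner and group in fields 3 and 4 on macOS and Linux.
        actual=$(ls -ld "$dir" | awk '{print $3 ":" $4}')
        if [ "$actual" = "$expected" ]; then
            echo "OK       $dir ($actual)"
        else
            echo "MISMATCH $dir is $actual, expected $expected"
            rc=1
        fi
    done
    return $rc
}

# Example with the paths from this thread:
# check_owner condor:condor /usr/local/condor/local/spool /usr/local/condor/local/execute
```

The function returns non-zero if any existing directory has the wrong owner, so it can be used as a quick check before restarting the daemons.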

We'll fix the instructions to avoid this problem.

 - Jaime

On Nov 7, 2025, at 8:03 AM, Joseph Areeda <newsreply@xxxxxxxxxx> wrote:

There was another ownership issue that seems to have been causing this. Changing the ownership of /usr/local/condor/local/execute to condor:condor fixed it.

So the bottom line seems to be that I did not upgrade the HTCondor minicondor on macOS properly.

Do we have a documented procedure for this? 

My current theory is that files created by HTCondor, which the install instructions do not account for, are having their ownership changed by the instructions.

Joe

On Nov 5, 2025, at 18:54, Joseph Areeda <joe@xxxxxxxxxx> wrote:

This may be me. I use a personal condor install on my Mac for initial testing.

I do not have a good procedure for updating HTCondor, so I redo the whole install process. I am just logging it here in case someone else has run into the same problem.

The error I saw was:

% condor_submit sleep.submit
Submitting job(s)
ERROR: Failed to connect to local queue manager
SECMAN:2011:Connection closed during command authorization. Probably due to an unknown command.

The problem was found in /usr/local/condor/local/log/SchedLog

11/05/25 17:52:43 (pid:2387) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
11/05/25 17:52:43 (pid:2387) ** /usr/local/condor/sbin/condor_schedd
11/05/25 17:52:43 (pid:2387) ** SubsystemInfo: name=SCHEDD type=SCHEDD(4) class=DAEMON(1)
11/05/25 17:52:43 (pid:2387) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
11/05/25 17:52:43 (pid:2387) ** $CondorVersion: 25.3.1 2025-10-31 BuildID: 847300 GitSHA: eba0cca7 $
11/05/25 17:52:43 (pid:2387) ** $CondorPlatform: x86_64_macOS13 $
11/05/25 17:52:43 (pid:2387) ** PID = 2387 RealUID = 0
11/05/25 17:52:43 (pid:2387) ** Log last touched 11/5 17:48:18
11/05/25 17:52:43 (pid:2387) ******************************************************
11/05/25 17:52:43 (pid:2387) Using config source: /usr/local/condor/etc/condor_config
11/05/25 17:52:43 (pid:2387) Using local config sources:
11/05/25 17:52:43 (pid:2387)    /usr/local/condor/local/config.d/00-minicondor
11/05/25 17:52:43 (pid:2387)    /usr/local/condor/local/config.d/00-security
11/05/25 17:52:43 (pid:2387) config Macros = 60, Sorted = 60, StringBytes = 1504, TablesBytes = 2216
11/05/25 17:52:43 (pid:2387) CLASSAD_CACHING is ENABLED
11/05/25 17:52:43 (pid:2387) Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
11/05/25 17:52:43 (pid:2387) SharedPortEndpoint: waiting for connections to named socket schedd_624_972f
11/05/25 17:52:43 (pid:2387) DaemonCore: command socket at <10.0.1.15:9618?addrs=10.0.1.15-9618+[fd35-d18d-b01d-8782-14ca-d54c->
11/05/25 17:52:43 (pid:2387) DaemonCore: private command socket at <10.0.1.15:9618?addrs=10.0.1.15-9618+[fd35-d18d-b01d-8782-14>
11/05/25 17:52:43 (pid:2387) Daemon history file: /usr/local/condor/local/spool/schedd_daemon_history
11/05/25 17:52:43 (pid:2387) History file rotation is enabled.
11/05/25 17:52:43 (pid:2387)   Maximum history file size is: 20971520 bytes
11/05/25 17:52:43 (pid:2387)   Number of rotated history files is: 2
11/05/25 17:52:43 (pid:2387) config super users : root, condor
11/05/25 17:52:43 (pid:2387) failed to open log /usr/local/condor/local/spool/job_queue.log, errno = 13
11/05/25 17:52:43 (pid:2387) ERROR "Failed to initialize job queue log!" at line 2161 in file /usr/local/condor/local/execute/d>
11/05/25 18:01:25 (pid:3424) Setting maximum file descriptors to 20000.
11/05/25 18:01:25 (pid:3424) ******************************************************
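
The two failure lines in the excerpt above can be pulled out of a daemon log with a small filter. This is only a sketch: log_errors is a hypothetical helper, and the pattern is based solely on the line format shown in this excerpt. On macOS and Linux, errno 13 is EACCES (permission denied), which is what points to an ownership problem.

```shell
# Hypothetical helper: print only the failure lines from an HTCondor daemon log.
# Matches lines whose message starts with "ERROR " after the "(pid:N)" prefix,
# or that report an errno value. The "D_ERROR" debug-level name is not matched.
log_errors() {
    grep -E '[)] ERROR |errno = [0-9]+' "$1"
}

# Example with the path from this thread:
# log_errors /usr/local/condor/local/log/SchedLog
```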

It seems to have been fixed by changing the owner of /usr/local/condor/local/spool/ to condor:condor and using launchctl to restart condor:

sudo chown -R condor:condor /usr/local/condor/local/spool/
sudo launchctl stop condor
sudo launchctl start condor

Best,

Joe

