
[HTCondor-users] FYI: minicondor install problem (ERROR: Failed to connect to local queue manager)



This may be me. I use a personal condor install on my M4 Mac for initial testing.

I do not have a good procedure for updating HTCondor, so I redo the whole install process each time. I am logging this here in case someone else runs into the same problem.

The error I saw was:

% condor_submit sleep.submit
Submitting job(s)
ERROR: Failed to connect to local queue manager
SECMAN:2011:Connection closed during command authorization. Probably due to an unknown command.

The cause showed up in /usr/local/condor/local/log/SchedLog:

11/05/25 17:52:43 (pid:2387) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
11/05/25 17:52:43 (pid:2387) ** /usr/local/condor/sbin/condor_schedd
11/05/25 17:52:43 (pid:2387) ** SubsystemInfo: name=SCHEDD type=SCHEDD(4) class=DAEMON(1)
11/05/25 17:52:43 (pid:2387) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
11/05/25 17:52:43 (pid:2387) ** $CondorVersion: 25.3.1 2025-10-31 BuildID: 847300 GitSHA: eba0cca7 $
11/05/25 17:52:43 (pid:2387) ** $CondorPlatform: x86_64_macOS13 $
11/05/25 17:52:43 (pid:2387) ** PID = 2387 RealUID = 0
11/05/25 17:52:43 (pid:2387) ** Log last touched 11/5 17:48:18
11/05/25 17:52:43 (pid:2387) ******************************************************
11/05/25 17:52:43 (pid:2387) Using config source: /usr/local/condor/etc/condor_config
11/05/25 17:52:43 (pid:2387) Using local config sources:
11/05/25 17:52:43 (pid:2387)    /usr/local/condor/local/config.d/00-minicondor
11/05/25 17:52:43 (pid:2387)    /usr/local/condor/local/config.d/00-security
11/05/25 17:52:43 (pid:2387) config Macros = 60, Sorted = 60, StringBytes = 1504, TablesBytes = 2216
11/05/25 17:52:43 (pid:2387) CLASSAD_CACHING is ENABLED
11/05/25 17:52:43 (pid:2387) Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
11/05/25 17:52:43 (pid:2387) SharedPortEndpoint: waiting for connections to named socket schedd_624_972f
11/05/25 17:52:43 (pid:2387) DaemonCore: command socket at <10.0.1.15:9618?addrs=10.0.1.15-9618+[fd35-d18d-b01d-8782-14ca-d54c->
11/05/25 17:52:43 (pid:2387) DaemonCore: private command socket at <10.0.1.15:9618?addrs=10.0.1.15-9618+[fd35-d18d-b01d-8782-14>
11/05/25 17:52:43 (pid:2387) Daemon history file: /usr/local/condor/local/spool/schedd_daemon_history
11/05/25 17:52:43 (pid:2387) History file rotation is enabled.
11/05/25 17:52:43 (pid:2387)    Maximum history file size is: 20971520 bytes
11/05/25 17:52:43 (pid:2387)    Number of rotated history files is: 2
11/05/25 17:52:43 (pid:2387) config super users : root, condor
11/05/25 17:52:43 (pid:2387) failed to open log /usr/local/condor/local/spool/job_queue.log, errno = 13
11/05/25 17:52:43 (pid:2387) ERROR "Failed to initialize job queue log!" at line 2161 in file /usr/local/condor/local/execute/d>
11/05/25 18:01:25 (pid:3424) Setting maximum file descriptors to 20000.
11/05/25 18:01:25 (pid:3424) ******************************************************

errno = 13 is a permission error (EACCES) on job_queue.log. It seems to have been fixed by changing the owner of /usr/local/condor/local/spool/ to condor:condor and using launchctl to restart condor:

sudo chown -R condor:condor /usr/local/condor/local/spool/
sudo launchctl stop condor
sudo launchctl start condor
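
For anyone wanting to confirm the ownership before or after the chown, a rough check could look like the sketch below. The helper name `owner_of` is made up for illustration, and the spool path is simply the one from my SchedLog, so adjust it for a different install layout.

```shell
# Hypothetical helper: print "owner:group" for a path, parsed from `ls -ld` output.
owner_of() {
    ls -ld "$1" | awk '{ print $3 ":" $4 }'
}

# Spool path as reported in the SchedLog above; override SPOOL for other layouts.
SPOOL="${SPOOL:-/usr/local/condor/local/spool}"

if [ -d "$SPOOL" ]; then
    # After the fix this should print condor:condor.
    echo "$SPOOL is owned by $(owner_of "$SPOOL")"
    # The file the schedd failed to open with errno = 13.
    ls -l "$SPOOL/job_queue.log"
fi
```

Depending on the directory permissions, the `ls -l` line may need sudo to read inside the spool.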

This gets further: I can submit the job, but my one partitionable slot reports 0 memory:

% condor_status -long | \grep -i memory
ChildMemory = { }
DetectedMemory = 32768
MachineResources = "Cpus Memory Disk Swap GPUs"
Memory = 0
TotalMemory = 32768
TotalSlotMemory = 32768
TotalVirtualMemory = 1128960
VirtualMemory = 0
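
To pull a single attribute out of that output without matching every memory-related line, a small helper like the one below is enough. This is only a sketch: `classad_attr` is a name made up here, and it assumes the plain `Name = value` lines that `condor_status -long` prints.

```shell
# Hypothetical helper: print the value of one attribute from
# `condor_status -long` style text ("Name = value" lines) read on stdin.
classad_attr() {
    awk -v key="$1" -F ' = ' '$1 == key { print $2; exit }'
}

# Usage against the live pool (only runs if condor_status is installed).
if command -v condor_status >/dev/null 2>&1; then
    condor_status -long | classad_attr Memory
fi
```

Matching on the exact attribute name keeps `Memory` separate from `TotalMemory`, `DetectedMemory` and the rest.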

and condor_q -better-analyze says no love for my job:

Job 24.000 defines the following attributes:

  FileSystemDomain = "mija.local"
  RequestDisk = 1048576 (kb)
  RequestMemory = 400 (mb)

slot1@xxxxxxxxxx has the following attributes:

  TARGET.Arch = "X86_64"
  TARGET.Disk = 323252412 (kb)
  TARGET.FileSystemDomain = "mija.local"
  TARGET.HasFileTransfer = true
  TARGET.Memory = 0 (mb)
  TARGET.OpSys = "macOS"

The Requirements expression for job 24.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           1  TARGET.Arch == "X86_64"
[1]           1  TARGET.OpSys == "macOS"
[3]           1  TARGET.Disk >= RequestDisk
[5]           0  TARGET.Memory >= RequestMemory
[7]           1  TARGET.FileSystemDomain == MY.FileSystemDomain
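
Step [5] is the only condition that matches zero slots. As a plain illustration of that one comparison with the numbers from the output above (the function name `memory_fits` is made up, and the real evaluation is done by HTCondor on ClassAd expressions, not by shell arithmetic):

```shell
# Hypothetical helper: mirrors TARGET.Memory >= RequestMemory for integer MB values.
memory_fits() {
    slot_mb=$1
    request_mb=$2
    [ "$slot_mb" -ge "$request_mb" ]
}

# Values from the analysis above: the slot advertises 0 MB, the job asks for 400 MB.
if memory_fits 0 400; then
    echo "match"
else
    echo "no match: slot Memory is below RequestMemory"
fi
```

With TotalSlotMemory = 32768 the same job would fit easily, so the question is why the slot advertises Memory = 0.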

How can I fix this? Any help is appreciated.

Best,

Joe