Did you also switch to a newer version of HTCondor?
I think these messages from the CollectorLog on the central manager show the problem:
12/28/21 17:20:03 PERMISSION DENIED to unauthenticated@unmapped from host 192.168.178.61 for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: ADVERTISE_MASTER authorization policy denies all access
12/28/21 17:20:03 DC_AUTHENTICATE: Command not authorized, done!
12/28/21 17:20:13 PERMISSION DENIED to unauthenticated@unmapped from host 192.168.178.61 for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD: reason: ADVERTISE_STARTD authorization policy denies all access
12/28/21 17:20:13 DC_AUTHENTICATE: Command not authorized, done!
The configuration on the central manager does not have any value for ALLOW_ADVERTISE_MASTER or ALLOW_ADVERTISE_STARTD.
If you were running HTCondor 8.8.*, the ALLOW_WRITE configuration value would be used when those had no value. During the 8.9 series, however,
we made HTCondor more secure by default, and part of that was that ALLOW_ADVERTISE_MASTER
and ALLOW_ADVERTISE_STARTD stopped inheriting
the value of ALLOW_WRITE. With no value of their own, the collector denies all access at those levels, so the worker's master and startd ads are rejected and condor_status comes up blank.
You can add these lines to the configuration of your central manager to fix this:
ALLOW_ADVERTISE_MASTER = $(ALLOW_WRITE)
ALLOW_ADVERTISE_STARTD = $(ALLOW_WRITE)
ALLOW_ADVERTISE_SCHEDD = $(ALLOW_WRITE)
HTCondor is trying to move away from security based on IP addresses, since that sort of installation is vulnerable to misuse by
anyone who has the ability to run programs from within your firewall. If you trust everyone who has access to your 192.168.178.* IP
address range, then making the change above is fine. But if you want a more secure HTCondor installation, you should upgrade
to HTCondor 9.0 or 9.5 and switch to IDTOKEN authentication.
-tj
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of justin0419@xxxxxxxxx <justin0419@xxxxxxxxx>
Sent: Tuesday, December 28, 2021 10:56 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Can't See Worker Machines - condor_status is blank

Hi All.
I recently changed my master server from CentOS 7 to Oracle Linux 8. I followed the installation instructions from: https://research.cs.wisc.edu/htcondor/instructions/el/8/development/

Having set up the Condor master and adjusted the worker servers to suit the new master (IP address and name), I find I can't run Condor over the network. condor_status comes up blank. If I add STARTD to my master config file, I do get a list of slots on the master machine, but I don't want to run anything on the master machine. But at least it tells me I've got some small percentage of the installation correct.

I did have this problem before, which you very kindly supplied an answer for. I went through all the great suggestions you guys gave me last time, but this time they don't work, so I'm clearly doing something else wrong.

This isn't a firewall problem. For now I've disabled firewalld and SELinux on all machines. My /etc/condor/condor_config file is untouched from the installation.

Below are some log files, my /etc/hosts and the config files from the master and one of the workers. If anyone could clue me in I'd be most grateful.

--
Kind regards,
Justin Fisher
----------------------------------------------------------------------------------------------------
$CondorVersion: 8.9.13 Mar 30 2021 BuildID: 535058 PackageID: 8.9.13-1 $
ps ax | grep condor
19369 ? Ss 0:00 /usr/sbin/condor_master -f
19419 ? S 0:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 973
19420 ? Ss 0:00 condor_shared_port -p 9618
19421 ? Ss 0:00 condor_collector
19422 ? Ss 0:00 condor_negotiator
19423 ? Ss 0:00 condor_schedd
21617 pts/0 S+ 0:00 grep --color=auto condor
----------------------------------------------------------------------------------------------------
tail -n10 CollectorLog
12/28/21 17:19:47 Query info: matched=0; skipped=0; query_time=0.000180; send_time=0.000103; type=MachinePrivate; requirements={true}; locate=0; limit=0; from=COLLECTOR; peer=<192.168.178.63:22405>; projection={}; filter_private_ads=0
12/28/21 17:19:47 (Sending 0 ads in response to query)
12/28/21 17:19:47 QueryWorker: forked new high priority worker with id 20004 ( max 4 active 2 pending 0 )
12/28/21 17:19:47 Query info: matched=0; skipped=14; query_time=0.000182; send_time=0.000084; type=Any; requirements={(((MyType == "Submitter")) || ((MyType == "Machine")))}; locate=0; limit=0; from=COLLECTOR; peer=<192.168.178.63:5845>; projection={}; filter_private_ads=0
12/28/21 17:20:03 PERMISSION DENIED to unauthenticated@unmapped from host 192.168.178.61 for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: ADVERTISE_MASTER authorization policy denies all access
12/28/21 17:20:03 DC_AUTHENTICATE: Command not authorized, done!
12/28/21 17:20:13 PERMISSION DENIED to unauthenticated@unmapped from host 192.168.178.61 for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD: reason: ADVERTISE_STARTD authorization policy denies all access
12/28/21 17:20:13 DC_AUTHENTICATE: Command not authorized, done!
12/28/21 17:20:13 PERMISSION DENIED to unauthenticated@unmapped from host 192.168.178.61 for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD: reason: ADVERTISE_STARTD authorization policy denies all access
12/28/21 17:20:13 DC_AUTHENTICATE: Command not authorized, done!
----------------------------------------------------------------------------------------------------
tail -n10 MasterLog
12/28/21 17:03:46 Started DaemonCore process "/usr/libexec/condor/condor_shared_port", pid and pgroup = 19420
12/28/21 17:03:46 Waiting for /var/lock/condor/shared_port_ad to appear.
12/28/21 17:03:46 Found /var/lock/condor/shared_port_ad.
12/28/21 17:03:46 Cannot remove wait-for-startup file /var/log/condor/.collector_address
12/28/21 17:03:47 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 19421
12/28/21 17:03:47 Waiting for /var/log/condor/.collector_address to appear.
12/28/21 17:03:47 Found /var/log/condor/.collector_address.
12/28/21 17:03:47 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 19422
12/28/21 17:03:47 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 19423
12/28/21 17:03:47 Daemons::StartAllDaemons all daemons were started
----------------------------------------------------------------------------------------------------
tail -n10 SchedLog
12/28/21 17:03:47 (pid:19423) DaemonCore: command socket at <192.168.178.63:9618?addrs=192.168.178.63-9618+[2001-871-262-b1ea-20c-29ff-feff-a619]-9618&alias=or8.ingenazure.com&noUDP&sock=schedd_19369_19f7>
12/28/21 17:03:47 (pid:19423) DaemonCore: private command socket at <192.168.178.63:9618?addrs=192.168.178.63-9618+[2001-871-262-b1ea-20c-29ff-feff-a619]-9618&alias=or8.ingenazure.com&noUDP&sock=schedd_19369_19f7>
12/28/21 17:03:47 (pid:19423) History file rotation is enabled.
12/28/21 17:03:47 (pid:19423) Maximum history file size is: 20971520 bytes
12/28/21 17:03:47 (pid:19423) Number of rotated history files is: 2
12/28/21 17:03:47 (pid:19423) Reloading job factories
12/28/21 17:03:47 (pid:19423) Loaded 0 job factories, 0 were paused, 0 failed to load
12/28/21 17:03:47 (pid:19423) TransferQueueManager stats: active up=0/100 down=0/100; waiting up=0 down=0; wait time up=0s down=0s
12/28/21 17:03:47 (pid:19423) TransferQueueManager upload 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
12/28/21 17:03:47 (pid:19423) TransferQueueManager download 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
[jfisher@or8 condor]$
----------------------------------------------------------------------------------------------------
All /etc/hosts files are identical:
more /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.178.63 or8.ingenazure.com
192.168.178.61 eda1.ingenazure.com
192.168.178.60 eda2.ingenazure.com

Pinging from the master machine to ensure there are no typos in /etc/hosts:
ping or8.ingenazure.com
PING or8.ingenazure.com (192.168.178.63) 56(84) bytes of data.
64 bytes from or8.ingenazure.com (192.168.178.63): icmp_seq=1 ttl=64 time=0.018 ms
ping eda1.ingenazure.com
PING eda1.ingenazure.com (192.168.178.61) 56(84) bytes of data.
64 bytes from eda1.ingenazure.com (192.168.178.61): icmp_seq=1 ttl=64 time=0.848 ms
ping eda2.ingenazure.com
PING eda2.ingenazure.com (192.168.178.60) 56(84) bytes of data.
64 bytes from eda2.ingenazure.com (192.168.178.60): icmp_seq=1 ttl=64 time=0.848 ms
----------------------------------------------------------------------------------------------------
Master machine (or8.ingenazure.com) /etc/condor/config.d/00master.config
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, SHARED_PORT
START = true
ALLOW_ADMINISTRATOR = jfisher@xxxxxxxxxxxxxx
DEFAULT_DOMAIN_NAME = ingenazure.com
UID_DOMAIN = ingenazure.com
FILESYSTEM_DOMAIN = $(UID_DOMAIN)
ALLOW_WRITE = 192.168.178.*
ALLOW_READ = */*.ingenazure.com, or8.ingenazure.com
ALLOW_NEGOTIATOR = or8.ingenazure.com, 192.168.178.*
CONDOR_ADMIN = jfisher@xxxxxxxxxxxxxx
CONDOR_HOST = or8.ingenazure.com
USE_NFS = FALSE
HOSTNAME = or8
USE_SHARED_PORT=TRUE
SHARED_PORT_ARGS = -p 9618
COLLECTOR_USES_SHARED_PORT=TRUE
COLLECTOR_HOST = $(CONDOR_HOST):9618
StartJobs = TRUE
MASTER_INSTANCE_LOCK = /var/lock/condor/InstanceLock
MAX_DEFAULT_LOG = 1000000
EVENT_LOG = $(LOG)/EventLog
EVENT_LOG_JOB_AD_INFORMATION_ATTRS=Owner,CurrentHosts,x509userproxysubject,x509UserProxyVOName,AccountingGroup,GlobalJobId,QDate,JobStartDate,JobCurrentStartDate,JobFinishedHookDone
EVENT_LOG_MAX_SIZE = 10000000
EVENT_LOG_MAX_ROTATIONS = 5
POOL_HISTORY_DIR = /var/log/condor
KEEP_POOL_HISTORY = True
GROUP_NAMES = group_ANALOG, group_DIGITAL, group_OTHER,
#set the shares for your users
GROUP_QUOTA_DYNAMIC_group_ANALOG = 1
GROUP_QUOTA_DYNAMIC_group_DIGITAL = 1
GROUP_QUOTA_DYNAMIC_group_OTHER = 0.5
GROUP_ACCEPT_SURPLUS = TRUE
----------------------------------------------------------------------------------------------------
Worker machine 1 (eda1.ingenazure.com) /etc/condor/config.d/00worker.config
CAL_CONFIG_DIR = /etc/condor/config.d
DAEMON_LIST = MASTER,STARTD
DEFAULT_DOMAIN_NAME = ingenazure.com
CONDOR_HOST = or8.ingenazure.com
UID_DOMAIN = ingenazure.com
FILESYSTEM_DOMAIN = $(UID_DOMAIN)
ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST), 192.168.178.*
ALLOW_READ = *.$(UID_DOMAIN), 192.168.178.*
CONDOR_ADMIN = jfisher@xxxxxxxxxxxxxx
USE_NFS = FALSE
StartJobs = true
STARTD_ATTRS = StartJobs, $(STARTD_ATTRS)
START = true
HOSTALLOW_CONFIG = $(CONDOR_HOST)
ALLOW_CONFIG = $(CONDOR_HOST)
ENABLE_RUNTIME_CONFIG = True
RUNTIME_CONFIG_ADMIN = $(CONDOR_HOST)
STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs
ENABLE_PERSISTENT_CONFIG = True
PERSISTENT_CONFIG_DIR = /etc/condor/persistent
USE_SHARED_PORT = TRUE
SHARED_PORT_ARGS = -p 9618
COLLECTOR_USES_SHARED_PORT=TRUE
COLLECTOR_HOST = $(CONDOR_HOST):9618
# Enable CGROUP control
BASE_CGROUP = htcondor
CGROUP_MEMORY_LIMIT_POLICY = soft
# slots
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 24
SLOT_TYPE_1 = cpus=1, ram=4%, swap=4%, disk=4%
SLOT_TYPE_1_PARTITIONABLE = true
COUNT_HYPERTHREAD_CPUS = true
----------------------------------------------------------------------------------------------------