Hi,
I've tried to set up Condor between two nodes.
When I run "condor_status" I get:
Error: communication error
My /etc/condor/condor_config file:
MY_FULL_HOSTNAME = abxx.xxx (here I put my hostname)
##  Pathnames
RUN     = $(LOCAL_DIR)/run/condor
LOG     = $(LOCAL_DIR)/log/condor
LOCK    = $(LOCAL_DIR)/lock/condor
SPOOL   = $(LOCAL_DIR)/lib/condor/spool
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib64/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE   = $(RELEASE_DIR)/share/condor
PROCD_ADDRESS = $(RUN)/procd_pipe
JAVA_CLASSPATH_DEFAULT = $(SHARE) $(SHARE)/scimark2lib.jar .
##  What machine is your central manager?
CONDOR_HOST = $(MY_FULL_HOSTNAME)
##  This macro determines what daemons the condor_master will start and keep its watchful eyes on.
##  The list is a comma or space separated list of subsystem names
NETWORK_INTERFACE = 10.0.x.x (here I put my ip address)
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
My /etc/condor/condor_config.local file:
CONDOR_ADMIN                    = prometheus.abxx.xxx
#FILESYSTEM_DOMAIN              = 10.0.x.x
#CONDOR_ADMIN                   = prometheus@xxxxxxxx
FILESYSTEM_DOMAIN               = abxx.xxx
UID_DOMAIN                      = abxx.xxx
# each slot gets a CPU
NUM_SLOTS                       = 1
NUM_SLOTS_TYPE_1                = 1
SLOT_TYPE_1                     = cpus=100%
SLOT_TYPE_1_PARTITIONABLE       = True
USE_NFS                         = True
DAGMAN_LOG_ON_NFS_IS_ERROR      = FALSE
KEEP_POOL_HISTORY               = True
POOL_HISTORY_DIR                = /var/spool/condor
POOL_HISTORY_MAX_STORAGE        = 100000000
POOL_HISTORY_SAMPLING_INTERVAL  = 60
ALLOW_READ                      = abxx.xxx
ALLOW_WRITE                     = abxx.xxx
ALLOW_ADMINISTRATOR             = $(CONDOR_HOST)
ALLOW_OWNER                     = abxx.xxx, $(ALLOW_ADMINISTRATOR)
HOSTALLOW_ADMINISTRATOR         = abuo.com
DAEMON_LIST                     = $(DAEMON_LIST)
#START                          = ($(START)) && target.AcctGroup =?= "group_pseudo_operational_processing"
NEGOTIATOR_MATCHLIST_CACHING    = FALSE
NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION = TRUE
PRIORITY_HALFLIFE               = 1.79769e+308
Condor MasterLog:
02/03/17 15:56:46 restarting /usr/sbin/condor_collector in 10 seconds
02/03/17 15:56:46 attempt to connect to <10.0.2.15:9618> failed: Connection refused (connect errno = 111).
02/03/17 15:56:46 ERROR: SECMAN:2003:TCP connection to collector abxx.xxx failed.
02/03/17 15:56:46 Failed to start non-blocking update to <10.0.2.15:9618>.
02/03/17 15:56:56 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 65480
02/03/17 15:56:58 SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:56:58 ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:56:58 Failed to start non-blocking update to <10.0.2.15:9618>.
02/03/17 15:57:11 WARNING: forward resolution of abxx.xxx doesn't match 10.0.0.30!
02/03/17 15:57:11 Got SIGTERM. Performing graceful shutdown.
02/03/17 15:57:18 SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:57:18 ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:57:18 Failed to send update to collector abuo.com.
02/03/17 15:57:18 Sent SIGTERM to STARTD (pid 64673)
02/03/17 15:57:18 AllReaper unexpectedly called on pid 64673, status 0.
02/03/17 15:57:18 The STARTD (pid 64673) exited with status 0
02/03/17 15:57:19 All STARTDs are gone. Stopping other daemons Gracefully
02/03/17 15:57:19 Sent SIGTERM to COLLECTOR (pid 65480)
02/03/17 15:57:19 Sent SIGTERM to NEGOTIATOR (pid 64671)
02/03/17 15:57:19 Sent SIGTERM to SCHEDD (pid 64672)
02/03/17 15:57:19 AllReaper unexpectedly called on pid 65480, status 0.
02/03/17 15:57:19 The COLLECTOR (pid 65480) exited with status 0
02/03/17 15:57:19 AllReaper unexpectedly called on pid 64671, status 0.
02/03/17 15:57:19 The NEGOTIATOR (pid 64671) exited with status 0
02/03/17 15:57:19 AllReaper unexpectedly called on pid 64672, status 0.
02/03/17 15:57:19 The SCHEDD (pid 64672) exited with status 0
02/03/17 15:57:19 All daemons are gone. Exiting.
02/03/17 15:57:19 **** condor_master (condor_MASTER) pid 4179 EXITING WITH STATUS 0
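To check the "forward resolution of ... doesn't match" warning outside of Condor, here is a minimal Python sketch. It only approximates the idea behind the warning (does the hostname resolve to the IP the daemon is using?) and is not how Condor itself performs the check. The function name `forward_resolution_matches` and the `resolve` parameter are hypothetical, and the hostname and IP are the placeholders from the log above.

```python
import socket

def forward_resolution_matches(hostname, expected_ip, resolve=socket.gethostbyname_ex):
    """Return True if expected_ip is among the IPv4 addresses hostname resolves to.

    `resolve` defaults to the system resolver; it is a parameter only so the
    check can be exercised without real DNS (an assumption made for this sketch).
    """
    try:
        _name, _aliases, addresses = resolve(hostname)
    except OSError:
        # Covers socket.gaierror / socket.herror: the name did not resolve at all.
        return False
    return expected_ip in addresses

if __name__ == "__main__":
    # Placeholders taken from the log; substitute the real hostname and IP.
    host, ip = "abxx.xxx", "10.0.2.15"
    if forward_resolution_matches(host, ip):
        print(f"{host} resolves to {ip}")
    else:
        print(f"{host} does NOT resolve to {ip} (check /etc/hosts or DNS)")
```

Running this on both nodes should show whether each one resolves the central manager's name to the address set in NETWORK_INTERFACE.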
My CollectorLog:
02/03/17 15:56:58 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: ADVERTISE_MASTER authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15, hostname size = 0, original ip address = 10.0.2.15
02/03/17 15:56:58 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:56:58 CollectorAd  : Inserting ** "< My Pool - abxx.xxx@xxxxxxxx >"
02/03/17 15:56:58 stats: Inserting new hashent for 'Collector':'My Pool - abxx.xxx@xxxxxxxx':'10.0.x.x'
02/03/17 15:57:18 Failed to send update to collector abxx.xxx.
02/03/17 15:57:18 Unable to send UPDATE_COLLECTOR_AD to all configured collectors
02/03/17 15:57:18 WARNING: forward resolution of abxx.xxx doesn't match 10.0.2.15!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 10 (QUERY_STARTD_PVT_ADS), access level NEGOTIATOR: reason: NEGOTIATOR authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15, hostname size = 0, original ip address = 10.0.2.15
02/03/17 15:57:18 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 15 (INVALIDATE_MASTER_ADS), access level ADVERTISE_MASTER: reason: cached result for ADVERTISE_MASTER; see first case for the full reason
02/03/17 15:57:18 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:18 WARNING: forward resolution of abxx.xxx doesn't match 10.0.2.15!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 13 (INVALIDATE_STARTD_ADS), access level ADVERTISE_STARTD: reason: ADVERTISE_STARTD authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15, hostname size = 0, original ip address = 10.0.2.15
02/03/17 15:57:18 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:19 Got SIGTERM. Performing graceful shutdown.
02/03/17 15:57:19 **** condor_collector (condor_COLLECTOR) pid 65480 EXITING WITH STATUS 0
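To make the CollectorLog easier to read, here is a small Python sketch that tallies the "PERMISSION DENIED" lines by host and access level. It is a helper of my own, not part of Condor; the regular expression is based only on the line format shown above, and the names `DENIED_RE` and `summarize_denials` are hypothetical. The sample lines are copied from the log with the "reason:" text shortened.

```python
import re
from collections import Counter

# Matches collector "PERMISSION DENIED" lines in the format seen in the log above.
DENIED_RE = re.compile(
    r"PERMISSION DENIED to (?P<user>\S+) from host (?P<host>\S+) "
    r"for command \d+ \((?P<command>\w+)\), access level (?P<level>\w+)"
)

def summarize_denials(log_text):
    """Count denied requests per (host, access level) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DENIED_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("level"))] += 1
    return counts

# Lines from the CollectorLog above, with the reason text abbreviated.
sample = """\
02/03/17 15:56:58 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: ...
02/03/17 15:56:58 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 10 (QUERY_STARTD_PVT_ADS), access level NEGOTIATOR: reason: ...
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 15 (INVALIDATE_MASTER_ADS), access level ADVERTISE_MASTER: reason: ...
"""

for (host, level), n in sorted(summarize_denials(sample).items()):
    print(f"{host}  {level}  denied {n} time(s)")
```

In my log every denial comes from 10.0.2.15, across the ADVERTISE_MASTER, NEGOTIATOR and ADVERTISE_STARTD access levels, so it looks like the daemons on that address are not matched by any ALLOW entry.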