Hi,
Unfortunately, I've been having problems getting HTCondor to work on Linux. I have a two-node cluster using the shared file system configuration. My configuration is below.
condor_config.headnode
condor_config.submit
condor_config.cluster
## Condor configuration for OSG Clusters
## For more detail please see
## http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html
LOCAL_CONFIG_FILE = /mnt/share/condor-etc/condor_config.$(HOSTNAME)
# The following should be your cluster domain. This is an arbitrary string used by Condor, not necessarily matching your IP domain
UID_DOMAIN = fortran.iconplc.com
# Human readable name for your Condor pool
COLLECTOR_NAME = "OSG Cluster Condor at $(UID_DOMAIN)"
# A shared file system (NFS), e.g. job dir, is assumed if the name is the same
FILESYSTEM_DOMAIN = $(UID_DOMAIN)
# Here you have to use your network domain, or any comma separated list of hostnames and IP addresses including all your
# condor hosts. * can be used as wildcard
ALLOW_WRITE = central-manager.iconcr.com, fortran-01.iconcr.com
<---- This doesn't work when trying to set a password for both nodes.
CONDOR_ADMIN = root@$(FULL_HOSTNAME)
# The following should be the full name of the head node (Condor central manager)
CONDOR_HOST = central-manager.iconcr.com
# Port range should be opened in the firewall (can be different on different machines)
# This 9000-9999 is coherent with the iptables configuration in the Firewall documentation
IN_HIGHPORT = 9999
IN_LOWPORT = 9000
# This is to enforce password authentication
SEC_DAEMON_AUTHENTICATION = required
SEC_DAEMON_AUTHENTICATION_METHODS = password
SEC_CLIENT_AUTHENTICATION_METHODS = password,fs,gsi
SEC_PASSWORD_FILE = /var/lib/condor/condor_credential
ALLOW_DAEMON = condor_pool@*
## Sets how often the condor_negotiator starts a negotiation cycle
## (for negotiator and schedd).
# It is defined in seconds and defaults to 60 (1 minute), default is 300.
NEGOTIATOR_INTERVAL = 20
## Scheduling parameters for the startd
TRUST_UID_DOMAIN = TRUE
# start as available and do not suspend, preempt or kill
START = TRUE
SUSPEND = FALSE
PREEMPT = FALSE
KILL = FALSE
I've been getting the error below when submitting a job. I take it my config is wrong somewhere?
$ condor_submit test.con
Submitting job(s)
ERROR: Failed to connect to local queue manager
CEDAR:6001:Failed to connect to <10.11.1:1:9201>
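One way to narrow this down is to test the failing connection outside HTCondor, to see whether anything is listening on the schedd's address at all. The following is only a minimal Python sketch under assumptions: `port_reachable` and `in_port_range` are hypothetical helper names, not part of HTCondor, and the host and port have to be taken from the address in the error message. The default range mirrors IN_LOWPORT and IN_HIGHPORT from the configuration above.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def in_port_range(port: int, low: int = 9000, high: int = 9999) -> bool:
    """Return True if port lies inside the configured incoming port range."""
    return low <= port <= high
```

For example, `in_port_range(9201)` confirms that the port from the error falls inside the 9000-9999 range, and `port_reachable(host, 9201)` with the schedd's host shows whether that port accepts connections. A refused connection would suggest that the schedd is not running or that a firewall is blocking the port, rather than an authentication problem.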
Thank you,
Kind regards,
Gerard Whelan