Hi,

I have been using condor for half a year, deployed on virtual machines. I ran ‘yum update’ a few times in recent weeks. For the last few days I have been trying to redeploy the condor master, and it does not work. The error I see is:

---
[root@oswrk117 ~]# condor_q

-- Failed to fetch ads from: <10.60.0.12:10574> : oswrk117.lns.mit.edu
CEDAR:6001:Failed to connect to <10.60.0.12:10574>
---

Perhaps you could advise me how I can fix my condor? Below are the gory details of my current system.

Thanks,
Jan


THE DETAILS:

I’m now using this version of condor:

[root@oswrk117 ~]# rpm -qa | grep -i condor
condor-8.3.2-288596.x86_64

These are the condor processes that are running:

[root@oswrk117 ~]# ps -ef |grep condor
condor 3047 1 0 13:23 ? 00:00:00 /usr/sbin/condor_master -pidfile /var/run/condor/condor.pid
root 3048 3047 0 13:23 ? 00:00:01 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 496
root 3431 2733 0 13:29 pts/0 00:00:00 grep condor

These are the daemons I wanted to run on this VM:

[root@oswrk117 ~]# condor_config_val -v DAEMON_LIST
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
 # at: /etc/condor/condor_config.local, line 51

The OS on the VM is:

[root@oswrk117 ~]# uname -a
Linux oswrk117.lns.mit.edu 2.6.32-504.3.3.el6.x86_64 #1 SMP Tue Dec 16 14:29:22 CST 2014 x86_64 x86_64 x86_64 GNU/Linux

This VM has 2 network interfaces:

a) local

[root@oswrk117 ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr FA:16:3E:D6:F7:45
          inet addr:10.60.0.12  Bcast:10.60.0.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fed6:f745/64 Scope:Link

b) public 198.125.163.117

The VM reports:

[root@oswrk117 ~]# hostname -f

I have set up the condor master to use the local IP for worker communication by setting these 2 variables:

# below use IP of the this node
TCP_FORWARDING_HOST = 198.125.163.117
PRIVATE_NETWORK_INTERFACE = 198.125.163.117

The firewall on the VM is deactivated:

[root@oswrk117 ~]# service iptables status
iptables: Firewall is not running.

Also, there is no port blocking on the OpenStack controller owning this VM:

Below is a full dump of my condor config file.

---
[root@oswrk117 ~]# cat /etc/condor/condor_config.local
# modified by Jan Balewski, MIT

CONDOR_HOST = $(FULL_HOSTNAME)
COLLECTOR_NAME = "VM condor master on $(FULL_HOSTNAME)"

###############################################################################
# Pool settings
###############################################################################

# EC2 workers don't have shared filesystems or authentication
UID_DOMAIN = lns.mit.edu
TRUST_UID_DOMAIN = $(UID_DOMAIN)
FILESYSTEM_DOMAIN = $(UID_DOMAIN)
USE_NFS = False
USE_AFS = False
USE_CKPT_SERVER = False

# The same for all machines with the same condor user
CONDOR_IDS = 496.492

###############################################################################
# trick to force condor to use public IP
###############################################################################
# to check what IP condor uses execute:
# condor_status -format "%s, " Name -format "%s\n" MyAddress
# to check what public IP VM uses execute:
# see more details in this post:

# below use IP of the this node
TCP_FORWARDING_HOST = 198.125.163.117
PRIVATE_NETWORK_INTERFACE = 198.125.163.117

###############################################################################
# Security settings
###############################################################################

# Allow local host and the central manager to manage the node
ALLOW_ADMINISTRATOR = $(FULL_HOSTNAME), $(CONDOR_HOST)

# master needs this two particular versions
ALLOW_READ = *.lns.mit.edu,10.60.0.*
ALLOW_WRITE = *.lns.mit.edu,10.60.0.*

###############################################################################
# CPU usage settings
###############################################################################

# Don't count a hyperthreaded CPU as multiple CPUs
COUNT_HYPERTHREAD_CPUS = False

# Leave this commented out. If your instance has more than one CPU (i.e. if
# you use a large instance or something) then condor will advertise one
# slot for each CPU.
# for master reduce # of jobs to N-1
NUM_CPUS = 4

###############################################################################
# Daemon settings
###############################################################################

# Full list on the host node
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD

# Don't run java
JAVA =

###############################################################################
# Classads
###############################################################################

# Run everything, all the time
START = True
SUSPEND = False
CONTINUE = True
PREEMPT = False
WANT_VACATE = False
WANT_SUSPEND = True
SUSPEND_VANILLA = False
WANT_SUSPEND_VANILLA = True
KILL = False
STARTD_EXPRS = START

###############################################################################
# Network settings
###############################################################################

# Use random numbers here so the workers don't all hit the collector at
# the same time. If there are many workers the collector can get overwhelmed.
UPDATE_INTERVAL = $RANDOM_INTEGER(230, 370)
MASTER_UPDATE_INTERVAL = $RANDOM_INTEGER(230, 370)

# Port range for Jan's VM-condor cluster at LNS
LOWPORT=9600
HIGHPORT=10600
[root@oswrk117 ~]#
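
In case it helps with the diagnosis, here is a minimal Python sketch of a check that could be run on the VM. It takes the <10.60.0.12:10574> address from the condor_q error, tests whether its port lies inside the LOWPORT/HIGHPORT range from the config, and tests whether anything accepts a TCP connection there. The helper names (parse_sinful, port_in_range, can_connect) are made up for illustration and are not part of condor; the sketch only probes the socket and says nothing about which daemon owns it.

```python
import socket
from typing import Tuple


def parse_sinful(addr: str) -> Tuple[str, int]:
    """Split a '<host:port>' address, as printed by condor_q, into (host, port)."""
    # Drop the angle brackets and any '?key=value' suffix (assumed optional).
    core = addr.strip().strip("<>").split("?", 1)[0]
    host, _, port = core.rpartition(":")
    return host, int(port)


def port_in_range(port: int, low: int, high: int) -> bool:
    """True if port lies inside the inclusive range low..high."""
    return low <= port <= high


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a plain TCP connection to host:port is accepted within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Address copied from the error message; port range copied from the config.
    host, port = parse_sinful("<10.60.0.12:10574>")
    print("port inside LOWPORT..HIGHPORT:", port_in_range(port, 9600, 10600))
    print("TCP connect accepted:", can_connect(host, port))
```

If the connect test fails while the port is inside the configured range, that would suggest nothing is listening at that address, rather than a port-range or firewall mismatch.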