[HTCondor-users] condor_master does not start
- Date: Thu, 12 Jun 2014 13:17:22 -0500
- From: Cody Belcher <codytrey@xxxxxxxxxxxxxxxx>
- Subject: [HTCondor-users] condor_master does not start
I have a Condor cluster running on a group of iMacs, with a Mac Pro
server acting as the central manager. A few weeks ago, my central manager
automatically updated to OS X 10.9 (someone before me must have
misconfigured it; I wouldn't intentionally have it auto-update).
The cluster seemed to keep working after I updated NFS Manager to a
version compatible with the new OS. Now, with no other changes, a user
reports that the cluster isn't working and that he gets this error:
codytrey@metis:~$ condor_status
Error: communication error
CEDAR:6001:Failed to connect to <128.194.151.191:9618>
Okay, easy enough: the master isn't running on the central manager. So I
run condor_master, and it exits with no output, as if it worked fine.
But ps aux | grep condor shows that no Condor daemons are running.
Checking the master log, I find this:
06/12/14 13:06:29 ******************************************************
06/12/14 13:06:29 ** condor_master (CONDOR_MASTER) STARTING UP
06/12/14 13:06:29 ** /condor/sbin/condor_master
06/12/14 13:06:29 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
06/12/14 13:06:29 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
06/12/14 13:06:29 ** $CondorVersion: 7.8.6 Oct 24 2012 BuildID: 73238 $
06/12/14 13:06:29 ** $CondorPlatform: x86_64_macos_10.7 $
06/12/14 13:06:29 ** PID = 34380
06/12/14 13:06:29 ** Log last touched 6/12 13:02:23
06/12/14 13:06:29 ******************************************************
06/12/14 13:06:29 Using config source: /etc/condor/condor_config
06/12/14 13:06:29 Using local config sources:
06/12/14 13:06:29 /condor/var/condor_config.local
06/12/14 13:06:29 Sock::bind failed: errno = 49 Can't assign requested address
06/12/14 13:06:29 Failed to bind to command ReliSock
06/12/14 13:06:29 (Make sure your IP address is correct in /etc/hosts.)
06/12/14 13:06:29 ERROR "BindAnyCommandPort() failed" at line 9247 in file /usr/local/condor/local/execute/slot2/dir_26216/userdir/src/condor_daemon_core.V6/daemon_core.cpp
I checked /etc/hosts, and it is correct. Am I missing something, or is
it possible that my Condor version (7.8.6, built for x86_64_macos_10.7)
is incompatible with the newer version of OS X?
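On OS X, errno 49 is EADDRNOTAVAIL, which normally means the address
being bound is not assigned to any local interface. A minimal Python
sketch for checking that outside of Condor follows. The helper name
can_bind is hypothetical, and resolving the machine's own hostname is
only an approximation of how condor_master picks its bind address, which
may also depend on its configuration.

```python
import errno
import socket


def can_bind(address, port=0):
    """Try to bind a TCP socket to (address, port).

    Returns (True, None) on success, or (False, errno_value) on failure.
    Port 0 asks the OS for any free port, so the test does not collide
    with a daemon that is already listening.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
        return True, None
    except OSError as exc:
        return False, exc.errno
    finally:
        sock.close()


if __name__ == "__main__":
    # Assumption: the daemon binds to whatever the local hostname resolves to.
    hostname = socket.gethostname()
    try:
        address = socket.gethostbyname(hostname)
    except socket.gaierror as exc:
        raise SystemExit(f"cannot resolve {hostname}: {exc}")

    ok, err = can_bind(address)
    if ok:
        print(f"{hostname} -> {address}: bind succeeded")
    elif err == errno.EADDRNOTAVAIL:
        print(f"{hostname} -> {address}: address is not assigned to a local interface")
    else:
        print(f"{hostname} -> {address}: bind failed: {errno.errorcode.get(err, err)}")
```

Run on the central manager, calling can_bind("128.194.151.191") with the
address from the condor_status error would show whether that address is
still usable on the machine after the OS update.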
Thanks,
Cody
--
---------------------------------------------------------------------------
Cody Belcher email: codytrey@xxxxxxxx
Computer Support Group phone: (979) 845-1379
Department of Physics & Astronomy office: MPHY 155
---------------------------------------------------------------------------