RE: [Condor-users] Fedora 3 collector problem
- Date: Wed, 1 Jun 2005 11:05:53 +0800
- From: <Greg.Hitchen@xxxxxxxx>
- Subject: RE: [Condor-users] Fedora 3 collector problem
What about the firewall?
FC3 enables iptables by default. Are you allowing TCP and UDP traffic through
in the appropriate port range?
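One way to separate a firewall problem from a configuration problem is to test the collector port from each machine, including the central manager itself. The Python sketch below does roughly what the telnet test in the quoted message does. It assumes the collector listens on 9618, the usual default collector port; if your COLLECTOR_HOST or COLLECTOR_PORT names a different port, pass that instead. It only checks TCP. The master log shows updates going to the collector over UDP, and a successful TCP connect does not prove that UDP datagrams get through.

```python
import socket

# Assumption: 9618 is the usual default condor_collector port. If COLLECTOR_HOST
# or COLLECTOR_PORT in your config names a different port, pass that instead.
DEFAULT_COLLECTOR_PORT = 9618


def tcp_port_open(host: str, port: int = DEFAULT_COLLECTOR_PORT,
                  timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    This is roughly the "can I telnet to it?" test. It says nothing about UDP:
    UDP is connectionless, so a datagram dropped by a firewall produces no
    error on the sending side and cannot be detected by a simple connect.
    """
    try:
        # create_connection resolves the name and tries each address it gets.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Running it on the master against the master's own hostname, and on a client against the same name, shows whether both sides see the TCP port the same way.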
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Joshua Juen
> Sent: Tuesday, 31 May 2005 11:08 PM
> To: Jose D. Zamora
> Cc: Condor-Users Mail List
> Subject: Re: [Condor-users] Fedora 3 collector problem
>
>
> Those files checked out ok. Still not sure what is happening.
>
> Also, if I run condor_status from the Fedora machine (my master),
> it says that it cannot connect to the collector, but
> condor_status does work from my clients.
>
> Thanks
> Josh
>
> On 5/31/05, Jose D. Zamora <jzamora@xxxxxxxxxxxx> wrote:
> > Check the file
> > /etc/condor/condor_config
> > for:
> > COLLECTOR_HOST = $(CONDOR_HOST)
> > DAEMON_LIST = MASTER, STARTD, SCHEDD, COLLECTOR, NEGOTIATOR
> > and
> > check the file /opt/condor-6.6.9/local.phy-condor/condor_config.local
> > for:
> > COLLECTOR_NAME = Collector at <hostname of your master here>
> >
> > Hope this helps
> >
> > On Tue, 31 May 2005 09:21:18 -0500, Joshua Juen <jj9867@xxxxxxxxx> wrote:
> >
> > > I have set up Condor as master on a Fedora 3 system. The
> > > installation seems to be working, except that the master cannot
> > > find the collector.
> > >
> > > condor_status works from the client machines, but none of the
> > > machines can submit jobs: each submitting machine's jobs just
> > > sit in the queue.
> > >
> > > "Error sending update to the collector : Failed to connect to
> > > collector" appears in the master log, the negotiator log and the
> > > StartLog.
> > >
> > > The port that the collector should be on is open, and I can telnet
> > > into it (I am assuming that the clients can too), but the master
> > > can't seem to find it.
> > >
> > > I think the problem is probably a simple configuration error,
> > > but I cannot seem to track it down.
> > >
> > > Any help would be greatly appreciated,
> > > Thanks
> > > Josh
> > >
> > >
> > > MasterLog
> > >
> > > 5/31 08:24:19 ******************************************************
> > > 5/31 08:24:19 ** condor_master (CONDOR_MASTER) STARTING UP
> > > 5/31 08:24:19 ** /opt/condor-6.6.9/sbin/condor_master
> > > 5/31 08:24:19 ** $CondorVersion: 6.6.9 Mar 10 2005 $
> > > 5/31 08:24:19 ** $CondorPlatform: I386-LINUX_RH9 $
> > > 5/31 08:24:19 ** PID = 2354
> > > 5/31 08:24:19 ******************************************************
> > > 5/31 08:24:19 Using config file: /etc/condor/condor_config
> > > 5/31 08:24:19 Using local config files:
> > > /opt/condor-6.6.9/local.phy-condor/condor_config.local
> > > 5/31 08:24:19 Attempting to lock
> > > /tmp/condor-lock.phy-condor0.606384916537539/InstanceLock.
> > > 5/31 08:24:19 Obtained lock on
> > > /tmp/condor-lock.phy-condor0.606384916537539/InstanceLock.
> > > 5/31 08:24:19 DaemonCore: Command Socket at <xxx.xxx.xxx.50:32769>
> > > 5/31 08:24:19 SEC_DEFAULT_SESSION_DURATION is undefined, using
> > > default value of 3600
> > > 5/31 08:24:19 MASTER_TIMEOUT_MULTIPLIER is undefined, using default
> > > value of 0
> > > 5/31 08:24:19 MASTER_TIMEOUT_MULTIPLIER is undefined, using default
> > > value of 0
> > > 5/31 08:24:19 Will use UDP to update collector
> > > 5/31 08:24:19 Started DaemonCore process
> > > "/opt/condor-6.6.9/sbin/condor_collector", pid and pgroup = 2355
> > > 5/31 08:24:19 MASTER_TIMEOUT_MULTIPLIER is undefined, using default
> > > value of 0
> > > 5/31 08:24:19 Started DaemonCore process
> > > "/opt/condor-6.6.9/sbin/condor_negotiator", pid and pgroup = 2356
> > > 5/31 08:24:19 Started DaemonCore process
> > > "/opt/condor-6.6.9/sbin/condor_startd", pid and pgroup = 2357
> > > 5/31 08:24:19 Started DaemonCore process
> > > "/opt/condor-6.6.9/sbin/condor_schedd", pid and pgroup = 2358
> > > 5/31 08:24:21 DaemonCore: Command received via UDP from host
> > > <xxx.xxx.xxx.50:32773>
> > > 5/31 08:24:21 DaemonCore: received command 60008 (DC_CHILDALIVE),
> > > calling handler (HandleChildAliveCommand)
> > > 5/31 08:24:21 DaemonCore: Command received via UDP from host
> > > <xxx.xxx.xxx.50:32773>
> > > 5/31 08:24:21 DaemonCore: received command 60008 (DC_CHILDALIVE),
> > > calling handler (HandleChildAliveCommand)
> > > 5/31 08:24:22 DaemonCore: Command received via UDP from host
> > > <xxx.xxx.xxx.50:32773>
> > > 5/31 08:24:22 DaemonCore: received command 60008 (DC_CHILDALIVE),
> > > calling handler (HandleChildAliveCommand)
> > > 5/31 08:24:24 enter Daemons::CheckForNewExecutable
> > > 5/31 08:24:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_master: 1110456335
> > > 5/31 08:24:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:24:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_collector: 1110456335
> > > 5/31 08:24:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:24:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_negotiator: 1110456334
> > > 5/31 08:24:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:24:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_startd: 1110456334
> > > 5/31 08:24:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:24:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_schedd: 1110456334
> > > 5/31 08:24:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:24:24 exit Daemons::CheckForNewExecutable
> > > 5/31 08:24:24 enter Daemons::UpdateCollector
> > > 5/31 08:24:24 Attempting to send update via UDP to collector
> > > 5/31 08:24:24 Can't send UPDATE_MASTER_AD to collector : Failed to
> > > connect to collector
> > > 5/31 08:24:33 DaemonCore: Command received via UDP from host
> > > <xxx.xxx.xxx.50:32773>
> > > 5/31 08:24:33 DaemonCore: received command 60008 (DC_CHILDALIVE),
> > > calling handler (HandleChildAliveCommand)
> > > 5/31 08:29:24 enter Daemons::UpdateCollector
> > > 5/31 08:29:24 Attempting to send update via UDP to collector
> > > 5/31 08:29:24 Can't send UPDATE_MASTER_AD to collector : Failed to
> > > connect to collector
> > > 5/31 08:29:24 enter Daemons::CheckForNewExecutable
> > > 5/31 08:29:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_master: 1110456335
> > > 5/31 08:29:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:29:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_collector: 1110456335
> > > 5/31 08:29:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:29:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_negotiator: 1110456334
> > > 5/31 08:29:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:29:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_startd: 1110456334
> > > 5/31 08:29:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:29:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_schedd: 1110456334
> > > 5/31 08:29:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:29:24 exit Daemons::CheckForNewExecutable
> > > 5/31 08:34:24 enter Daemons::CheckForNewExecutable
> > > 5/31 08:34:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_master: 1110456335
> > > 5/31 08:34:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:34:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_collector: 1110456335
> > > 5/31 08:34:24 GetTimeStamp returned: 1110456335
> > > 5/31 08:34:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_negotiator: 1110456334
> > > 5/31 08:34:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:34:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_startd: 1110456334
> > > 5/31 08:34:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:34:24 Time stamp of running
> > > /opt/condor-6.6.9/sbin/condor_schedd: 1110456334
> > > 5/31 08:34:24 GetTimeStamp returned: 1110456334
> > > 5/31 08:34:24 exit Daemons::CheckForNewExecutable
> > > 5/31 08:34:24 enter Daemons::UpdateCollector
> > > 5/31 08:34:24 Attempting to send update via UDP to collector
> > > 5/31 08:34:24 Can't send UPDATE_MASTER_AD to collector : Failed to
> > > connect to collector
> > > 5/31 08:35:07 DaemonCore: Command received via TCP from host
> > > <xxx.xxx.xxx.50:32777>
> > > 5/31 08:35:07 DaemonCore: received command 453 (RESTART), calling
> > > handler (admin_command_handler)
> > > 5/31 08:35:07 Got admin command (453) and allowing it.
> > > 5/31 08:35:07 NumberOfChildren() returning 4
> > > 5/31 08:35:07 MASTER_TIMEOUT_MULTIPLIER is undefined, using default
> > > value of 0
> > > 5/31 08:35:07 Sent SIGTERM to COLLECTOR (pid 2355)
> > > 5/31 08:35:07 MASTER_TIMEOUT_MULTIPLIER is undefined, using default
> > > value of 0
> > > 5/31 08:35:07 Sent SIGTERM to NEGOTIATOR (pid 2356)
> > > 5/31 08:35:07 MASTER_TIMEOUT_MULTIPLIER is undefined, using default
> > > value of 0
> > > 5/31 08:35:07 Sent SIGTERM to STARTD (pid 2357)
> > > 5/31 08:35:07 MASTER_TIMEOUT_MULTIPLIER is undefined, using default
> > > value of 0
> > > 5/31 08:35:07 Sent SIGTERM to SCHEDD (pid 2358)
> > > 5/31 08:35:07 DaemonCore: No more children processes to reap.
> > > 5/31 08:35:07 The COLLECTOR (pid 2355) exited with status 0
> > > 5/31 08:35:07 ProcAPI::buildFamily failed: parent 2355 not found on system.
> > > 5/31 08:35:07 ProcAPI: pid 2355 does not exist.
> > > 5/31 08:35:07 NumberOfChildren() returning 3
> > > 5/31 08:35:07 The NEGOTIATOR (pid 2356) exited with status 0
> > > 5/31 08:35:07 ProcAPI::buildFamily failed: parent 2356 not found on system.
> > > 5/31 08:35:07 ProcAPI: pid 2356 does not exist.
> > > 5/31 08:35:07 NumberOfChildren() returning 2
> > > 5/31 08:35:07 DaemonCore: No more children processes to reap.
> > > 5/31 08:35:07 The STARTD (pid 2357) exited with status 0
> > > 5/31 08:35:07 ProcAPI::buildFamily failed: parent 2357 not found on system.
> > > 5/31 08:35:07 ProcAPI: pid 2357 does not exist.
> > > 5/31 08:35:07 NumberOfChildren() returning 1
> > > 5/31 08:35:07 The SCHEDD (pid 2358) exited with status 0
> > > 5/31 08:35:07 ProcAPI: pid 2418 does not exist.
> > > 5/31 08:35:07 ProcAPI::buildFamily failed: parent 2358 not found on system.
> > > 5/31 08:35:07 ProcAPI: pid 2358 does not exist.
> > > 5/31 08:35:07 NumberOfChildren() returning 0
> > > 5/31 08:35:07 All daemons are gone. Restarting.
> > > 5/31 08:35:07 Restarting master right away.
> > > 5/31 08:35:07 Doing exec( "/opt/condor-6.6.9/sbin/condor_master" )
> > > 5/31 08:35:07 getExecPath: readlink("/proc/self/exe") failed: errno 13
> > > (Permission denied)
> > >
> > > 5/31 08:35:07 PASSWD_CACHE_REFRESH is undefined, using default value of 300
> > >
> > > StartLog error:
> > >
> > > 5/31 09:05:37 Attempting to send update via UDP to collector
> > > 5/31 09:05:37 Error sending update to the collector : Failed to connect to collector
> > > 5/31 09:05:37 Error sending update to collector(s)
> > >
> > > Negotiator Sample:
> > >
> > > 5/31 09:05:07 ---------- Started Negotiation Cycle ----------
> > > 5/31 09:05:07 Phase 1: Obtaining ads from collector ...
> > > 5/31 09:05:07 Getting all public ads ...
> > > 5/31 09:05:07 NEGOTIATOR_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> > > 5/31 09:05:07 Couldn't fetch ads: can't find collector
> > > 5/31 09:05:07 Aborting negotiation cycle
> > >
> > > _______________________________________________
> > > Condor-users mailing list
> > > Condor-users@xxxxxxxxxxx
> > > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
>
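The configuration checks suggested in the quoted reply can also be scripted. The sketch below is a hypothetical helper, not part of Condor: it does a minimal parse of `NAME = value` lines (skipping `#` comments, joining trailing-backslash continuations, leaving `$(...)` macros unexpanded) and reports whether COLLECTOR_HOST is set and whether DAEMON_LIST includes MASTER, COLLECTOR and NEGOTIATOR. It assumes later definitions override earlier ones, so the local config file should be fed in after the global one. On a real installation, `condor_config_val COLLECTOR_HOST` asks Condor itself and is the more reliable check.

```python
import re
from typing import Dict, Iterable, List


def parse_condor_config(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal 'NAME = value' parser; a sketch, not Condor's own parser.

    Skips blank and '#' comment lines, joins lines ending in a backslash,
    upper-cases names, and lets later definitions override earlier ones.
    Macros such as $(CONDOR_HOST) are returned unexpanded.
    """
    config: Dict[str, str] = {}
    pending = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.rstrip().endswith("\\"):
            # Continuation: drop the backslash and keep accumulating.
            pending += line.rstrip()[:-1] + " "
            continue
        line, pending = pending + line, ""
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        name, value = stripped.split("=", 1)
        config[name.strip().upper()] = value.strip()
    return config


def check_central_manager(config: Dict[str, str]) -> List[str]:
    """Return the problems found in a central manager's parsed config."""
    problems: List[str] = []
    if not config.get("COLLECTOR_HOST"):
        problems.append("COLLECTOR_HOST is not set")
    # Split DAEMON_LIST on commas and/or whitespace, ignoring empty pieces.
    daemons = {d.upper() for d in re.split(r"[,\s]+", config.get("DAEMON_LIST", "")) if d}
    for needed in ("MASTER", "COLLECTOR", "NEGOTIATOR"):
        if needed not in daemons:
            problems.append(f"DAEMON_LIST is missing {needed}")
    return problems
```

For example, feeding it the lines of /etc/condor/condor_config followed by those of /opt/condor-6.6.9/local.phy-condor/condor_config.local and printing the result of `check_central_manager` lists anything missing; an empty list means those two settings at least look right.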