Re: [Condor-users] Daemon problems
- Date: Thu, 16 Jun 2005 13:32:47 -0500
- From: Nick LeRoy <nleroy@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Daemon problems
On Thu June 16 2005 5:13 am, Alexandre Badez wrote:
> Good morning!
Hello,
> I'm running a small test cluster of 6 machines with Red Hat 3. They are
> named node1 to node6 (IP addresses 10.2.4.11 to 10.2.4.16), and my domain
> name is *.mop.ibm.com
> I've set up the 6 machines with the RPM available on the download page
> (Condor 6.6.9).
> My central manager is node1, all others are execution hosts.
>
> My problem seems to be node1, where there is no negotiator:
>
> [root@node1 root]# condor_master
> [root@node1 root]# ps ax | grep condor
> 5137 ? S 0:00 condor_master
> 5138 ? S 0:00 condor_collector -f
> 5139 ? R 0:03 condor_startd -f
> 5142 ? S 0:00 condor_schedd -f
> 5149 pts/0 S 0:00 grep condor
> [root@node1 root]#
I don't know much about how our RPMs configure Condor, but I can see that
something is wrong here. Your central manager (node1) should be running
both the collector and the negotiator. Look at the DAEMON_LIST setting in
the condor_config (or condor_config.local), and make sure that both COLLECTOR
and NEGOTIATOR are in the list.
Also, if you don't want to be running jobs on this machine, remove STARTD from
the list. Similarly, if you aren't going to be submitting jobs from this
host, remove SCHEDD from the list.
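For reference, here is a minimal sketch of what node1's entry might look like, assuming node1 should act only as the central manager and neither run nor submit jobs (that assumption is mine; keep STARTD and/or SCHEDD if you want those roles there). MASTER stays in the list because it is the daemon that starts and watches all the others.

```
# node1: central manager only
# (assumption: no jobs run on or are submitted from this host)
# MASTER must remain; it starts and monitors the other daemons.
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
```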
> Moreover, there is a negotiator on each execution node:
>
> [root@node2 root]# condor_master
> [root@node2 root]# ps ax | grep condor
> 29704 ? S 0:00 condor_master
> 29705 ? S 0:00 condor_collector -f
> 29706 ? S 0:00 condor_negotiator -f
> 29707 ? S 0:06 condor_startd -f
> 29708 ? S 0:00 condor_schedd -f
> 29717 pts/0 R 0:00 grep condor
> [root@node2 root]#
Again, edit your condor_config on the execution node(s), and remove COLLECTOR
and NEGOTIATOR from the DAEMON_LIST.
As above, I'll note that you're running the schedd here, which allows you to
submit jobs from this host. If this is not what you intended, then remove
SCHEDD from the list.
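A matching sketch for node2 through node6, assuming you do want to keep submitting jobs from those hosts (drop SCHEDD otherwise). If you put this in condor_config.local, it overrides whatever the global condor_config sets, because the local file is read after the global one.

```
# node2..node6: execution hosts
# (assumption: SCHEDD kept so jobs can also be submitted from here)
DAEMON_LIST = MASTER, STARTD, SCHEDD
```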
You'll need to restart Condor on the affected nodes for these changes to take
effect. "condor_restart -master node1", or "/etc/init.d/condor restart" (or
similar).
> Is this normal? After re-reading the installation manual, it doesn't seem
> so...
Nope. See above. I don't know _why_ they're set as they are, but it's
obviously wrong.
> I can also send the config and config local files if you need them.
Try the above first -- it'll probably solve the problems that you're seeing.
If not, we can pursue it further.
> Thanks for your help.
Glad to help!
-Nick
--
<<< Welcome to the real world. >>>
Nicholas R. LeRoy, The Condor Project
Department of Computer Sciences, The University of Wisconsin
http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
nleroy@xxxxxxxxxxx  608-265-5761