
Re: [Condor-users] Multiple Network Interface cards and central manager not communicating with execute machine.



I already have that set in the condor_config files of the machines.
144.167.99.210 is the IP of the central manager's network interface card that's connected. It's the only NIC connected on the machine, and it can open a web browser to the internet, and ssh and ping other machines on the same router. But in Condor the machines will not connect to each other. I run condor_master on both machines and they can never connect. :(
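
Ping and ssh only show that ICMP and port 22 get through, so it may be worth checking whether a plain TCP connection to the collector port (9618, per the logs below) succeeds outside of Condor. A minimal sketch, assuming Python 3 is available on the execute machine; check_tcp is a hypothetical helper name for illustration, not part of Condor:

```python
import errno
import socket


def check_tcp(host, port, timeout=5.0):
    """Attempt one TCP connection and return (ok, detail).

    Hypothetical helper for illustration; not part of Condor.
    """
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        return False, "timed out"
    except OSError as exc:
        # Map the numeric errno to its symbolic name where one is known.
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name}: {exc.strerror}"
```

Calling check_tcp("144.167.99.210", 9618) from the execute machine would attempt the same connection the startd is making. If it also fails with a host-unreachable error, the cause is probably outside Condor (a firewall rule rejecting that port is one possibility, though that is only a guess here).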

----- Original Message -----
From: hailong.yang1115 <hailong.yang1115@xxxxxxxxx>
Date: Thursday, November 19, 2009 9:09 pm
Subject: Re: [Condor-users] Multiple Network Interface cards and central manager not communicating with execute machine.
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>

> 

> Hi Charles,
>
> You can try to add the following:
> NETWORK_INTERFACE=your specific network interface
> into the configuration file to see if it works.
>
> Good luck!
>
> -Hailong
>
> 2009-11-20

>
>
> ***********************************************
> * Hailong Yang, PhD. Candidate
> * Sino-German Joint Software Institute,
> * School of Computer Science&Engineering, Beihang University
> * Phone: (86-010)82315908
> * Email: hailong.yang1115@xxxxxxxxx
> * Address: G413, New Main Building in Beihang University,
> *              No.37 XueYuan Road,HaiDian District,
> *              Beijing,P.R.China,100191
> ***********************************************

> From: Charles Embry
> Sent: 2009-11-20  05:29:53
> To: condor-users
> Cc:
> Subject: [Condor-users] Multiple Network Interface cards and central manager not communicating with execute machine.
>
>
> The Condor pool that I am trying to set up is on the same server rack/router, and the machines can ping and ssh each other. But in Condor they don't seem to be communicating: condor_status never shows the execute machine that I am trying to add to the central manager (which is also a submit and execute machine). The machines are all Sun Microsystems Sun Fire servers. They each have 4 NICs (network interface cards). We are only using one (we have no need at this time to use all of them), and the other three on each machine are not hooked up to anything.
>
> On the execute machine I get these errors in the log files:

> Master log__________
>
> 11/16 17:07:18 DaemonCore: Command Socket at <144.167.99.201:49652>
> 11/16 17:07:18 Started DaemonCore process "/root/Desktop/condor-7.2.4/sbin/condor_startd", pid and pgroup = 27436
> 11/16 17:07:23 attempt to connect to <144.167.99.210:9618> failed: No route to host (connect errno = 113).  Will keep trying for 20 total seconds (20 to go).
>
> 11/16 17:07:44 attempt to connect to <144.167.99.210:9618> failed: No route to host (connect errno = 113).
>
> StartLog__________
> 11/19 15:48:58 slot1: State change: IS_OWNER is false
> 11/19 15:48:58 slot1: Changing state: Owner -> Unclaimed
> 11/19 15:49:23 attempt to connect to <144.167.99.210:9618> failed: No route to host (connect errno = 113).
> 11/19 15:49:23 ERROR: SECMAN:2004:Was waiting for TCP auth session to <144.167.99.210:9618>, but it failed.
> 11/19 15:49:23 Failed to start non-blocking update to <144.167.99.210:9618>.
> 11/19 15:49:23 ERROR: SECMAN:2004:Was waiting for TCP auth session to <144.167.99.210:9618>, but it failed.
> 11/19 15:49:23 Failed to start non-blocking update to <144.167.99.210:9618>.
> 11/19 15:49:23 ERROR: SECMAN:2004:Was waiting for TCP auth session to <144.167.99.210:9618>, but it failed.
> 11/19 15:49:23 Failed to start non-blocking update to <144.167.99.210:9618>.
> 11/19 15:49:23 ERROR: SECMAN:2004:Failed to create security session to <144.167.99.210:9618> with TCP.|SECMAN:2003:TCP connection to <144.167.99.210:9618> failed.

> The condor_collector daemon is using port 9618 on the central manager, and that's the port on the central manager that the execute machine is trying to connect to. Why do the machines not connect in Condor (No route to host??) when they can ping and ssh each other? Do I need to set something to make Condor use the only network interface that is connected? Or is it the port that is being used by the collector on the central manager?

>
> Thanks for the much needed help.
>