Re: [Condor-users] Condor nodes vanish temporarily
- Date: Mon, 07 Mar 2011 21:19:51 +0100
- From: Felix Wolfheimer <f.wolfheimer@xxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Condor nodes vanish temporarily
Hi,
I tried your suggestion: after setting the keys in the condor_config
file on the pool machines, they no longer vanish from the list.
Thank you very much for your help!
On Saturday, 05.03.2011 at 09:50 -0600, Daniel Forrest wrote:
> Felix Wolfheimer wrote:
>
> > I'm using Condor 7.4.4 on a pool of four machines running Windows Server
> > 2003 R2. When I look at the machine status using condor_status I can see
> > that the machines vanish temporarily from the list and come back some
> > minutes later (takes up to about 30 min.). The machines are up and
> > running 24/7 and they are always connected to an internal LAN and can see
> > (ping, nslookup etc.) each other all the time.
> >
> > I've looked at the collector logfile and found the following statements
> > which seem to be related to the issue:
> >
> > *** Removing stale ad <my_computer_name>
> >
> > where my_computer_name is the name of the machine which vanishes from
> > the list.
> >
> > As the machines have two network interfaces I tried to explicitly bind
> > Condor to one of them using NETWORK_INTERFACE = ... but that did not
> > change anything. The firewalls of the machines are also switched off.
> >
> > Does anyone have an idea what the issue could be?
>
> This sounds familiar. On the pool machines, try setting this:
>
> STARTD_DEBUG = D_COMMAND D_NETWORK
> MASTER_DEBUG = D_COMMAND D_NETWORK
>
> in "condor_config". If this clears up the problem, I can go into more
> detail as to what the problem might be and why this "fixes" it.
>
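
To check whether the stale-ad removals really stop after the change, one could count them per machine in the collector log. The following is only a minimal sketch: it assumes the log lines contain the text quoted above ("Removing stale ad" followed by the machine name), and the function name count_stale_ads is made up for illustration. The actual log format may differ in detail.

```python
import re
from collections import Counter

# Assumption: a stale-ad line contains the text seen in the thread,
# "Removing stale ad", followed by the machine name as the next token.
STALE_AD = re.compile(r"Removing stale ad\s+(\S+)")


def count_stale_ads(lines):
    """Return a Counter mapping machine name -> number of stale-ad removals."""
    counts = Counter()
    for line in lines:
        match = STALE_AD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open file handle of the collector log to count_stale_ads (a file object iterates over its lines) before and after editing condor_config should show whether the count for each machine stops growing.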