RE: [Condor-users] Trouble on WinXP with master node


Date: Thu, 3 Feb 2005 08:37:56 -0500
From: Campbell Bradley L CRBE <CampbellBL@xxxxxxxxxxxxxxx>
Subject: RE: [Condor-users] Trouble on WinXP with master node
Hi.  

Does anyone know what would cause the following errors in Condor on Windows?
	2/1 13:33:05 condor_read(): timeout reading buffer.	(masterlog)
	2/1 13:33:05 condor_write(): Socket closed when trying to write buffer (collectorlog)
This happened with both 6.6.5 and 6.6.8 on Windows XP.  Is there a list of Windows services that Condor depends on?

Thanks,
Brad



-----Original Message-----
From: Campbell Bradley L CRBE [mailto:CampbellBL@xxxxxxxxxxxxxxx]
Sent: Wednesday, February 02, 2005 8:19
To: 'Condor-Users Mail List'
Subject: [Condor-users] Trouble on WinXP with master node



I have successfully set up Condor on 2 WinXP clusters.  After adding a router to each cluster, one of them stopped working properly.  The fellow who initially set up the computers had disabled several services, and after spending a lot of time on it I asked him to do a fresh reinstall, which he did (including SP2).  I have disabled the firewall to avoid those issues for now (it's an isolated LAN).  But I am getting errors I have not encountered before and nothing is working.  In particular I am getting errors on condor_read() in the masterlog and on condor_write() in the collectorlog.  I have attached excerpts from the masterlog and collectorlog and the condor_config file.  Any help will be appreciated, thanks.
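
One way to narrow this down is to check from another node whether a plain TCP connection to the collector's command socket can be opened at all. The sketch below is only a rough check under assumptions: the address 192.168.1.50 and port 9618 are taken from the collectorlog excerpt further down, and the helper name can_connect is made up for illustration. A successful connect does not prove Condor is healthy; a failed one only suggests the problem sits at the network level (router, firewall, disabled service) rather than in Condor itself.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


# Example call (address and port as seen in the collectorlog excerpt):
#   can_connect("192.168.1.50", 9618)
```

Running the same check from node15 itself and from a node on the other side of the router may help show whether the router is involved.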

In the config file, note that "node15" is the master node and the computer these files are from.

This happened with version 6.6.5 and 6.6.8.

Thanks
Brad

From the masterlog:

2/1 13:32:40 ******************************************************
2/1 13:32:40 ** Condor (CONDOR_MASTER) STARTING UP
2/1 13:32:40 ** C:\Condor\bin\condor_master.exe
2/1 13:32:40 ** $CondorVersion: 6.6.8 Jan 31 2005 $
2/1 13:32:40 ** $CondorPlatform: INTEL-WINNT40 $
2/1 13:32:40 ** PID = 2096
2/1 13:32:40 ******************************************************
2/1 13:32:40 Using config file: C:\Condor\condor_config
2/1 13:32:40 Using local config files: C:\Condor/condor_config.local
2/1 13:32:40 DaemonCore: Command Socket at <192.168.1.50:1125>
2/1 13:32:40 Started DaemonCore process "C:\Condor/bin/condor_collector.exe", pid and pgroup = 2108
2/1 13:32:40 Started DaemonCore process "C:\Condor/bin/condor_negotiator.exe", pid and pgroup = 2120
2/1 13:32:40 Started DaemonCore process "C:\Condor/bin/condor_startd.exe", pid and pgroup = 2124
2/1 13:32:40 Started DaemonCore process "C:\Condor/bin/condor_schedd.exe", pid and pgroup = 2144
2/1 13:33:05 condor_read(): timeout reading buffer.

From the collectorlog:

2/1 13:32:40 ******************************************************
2/1 13:32:40 ** condor_collector.exe (CONDOR_COLLECTOR) STARTING UP
2/1 13:32:40 ** C:\Condor\bin\condor_collector.exe
2/1 13:32:40 ** $CondorVersion: 6.6.8 Jan 31 2005 $
2/1 13:32:40 ** $CondorPlatform: INTEL-WINNT40 $
2/1 13:32:40 ** PID = 2108
2/1 13:32:40 ******************************************************
2/1 13:32:40 Using config file: C:\Condor\condor_config
2/1 13:32:40 Using local config files: C:\Condor/condor_config.local
2/1 13:32:40 DaemonCore: Command Socket at <192.168.1.50:9618>
2/1 13:32:40 In ViewServer::Init()
2/1 13:32:40 In CollectorDaemon::Init()
2/1 13:32:40 In ViewServer::Config()
2/1 13:32:40 In CollectorDaemon::Config()
2/1 13:32:55 enable: Creating stats hash table
2/1 13:33:05 (Sent 0 ads in response to query)
2/1 13:33:05 WARNING:  No master ad for < node15 >
2/1 13:33:05 ScheddAd     : Inserting ** "< node15 , 192.168.1.50 >"
2/1 13:33:05 stats: Inserting new hashent for 'Schedd':'node15':'192.168.1.50'
2/1 13:33:05 condor_write(): Socket closed when trying to write buffer
2/1 13:33:05 Buf::write(): condor_write() failed
2/1 13:33:05 SECMAN: Error sending response classad!
2/1 13:33:20 WARNING:  No master ad for < node07 >
2/1 13:33:20 ScheddAd     : Inserting ** "< node07 , 192.168.1.57 >"
2/1 13:33:20 stats: Inserting new hashent for 'Schedd':'node07':'192.168.1.57'
2/1 13:33:20 condor_write(): Socket closed when trying to write buffer
2/1 13:33:20 Buf::write(): condor_write() failed
2/1 13:33:20 SECMAN: Error sending response classad!
2/1 13:33:20 ** Master < node15 > rejuvenated from recently down
2/1 13:33:20 stats: Inserting new hashent for 'Master':'node15':'192.168.1.50'
2/1 13:33:20 condor_write(): Socket closed when trying to write buffer
2/1 13:33:20 Buf::write(): condor_write() failed
2/1 13:33:20 SECMAN: Error sending response classad!
2/1 13:33:20 ERROR: DC_AUTHENTICATE unable to receive auth_info!
2/1 13:33:20 ERROR: DC_AUTHENTICATE unable to receive auth_info!
2/1 13:33:20 ERROR: DC_AUTHENTICATE unable to receive auth_info!
2/1 13:33:20 Got QUERY_STARTD_PVT_ADS
2/1 13:33:20 (Sent 0 ads in response to query)
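
For longer logs it can help to count how often each problem message appears. The following is a minimal sketch, assuming every log line starts with a "month/day hh:mm:ss" timestamp as in the excerpts above; the function name summarize and the list of marker words are illustrative choices, not anything provided by Condor.

```python
import re
from collections import Counter

# Timestamp prefix as it appears in the excerpts, e.g. "2/1 13:33:05 "
LINE_RE = re.compile(r"^(\d+/\d+ \d+:\d+:\d+) (.*)$")

# Substrings treated as signs of trouble (an assumption, matched case-insensitively)
PROBLEM_MARKERS = ("error", "warning", "failed", "timeout", "socket closed")


def summarize(log_text):
    """Count occurrences of each log message that contains a problem marker."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything without a timestamp
        message = match.group(2)
        lowered = message.lower()
        if any(marker in lowered for marker in PROBLEM_MARKERS):
            counts[message] += 1
    return counts
```

Calling summarize() on the text of the collectorlog and printing counts.most_common() lists the repeated messages first, which makes it easier to compare a working cluster against the broken one.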
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users
