Hello,

I have a Condor pool consisting of a Windows XP machine
(master) and a Windows 2003 machine (slave), and am unable to get jobs to run on
the Windows 2003 machine (they sit in the queue with an ‘Idle’
status until ‘master’ slots become available). Does anyone know how to fix this problem?

Running condor_status from the master indicates all machines
are in the pool.

In the condor_config files, I have:

On the Slave:
DAEMON_LIST = MASTER START

On the Master:
DAEMON_LIST = MASTER COLLECTOR NEGOTIATOR SCHEDD START

On the Slave:
COLLECTOR_NAME = My Pool

On the Master:
COLLECTOR_NAME = fbp-test-pool

On the Slave:
CONDOR_HOST = <ip address of master>

On the Master:
CONDOR_HOST = $(FULL_HOSTNAME)

Also on the Slave:
ADD_WINDOWS_FIREWALL_EXCEPTION = FALSE
WINDOWS_FIREWALL_FAILURE_RETRY = 10

When I run condor_q on the slave machine, I get this error:

Error: Can't find address for schedd <my windows 2003
machine>

Extra Info: You probably saw this error because the
condor_schedd is not running on the machine you are trying to query. If the
condor_schedd is not running, the Condor system will not be able to find an
address and port to connect to and satisfy this request. Please make sure the
Condor daemons are running and try again.

Extra Info: If the condor_schedd is running on the
machine you are trying to query and you still see the error, the most likely cause
is that you have setup a personal Condor, you have not defined SCHEDD_NAME
in your condor_config file, and something is wrong with your
SCHEDD_ADDRESS_FILE setting. You must define either or both of those settings
in your config file, or you must use the -name option to condor_q.
Please see the Condor manual for details on SCHEDD_NAME and
SCHEDD_ADDRESS_FILE.

I guess this makes sense since schedd is NOT running on
the slave.

Also,
when I reverse the roles (make the Windows XP the slave and the Windows 2003
the master) I get the same results (jobs run on the Windows XP machine but not
the 2003 machine).

Any help would be appreciated.

Thanks,
Diane