OK, part of my problem may be name resolution. I changed the first entry in resolv.conf to point to our local WINS servers, which seem to have the correct resolution for the machines I am targeting. I ran condor_q -analyze on the Windows node and got the following:

Error: can’t find address for <windows machine AD DNS name>

Extra Info: You probably saw this error because the condor_schedd is not running on the machine you are trying to query. If the condor_schedd is not running, the Condor system will not be able to find an address and port to connect to and satisfy this request. Please make sure the Condor daemons are running and try again.

Extra Info: If the condor_schedd is running on the machine you are trying to query and you still see the error, the most likely cause is that you have setup a personal Condor, you have not defined SCHEDD_NAME in your condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE setting. You must define either or both of those settings in your config file, or you must use the -name option to condor_q. Please see the Condor manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE.

Furthermore, it appears as if the Windows machine thinks it’s a master: it has its own IP address in the .master_address file, even though when I run condor_status on it I get the slots from the master listed. I specifically told it during the install to join a pool, and I have done this on a couple of different machines with the same results.

At this point I am at a loss and pretty confused. How can I configure the Windows Condor to NOT think it is a master? (Please see a couple of threads back for the install options I used.) Maybe then I can tackle the name resolution problem, if it still exists (I can now resolve the Windows node from Linux).

Thanks for the help!

From: Dunn, George Jr

Thanks! I restarted the service (I have firewalls turned off on both machines at this point). The node still did not show up.
I ran condor_startd and it opened several cmd windows and closed all but one, which shows no text. I waited the two minutes; still nothing. I will provide the logs that seem relevant; if I am missing something, I can provide more.

On the new WINDOWS node, here is what I have in the MasterLog. It seems kind of weird that it should have condor_master start up, but like I said, it shows the master’s execute slots when I run condor_status on the Windows machine.

06/04/13 17:22:53 ** condor (CONDOR_MASTER) STARTING UP
06/04/13 17:22:53 ** C:\condor\bin\condor_master.exe
06/04/13 17:22:53 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
06/04/13 17:22:53 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
06/04/13 17:22:53 ** $CondorVersion: 7.8.8 Mar 20 2013 BuildID: 110288 $
06/04/13 17:22:53 ** $CondorPlatform: x86_64_winnt_6.1 $
06/04/13 17:22:53 ** PID = 2756
06/04/13 17:22:53 ** Log last touched 6/4 16:22:52
06/04/13 17:22:53 ******************************************************
06/04/13 17:22:53 Using config source: C:\condor\condor_config
06/04/13 17:22:53 Using local config sources:
06/04/13 17:22:53 C:\condor/condor_config.local
06/04/13 17:22:53 DaemonCore: command socket at <x.x.x.x:51435>
06/04/13 17:22:53 DaemonCore: private command socket at <x.x.x.x:51435>
06/04/13 17:22:53 Setting maximum accepts per cycle 8.
06/04/13 17:22:54 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 184
06/04/13 17:22:54 Started DaemonCore process "C:\condor/bin/condor_kbdd.exe", pid and pgroup = 5376

Here is the MasterLog on the Linux master:

06/03/13 15:39:13 ******************************************************
06/03/13 15:39:13 ** condor_master (CONDOR_MASTER) STARTING UP
06/03/13 15:39:13 ** /usr/local/condor/sbin/condor_master
06/03/13 15:39:13 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
06/03/13 15:39:13 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
06/03/13 15:39:13 ** $CondorVersion: 7.8.8 Mar 20 2013 BuildID: 110288 $
06/03/13 15:39:13 ** $CondorPlatform: x86_64_rhap_6.3 $
06/03/13 15:39:13 ** PID = 1535
06/03/13 15:39:13 ** Log last touched 6/3 15:36:34
06/03/13 15:39:13 ******************************************************
06/03/13 15:39:13 Using config source: /usr/local/condor/etc/condor_config
06/03/13 15:39:13 Using local config sources:
06/03/13 15:39:13 /home/condor/condor_config.local
06/03/13 15:39:13 DaemonCore: command socket at <152.20.244.221:43548>
06/03/13 15:39:13 DaemonCore: private command socket at <152.20.244.221:43548>
06/03/13 15:39:13 Setting maximum accepts per cycle 8.
06/03/13 15:39:13 Started DaemonCore process "/usr/local/condor/sbin/condor_collector", pid and pgroup = 1536
06/03/13 15:39:13 Waiting for /home/condor/log/.collector_address to appear.
06/03/13 15:39:14 Found /home/condor/log/.collector_address.
06/03/13 15:39:14 Started DaemonCore process "/usr/local/condor/sbin/condor_negotiator", pid and pgroup = 1537
06/03/13 15:39:14 Started DaemonCore process "/usr/local/condor/sbin/condor_schedd", pid and pgroup = 1538
06/03/13 15:39:14 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 1539
06/03/13 16:39:13 Preen pid is 1716
06/04/13 16:39:13 Preen pid is 4958
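One way to compare the two MasterLog excerpts is to list which daemons each condor_master reports starting. The following is only a minimal sketch in Python, assuming the log lines keep the 'Started DaemonCore process "<path>"' shape seen in the excerpts; the function name started_daemons and the sample variable are hypothetical and not part of Condor.

```python
import re
from pathlib import PureWindowsPath

# Matches MasterLog lines of the form seen in this thread, e.g.
#   ... Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 184
STARTED_RE = re.compile(r'Started DaemonCore process "([^"]+)"')


def started_daemons(log_text):
    """Return the daemon names a condor_master reported starting, in log order."""
    names = []
    for path in STARTED_RE.findall(log_text):
        # PureWindowsPath accepts both "\" and "/" as separators, so it copes
        # with the mixed Windows paths and the Linux paths alike.
        name = PureWindowsPath(path).name
        if name.lower().endswith(".exe"):
            name = name[:-4]  # drop the Windows ".exe" suffix
        names.append(name)
    return names


# Sample input copied from the Windows MasterLog excerpt in this thread.
windows_log = r'''
06/04/13 17:22:54 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 184
06/04/13 17:22:54 Started DaemonCore process "C:\condor/bin/condor_kbdd.exe", pid and pgroup = 5376
'''

windows_daemons = started_daemons(windows_log)
print(windows_daemons)
```

Run against the excerpts in this thread, the Windows log lists only condor_startd and condor_kbdd, while the Linux log lists condor_collector, condor_negotiator, condor_schedd and condor_startd. No condor_schedd appears in the Windows excerpt, which would be consistent with the condor_q error, although the excerpt may not be the complete log.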
When the service is running I see the following files listed in the c:\condor\logs directory:

06/04/2013  04:33 PM     114  .kbdd_address
06/04/2013  04:33 PM     114  .master_address
06/04/2013  05:12 PM     114  .startd_address
06/04/2013  05:12 PM      78  .startd_claim_id.slot1
06/04/2013  05:12 PM      78  .startd_claim_id.slot2
06/04/2013  05:12 PM      78  .startd_claim_id.slot3
06/04/2013  05:21 PM  10,838  KbdLog
06/04/2013  05:21 PM       0  list.txt
06/04/2013  05:21 PM  13,371  MasterLog
06/04/2013  05:12 PM   5,895  StarterLog
06/04/2013  05:21 PM  27,976  StartLog
06/03/2013  04:35 PM     600  TOOLLog

Which seems good, but when I look in the .startd_address file I get

$CondorPlatform: x86_64_winnt_6.1 $

This is a 32-bit OS. Is this a problem?

Thanks!
Eddie

From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of David Peter Lisin Crespo

Hi George.

Can you restart Condor on your Win7 machine (net stop condor and then net start condor)? Wait 2 minutes and run condor_status. If the slots don't appear, go to the Condor binaries directory and execute condor_startd. If this doesn't work, please attach the logs.

Good luck!!

On 04/06/2013 22:56, "Dunn, George Jr" <dunng@xxxxxxxx> wrote:

Maybe the Windows machines are not supposed to show up in condor_status?