Thanks Tim.

My firewalls have been punched between these two machines for ports
9600-9700, and I verified this (at least for TCP) with telnet. I verified
that my host info is set properly on the exec node (hector, .228), and it
seems to be correctly listing my central manager node (dione, .185); see
below. I've set STARTD_DEBUG = FULL_DEBUG on the exec node and am
attaching the output.

This output looks similar to the output from another setup I have, with
two Ubuntu machines running Condor 7.4.4, where the exec node connects
correctly. In that setup the CollectorLog shows:

StartdAd : Inserting ** "< slot3@$$exec node's name$$, 10.171.2.232 >"

For the exec node that won't join, I don't see anything in the
CollectorLog indicating either success or failure.

condor_config_val -dump | grep HOST
ALLOW_ADMINISTRATOR = $(CONDOR_HOST)
ALLOW_NEGOTIATOR = $(CONDOR_HOST)
ALLOW_NEGOTIATOR_SCHEDD = $(CONDOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
ALLOW_OWNER = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR)
COLLECTOR_HOST = dione.ia.unc.edu
COLLECTOR_HOST_STRING = "$(COLLECTOR_HOST)"
CONDOR_ADMIN = root@$(FULL_HOSTNAME)
CONDOR_HOST = dione.ia.unc.edu
FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
FULL_HOSTNAME = hector.ia.unc.edu
HOSTNAME = hector
STARTD_ATTRS = COLLECTOR_HOST_STRING
TCP_FORWARDING_HOST =

On Thu, Jun 9, 2011 at 9:38 AM, Timothy St. Clair <tstclair@xxxxxxxxxx> wrote:
> You might want to verify that your _HOST information is set properly on
> your exec node, and that your firewalls have been punched
> appropriately.
>
> condor_config_val -dump | grep HOST
>
> verify CONDOR_HOST and COLLECTOR_HOST are correct.
>
> If all else fails set:
> STARTD_DEBUG = D_FULLDEBUG and repost
>
> Cheers,
> Tim
>
> On Wed, 2011-06-08 at 16:47 -0400, Michael Grauer wrote:
>> I'm running Condor 7.6.1 on two different CentOS 5.6 machines, one (2
>> cpus, call it Twoproc) being the CONDOR_HOST (and for now is also a
>> submit and execute node), and the other (16 cpus, call it Sixteenproc)
>> I would like to add to this grid as an execute node, but can't get the
>> slots added.
>>
>> Sixteenproc has MASTER and STARTD daemons running, and when I call
>> condor_status on it, it returns the 2 slots from Twoproc, so it seems
>> that Sixteenproc can correctly connect to Twoproc.
>>
>> I can't seem to find any evidence in the logs on either machine that
>> Sixteenproc is trying to get its slots added to Twoproc's grid.
>>
>> Any advice on how to debug this? It would be much appreciated.
>>
>> I'm appending the StartLog output from Sixteenproc in case that helps.
>>
>> Thanks,
>> Mike
>>
>> 06/08/11 15:31:55 Setting maximum accepts per cycle 4.
>> 06/08/11 15:31:55 ******************************************************
>> 06/08/11 15:31:55 ** condor_startd (CONDOR_STARTD) STARTING UP
>> 06/08/11 15:31:55 ** /usr/sbin/condor_startd
>> 06/08/11 15:31:55 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
>> 06/08/11 15:31:55 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
>> 06/08/11 15:31:55 ** $CondorVersion: 7.6.1 May 31 2011 BuildID: 339001 $
>> 06/08/11 15:31:55 ** $CondorPlatform: x86_64_rhap_5 $
>> 06/08/11 15:31:55 ** PID = 2293
>> 06/08/11 15:31:55 ** Log last touched time unavailable (No such file or directory)
>> 06/08/11 15:31:55 ******************************************************
>> 06/08/11 15:31:55 Using config source: /etc/condor/condor_config
>> 06/08/11 15:31:55 Using local config sources:
>> 06/08/11 15:31:55 /etc/condor/condor_config.local
>> 06/08/11 15:31:55 DaemonCore: command socket at <SixteenProc'sIP:9608>
>> 06/08/11 15:31:55 DaemonCore: private command socket at <SixteenProc'sIP:9608>
>> 06/08/11 15:31:55 Setting maximum accepts per cycle 4.
>> 06/08/11 15:32:01 VM-gahp server reported an internal error
>> 06/08/11 15:32:01 VM universe will be tested to check if it is available
>> 06/08/11 15:32:01 History file rotation is enabled.
>> 06/08/11 15:32:01 Maximum history file size is: 20971520 bytes
>> 06/08/11 15:32:01 Number of rotated history files is: 2
>> 06/08/11 15:32:01 slot1: New machine resource allocated
>> 06/08/11 15:32:01 slot2: New machine resource allocated
>> 06/08/11 15:32:01 slot3: New machine resource allocated
>> 06/08/11 15:32:01 slot4: New machine resource allocated
>> 06/08/11 15:32:01 slot5: New machine resource allocated
>> 06/08/11 15:32:01 slot6: New machine resource allocated
>> 06/08/11 15:32:01 slot7: New machine resource allocated
>> 06/08/11 15:32:01 slot8: New machine resource allocated
>> 06/08/11 15:32:01 slot9: New machine resource allocated
>> 06/08/11 15:32:01 slot10: New machine resource allocated
>> 06/08/11 15:32:01 slot11: New machine resource allocated
>> 06/08/11 15:32:01 slot12: New machine resource allocated
>> 06/08/11 15:32:01 slot13: New machine resource allocated
>> 06/08/11 15:32:01 slot14: New machine resource allocated
>> 06/08/11 15:32:01 slot15: New machine resource allocated
>> 06/08/11 15:32:01 slot16: New machine resource allocated
>> 06/08/11 15:32:01 CronJobList: Adding job 'mips'
>> 06/08/11 15:32:01 CronJobList: Adding job 'kflops'
>> 06/08/11 15:32:01 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips)
>> 06/08/11 15:32:01 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops)
>> 06/08/11 15:32:01 slot1: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot1: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot1: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 BenchMgr:StartBenchmarks()
>> 06/08/11 15:32:01 slot2: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot2: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot2: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot2: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot3: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot3: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot3: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot3: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot4: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot4: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot4: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot4: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot5: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot5: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot5: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot5: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot6: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot6: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot6: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot6: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot7: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot7: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot7: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot7: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot8: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot8: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot8: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot8: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot9: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot9: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot9: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot9: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot10: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot10: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot10: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot10: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot11: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot11: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot11: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot11: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot12: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot12: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot12: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot12: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot13: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot13: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot13: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot13: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot14: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot14: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot14: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot14: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot15: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot15: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot15: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot15: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:01 slot16: State change: IS_OWNER is false
>> 06/08/11 15:32:01 slot16: Changing state: Owner -> Unclaimed
>> 06/08/11 15:32:01 State change: RunBenchmarks is TRUE
>> 06/08/11 15:32:01 slot16: Changing activity: Idle -> Benchmarking
>> 06/08/11 15:32:01 slot16: Changing activity: Benchmarking -> Idle
>> 06/08/11 15:32:23 State change: benchmarks completed
>> 06/08/11 15:32:23 slot1: Changing activity: Benchmarking -> Idle
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/
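The telnet check described in the reply (ports 9600-9700 between hector and dione) can be scripted so every port in the range is tried and the failures are told apart. This is a minimal Python 3 sketch using only the standard `socket` module; `probe` and `summarize` are hypothetical helper names, not part of Condor. Like telnet, it tests TCP only, so a clean result says nothing about whether UDP traffic between the two machines gets through.

```python
import socket
from typing import Dict, Iterable, List


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify one TCP connection attempt, roughly as telnet would report it."""
    try:
        # create_connection resolves `host` and tries each address it gets back.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote host, or a firewall REJECT rule, refused the connection.
        return "refused"
    except socket.timeout:
        # No answer within `timeout`: often a firewall silently dropping packets.
        return "timeout"
    except OSError:
        # Anything else, e.g. "No route to host" or a failed DNS lookup.
        return "error"


def summarize(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[str, List[int]]:
    """Group ports by probe result, e.g. {"open": [...], "timeout": [...]}."""
    results: Dict[str, List[int]] = {}
    for port in ports:
        results.setdefault(probe(host, port, timeout), []).append(port)
    return results
```

Run from the exec node, something like `summarize("dione.ia.unc.edu", range(9600, 9701), timeout=1.0)` covers the whole range (the upper bound of `range` is exclusive). Only ports with a listening daemon report "open"; a "refused" port at least shows packets are reaching the other machine, while "timeout" more often points at a filtering firewall.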
Attachment:
StartLog.gz
Description: GNU Zip compressed data
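Tim's suggestion to verify CONDOR_HOST and COLLECTOR_HOST in the `condor_config_val -dump | grep HOST` output can also be checked mechanically, including values that only refer to other macros (such as `COLLECTOR_HOST_STRING = "$(COLLECTOR_HOST)"`). This is a minimal Python 3 sketch, not a Condor tool: `parse_dump`, `expand` and `check_central_manager` are hypothetical names, the expansion handles only plain `$(NAME)` references (Condor's richer macro forms are ignored), and a single collector, optionally written as `host:port`, is assumed.

```python
import re
from typing import Dict, List

# Matches plain $(NAME) references only; other macro forms are left alone.
MACRO_REF = re.compile(r"\$\(([A-Za-z0-9_.]+)\)")


def parse_dump(text: str) -> Dict[str, str]:
    """Parse 'NAME = value' lines, as printed by condor_config_val -dump, into a dict."""
    config: Dict[str, str] = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment/header lines
        name, sep, value = line.partition("=")
        if sep and name.strip():
            # Condor macro names are case-insensitive, so normalise to upper case.
            config[name.strip().upper()] = value.strip()
    return config


def expand(value: str, config: Dict[str, str], depth: int = 10) -> str:
    """Substitute $(NAME) references from `config`, leaving unknown names untouched."""
    for _ in range(depth):  # depth limit guards against self-referencing macros
        new = MACRO_REF.sub(lambda m: config.get(m.group(1).upper(), m.group(0)), value)
        if new == value:
            break
        value = new
    return value


def check_central_manager(config: Dict[str, str], expected: str) -> List[str]:
    """Return problems with CONDOR_HOST / COLLECTOR_HOST; an empty list means both match."""
    problems: List[str] = []
    for key in ("CONDOR_HOST", "COLLECTOR_HOST"):
        raw = config.get(key)
        if raw is None:
            problems.append(f"{key} is not set")
            continue
        value = expand(raw, config)
        host = value.split(":")[0].strip().lower()  # drop an optional :port suffix
        if host != expected.lower():
            problems.append(f"{key} = {value!r}, expected {expected!r}")
    return problems
```

Fed the dump quoted in the reply, `check_central_manager(parse_dump(text), "dione.ia.unc.edu")` should return an empty list, which matches the observation that the host settings on hector look correct and points the investigation elsewhere.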