I may have spoken too soon about the multi-startd
                setup working. I thought my troubles were due to
                collisions on the starter log files, but after
                implementing the fix Todd suggested I'm still seeing
                some bad behaviour (although the fix for the log files
                worked brilliantly).
              
              
              It appears that I can only start jobs under one
                startd or the other. Not both. The first startd to run
                jobs after a Condor restart is the *only* startd that
                will run jobs until Condor is restarted again.
              
              
              For example: I submitted two clusters of jobs. One
                targeted the slots on the first startd. The other
                targeted the slots on the second startd. If I let the
                first cluster start on the S1 startd then the second
                cluster would attempt to run on the S2 startd and fail.
                And vice versa.
              
              
              The log output on failure is always the same:
              
              
              
                09/07/11 17:07:43 slot1: Got activate_claim request from shadow (<192.168.1.85:3382>)
                09/07/11 17:07:43 slot1: Remote job ID is 9.0
                09/07/11 17:07:43 Result of "register_subfamily" operation from ProcD: ERROR: The given PID is not part of the family tree
                09/07/11 17:07:43 Create_Process: error registering family for pid 1256
                09/07/11 17:07:43 ERROR "error registering process family with procd" at line 7917 in file c:\condor\execute\dir_4228\userdir\src\condor_daemon_core.v6\daemon_core.cpp
                09/07/11 17:07:43 slot1: Changing state and activity: Claimed/Idle -> Preempting/Killing
                09/07/11 17:07:43 slot1: State change: No preempting claim, returning to owner
                09/07/11 17:07:43 slot1: Changing state and activity: Preempting/Killing -> Owner/Idle
                09/07/11 17:07:43 slot1: State change: IS_OWNER is false
                09/07/11 17:07:43 slot1: Changing state: Owner -> Unclaimed
               
              
              
              It looks like the procd doesn't like the idea of two
                startds on the machine. It apparently can't tell them
                apart, and it objects to the fact that the jobs
                being started on the second startd in this case don't
                have a PPID equal to the PID of the first startd.
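
              To make that guess concrete, here is a toy model in
                Python of what I think is happening. It is only a
                sketch of my hypothesis, not the procd's actual
                implementation: the ToyProcd class and its parent
                check are my own assumptions, and all the PIDs except
                1256 (taken from the log above) are made up.

```python
class ToyProcd:
    """Toy model of process-family tracking; NOT HTCondor's real procd."""

    def __init__(self, root_pid):
        # Assumption: the tracked family tree is rooted at a single startd PID.
        self.parent_of = {root_pid: None}

    def register_subfamily(self, pid, ppid):
        # Assumption: a new subfamily is accepted only if its parent
        # is already part of the tracked tree.
        if ppid not in self.parent_of:
            return "ERROR: The given PID is not part of the family tree"
        self.parent_of[pid] = ppid
        return "SUCCESS"


# Hypothetical PIDs: startd S1 is 1000, startd S2 is 2000.
procd = ToyProcd(root_pid=1000)  # S1 ran jobs first, so it owns the tree
result_s1 = procd.register_subfamily(pid=1100, ppid=1000)  # job under S1
result_s2 = procd.register_subfamily(pid=1256, ppid=2000)  # job under S2
print(result_s1)
print(result_s2)
```

              If the real procd behaves anything like this, whichever
                startd registers first would win, which matches what
                I'm seeing.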
              
              
              I'm either missing something that's procd-specific in
                my startd config, or the procd isn't going to work here.
                I'll try disabling the procd, but having it there has
                helped with the scalability issues I'm trying to
                overcome, so if I can make this work with the procd in
                place I'd be a whole lot happier.