Mike, 
     
    It would be helpful to know which processes are using so much CPU on
    your central manager.  I am used to seeing central managers handle
    thousands of slots without trouble (albeit on Linux), so something
    must be unusual about your situation.
     
    --Dan 
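
    For anyone following the thread, here is a minimal sketch of one way to
    answer that question on a Windows central manager. It assumes the text
    comes from `tasklist /v /fo csv` on an English-locale system, where the
    columns are named "Image Name" and "CPU Time"; the sample data and the
    function names are made up for illustration. Note that CPU Time is
    cumulative since each process started, not a current load figure.

```python
import csv
import io


def cpu_time_seconds(text):
    """Convert an 'H:MM:SS' CPU-time string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def top_cpu_processes(tasklist_csv, count=5):
    """Return (image name, CPU seconds) pairs, busiest first."""
    rows = csv.DictReader(io.StringIO(tasklist_csv))
    usage = [(row["Image Name"], cpu_time_seconds(row["CPU Time"])) for row in rows]
    usage.sort(key=lambda pair: pair[1], reverse=True)
    return usage[:count]


# Made-up sample in the assumed `tasklist /v /fo csv` layout (columns trimmed).
SAMPLE = '''"Image Name","PID","CPU Time"
"condor_master.exe","1200","0:00:05"
"condor_collector.exe","1304","0:12:40"
"condor_negotiator.exe","1388","1:02:03"
'''

print(top_cpu_processes(SAMPLE, count=2))
```

    Comparing two snapshots taken a few minutes apart would show which
    daemon is accumulating CPU time fastest.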
     
    On 9/20/10 8:26 AM, Michael O'Donnell wrote:
    
       
      Thank you, Mag. I did a couple of things and will explain what I
      found out.
       
       
      First, I attempted to move the collector to a second server by
      specifying a different host. This did not work; the central manager
      could not pick up any machines in the pool. I then tried running the
      collector daemon on both the central manager and a second server,
      with the second server set as COLLECTOR_HOST. This still did not
      work. I guess I do not understand the concepts here, because it seems
      one should be able to run the collector, or multiple collectors, on
      different hosts.
       
       
      Second, I decided to change servers for my central manager. We have
      about 115 slots in our pool. Every slot runs Windows, as does the
      central manager. I moved the central manager from a Windows 2008
      (32-bit) server with 2 physical CPUs to a dual quad-core server
      running 64-bit. My overall CPU load dropped from 80% to about 6-10%,
      distributed across all cores. When I did this I was finally able to
      submit jobs.
       
       
      We did not have any problems when we were running about 50 slots. As
      soon as I doubled our pool size, any submitted jobs would sit in the
      pool for up to 10 hours before running. The CPU load on the Windows
      2008 server (2 physical CPUs) increased from about 20% to 80% after
      doubling the pool size. Every machine could be tracked in the pool
      with Condor, but jobs were not being submitted because no matches
      could be made. When I looked at the ClassAds, there were plenty of
      machines available. So either the collector or the negotiator was not
      working properly. It was probably the negotiator, but I thought that
      offloading the server by moving the collector would help.
       
       
      As soon as I changed to a dual quad-core server, everything worked
      instantly. Based on everything I have read, our original server
      should have been plenty to handle such a small pool.
       
       
       
      It would be extremely interesting to see a graph of Condor's
      performance with increasing pool sizes. I do not know if anyone has
      any data on this, but if you do I would love to see it.
       
       
      Thank you,
       
      Mike
       
       
          I too would like to know, but unfortunately I don't think it's
          possible.

          I don't think the collector is the problem in your situation.
          How many machines are in your pool? How many matches are
          occurring every X minutes, and how many free slots are available?
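
          A rough sketch of how the free-slot question could be answered,
          assuming you can get one slot state per line of text. The
          `condor_status -format "%s\n" State` invocation named in the
          comment is an assumption about how to produce that text, and the
          sample data and function name are made up for illustration.

```python
from collections import Counter


def count_slot_states(status_output):
    """Count slots per state from text with one state name per line."""
    states = (line.strip() for line in status_output.splitlines())
    return Counter(state for state in states if state)


# Made-up sample; in a real pool the text might come from something like
# `condor_status -format "%s\n" State` (assumed invocation).
SAMPLE = "Claimed\nUnclaimed\nUnclaimed\nOwner\nClaimed\nUnclaimed\n"

counts = count_slot_states(SAMPLE)
print(counts["Unclaimed"], "of", sum(counts.values()), "slots are unclaimed")
```

          Running the same count every few minutes and watching the Claimed
          number change would give a rough picture of how often matches are
          being made.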
           
          On Fri, Sep 17, 2010 at 10:35 AM, Michael O'Donnell <odonnellm@xxxxxxxx> wrote:
          > 
          > Does anyone have any insight into how one might configure a pool
          > where the collector is located on a different server than the
          > central manager? I am asking because we have expanded our pool and
          > our server (2 CPUs, 3 GHz) is running at 80% CPU load. Any jobs that
          > we submit take as long as 10 hours before a match is found and the
          > jobs run. The only thing I can think of is that the load on the
          > server is causing the problems; it has increased significantly
          > since we doubled the size of our pool.
          >
          > Our pool includes only Windows systems, we have approximately 115
          > slots, and we are using the strictest of security settings (SSL,
          > authentication, encryption, authorization, integrity).
          >
          > I have read the wiki
          > (https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors)
          > about using multi-tier collectors, but for our system I think all I
          > need to do is locate the collector on a different machine. I have
          > set up the local configuration files so the COLLECTOR daemon is
          > running on the second host, and the global config file specifies
          > the host of the collector with COLLECTOR_HOST. The manual has no
          > information on doing this, and I am not sure whether the collector
          > daemon is required to run on the CM as well as the second host.
          > After I made these changes, I could no longer query the machines in
          > the pool.
          > 
          > Thank you for your suggestions, 
          > Mike 
          > 
          > _______________________________________________ 
          > Condor-users mailing list 
          > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
          > subject: Unsubscribe
          > You can also unsubscribe by visiting 
          > https://lists.cs.wisc.edu/mailman/listinfo/condor-users 
          > 
          > The archives can be found at: 
          > https://lists.cs.wisc.edu/archive/condor-users/ 
          > 
          > 