I have a set of equivalent Linux hosts (p51, p52, ... p58) that I want to configure as a Condor pool for running parallel jobs.
On p51, I install the manager, submit, and execute daemons as follows:
CONDOR_LOCAL=/var/local/condor/`hostname -s`
$CONDOR_INSTALL/condor_install --install=$CONDOR_INSTALL --install-dir=/var/local/condor --local-dir=$CONDOR_LOCAL --env-scripts-dir=$CONDOR_LOCAL --type=submit,execute,manager
Then I start Condor, and all the daemons seem to run correctly. I can submit a simple job, and it gets dispatched and executed.
Then on host p52 I install the second member of the pool:
CONDOR_MGR=p51
CONDOR_LOCAL=/var/local/condor/`hostname -s`
$CONDOR_INSTALL/condor_configure --install=$CONDOR_INSTALL --install-dir=/var/local/condor --local-dir=$CONDOR_LOCAL --env-scripts-dir=$CONDOR_LOCAL --type=execute --central-manager=$CONDOR_MGR
To set up parallel scheduling, I modify the /var/local/condor/etc/condor_config file to include the following:
COLLECTOR_NAME = NuoDB-DHentchel-p51
## Parallel scheduling groups
DedicatedScheduler = p51
ParallelSchedulingGroup = P5
Then I restart the daemons on both machines.
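For comparison, the dedicated-scheduler section of the manual appears to use quoted string values and to advertise these attributes through STARTD_ATTRS, which is not what I have above. A sketch of that form, assuming p51.example.com as a placeholder for the submit host's fully qualified name (not my real domain):

```
# Placeholder hostname; substitute the fully qualified name of the submit host
DedicatedScheduler = "DedicatedScheduler@p51.example.com"
# Group name is a quoted string
ParallelSchedulingGroup = "P5"
# Advertise both attributes in each execute machine's ClassAd
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler, ParallelSchedulingGroup
```

I have not confirmed whether this difference has anything to do with p52 being missing from the pool.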
My assumption was that the --central-manager option would make host p52 report to the manager and scheduler running on p51, as long as both hosts used the same COLLECTOR_NAME and scheduling group name. However, condor_status on p51 shows only the p51 execute slots and nothing for p52. When I submit a parallel universe job for two hosts, it gets queued but never dispatched, which suggests the scheduler is unaware of the second host.
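A minimal sketch of checks that could narrow this down, assuming the Condor tools are on PATH. The `check` wrapper is a hypothetical helper, not part of Condor; it only keeps the script from failing on a host where a tool is missing.

```shell
# Hypothetical helper: run a Condor command if it is installed, otherwise say so.
check() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 not found on PATH"
    fi
}

# On p52: which central manager does this host think it reports to?
check condor_config_val CONDOR_HOST
# On p52: which daemons is the master configured to start?
check condor_config_val DAEMON_LIST
# On p51: which machine ads has the collector actually received?
check condor_status -pool p51
# On p51: why are the queued jobs not being matched?
check condor_q -analyze
```

If CONDOR_HOST on p52 does not resolve to p51, or DAEMON_LIST there lacks STARTD, that would point to the install step rather than the parallel scheduling settings.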
Is there something I'm overlooking in setting up the pool? I searched the FAQs and the documentation without finding an answer. Is there a how-to that walks through the first-time setup of a pool of hosts?
Thanks,
dave
--
David Hentchel
Performance Engineer
www.nuodb.com
(617) 803 - 1193