
[HTCondor-users] Newbie startup question about configuring a simple condor pool



I have a set of equivalent Linux hosts (p51, p52, ... p58) that I want to configure as a Condor pool for running parallel jobs.

On p51, I install the manager, submit, and execute daemons as follows:
CONDOR_LOCAL=/var/local/condor/`hostname -s`
$CONDOR_INSTALL/condor_install --install=$CONDOR_INSTALL --install-dir=/var/local/condor --local-dir=$CONDOR_LOCAL --env-scripts-dir=$CONDOR_LOCAL --type=submit,execute,manager

Then I start Condor, and all the daemons seem to run correctly. I can submit a simple job and it gets dispatched and executed.

Then on host p52 I install the second member of the pool:
CONDOR_MGR=p51
CONDOR_LOCAL=/var/local/condor/`hostname -s`
$CONDOR_INSTALL/condor_configure --install=$CONDOR_INSTALL --install-dir=/var/local/condor --local-dir=$CONDOR_LOCAL  --env-scripts-dir=$CONDOR_LOCAL --type=execute --central-manager=$CONDOR_MGR

To set up parallel scheduling, I modify the /var/local/condor/etc/condor_config file to be:
COLLECTOR_NAME = NuoDB-DHentchel-p51
## Parallel scheduling groups
DedicatedScheduler      = p51
ParallelSchedulingGroup = P5

Then I restart daemons on both machines. 

My assumption was that the --central-manager option would set up host p52 to be a slave to the manager and scheduler running on p51, as long as both hosts used the same COLLECTOR_NAME and scheduling group name. But condor_status on p51 shows only the p51 execute slots and nothing for p52. When I submit a parallel universe job for 2 hosts, it gets queued but never dispatched, which suggests the scheduler is unaware of the second host.
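
To make that assumption concrete, this is roughly what I would expect the effective settings on p52 to be after condor_configure runs with --type=execute and --central-manager=p51. It is only a sketch of my expectation, not a copy of the generated file, and I have not verified that these are the values actually in effect on p52:

```
## Expected on p52 (my assumption, not verified)
## Central manager that the p52 daemons should report to
CONDOR_HOST = p51
## Execute-only node: master plus startd, no schedd or collector
DAEMON_LIST = MASTER, STARTD
```

If the real values differ from this, that alone might explain why p52 never shows up in condor_status on p51.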

Is there something I'm overlooking in setting up the pool? I searched the FAQs and the documentation without finding an answer. Is there a how-to that walks through the first-time setup of a pool of hosts?

Thanks,
dave





--
David Hentchel
Performance Engineer
www.nuodb.com
(617) 803 - 1193