On 12/26/18 10:09 AM, Kodanda Ram Mangipudi wrote:
Hi,
I am a newbie to HTCondor set-up and administration, though I was a user in the past. We are trying to set up a pool of 2 machines, each with dual CPUs with 20 cores each. I made an i5 6-core machine the master, and the other 2 HPC workstations the nodes. All are running Ubuntu 16.04. The condor installation is from the default Ubuntu repositories, installed using apt-get.
First, note that you only need to set up a dedicated scheduler and submit MPI jobs with the parallel universe if you want MPI jobs to run concurrently on more than one machine. If you just want your MPI jobs to run on multiple cores of one machine (which is always the fastest kind of interconnect), you can use the vanilla universe.
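For the single-machine case, a submit file along these lines might be a starting point. This is only a sketch: run_mpi.sh is a hypothetical wrapper script (for example, one that calls mpirun on your program), and the core count is a placeholder, neither taken from this thread.

```
# Sketch of a vanilla-universe submit file for an MPI job on one machine.
# run_mpi.sh and the value of request_cpus are placeholders.
universe     = vanilla
executable   = run_mpi.sh
request_cpus = 20
output       = mpi.$(Cluster).out
error        = mpi.$(Cluster).err
log          = mpi.$(Cluster).log
queue
```

Note that the job will only match a slot offering at least that many cores, so the execute node may need a partitionable slot rather than one slot per core.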
node01
File: /etc/condor/condor_config (Package manager's copy; identical to that of master node)
/etc/condor/config.d/00debconf (Edited for configuration)
------------------------------------- Begin file ----------------------------------
# Added: by system admin: For Dedicated scheduler for parallel universe
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxx"
On the nodes, you want the part after the @ sign to be the condor name of the schedd, which is probably not the IP address. You can find the name of a schedd by running
condor_status -sched
and the first column will be the condor name of the schedd. Put that after the @ sign in the config file, and restart, and I think things will work better.
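For reference, here is a rough sketch of what the node-side settings might look like once the schedd name is filled in. The name master.example.org is a placeholder, not something from this thread; use whatever condor_status prints in its first column. The STARTD_ATTRS line follows the dedicated-scheduling setup described in the HTCondor manual and is included on the assumption that it is not already set elsewhere in your configuration.

```
# /etc/condor/config.d/00debconf on each execute node (sketch only)
# master.example.org is a placeholder for the schedd's condor name.
DedicatedScheduler = "DedicatedScheduler@master.example.org"
# Advertise the attribute in the machine ClassAd so the dedicated
# scheduler can find the nodes it is allowed to claim.
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
```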
-greg