On 12/26/18 10:09 AM, Kodanda Ram Mangipudi wrote:
First, note that you only need to set up a dedicated scheduler and
submit MPI jobs with the parallel universe if you want to have MPI
jobs run concurrently on more than one machine. If you just want
your MPI jobs to run on multiple cores on one machine (which is
always the fastest kind of interconnect), you can use the vanilla
universe.
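
For the single-machine case, a submit file could look roughly like the sketch below. The wrapper script name run_mpi.sh, the core count of 8, and the output file names are placeholders I made up for illustration; the wrapper is assumed to call mpirun with a matching process count.

```
# Sketch of a vanilla universe submit file for a single-machine MPI job.
# run_mpi.sh is a hypothetical wrapper that runs something like
#   mpirun -np 8 ./my_mpi_program
universe     = vanilla
executable   = run_mpi.sh

# Ask for as many cores as the MPI job will use (8 is an assumed value).
request_cpus = 8

output       = mpi.out
error        = mpi.err
log          = mpi.log
queue
```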
On the nodes, you want the part after the @ sign to be the condor name of the schedd, which is probably not the IP address. You can find the name of a schedd by running

    condor_status -schedd

The first column of the output is the condor name of the schedd. Put that after the @ sign in the config file and restart, and I think things will work better.
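
As a rough sketch, the dedicated scheduler setting on each execute node would then look something like the following. The name submit.example.com is a made-up placeholder; use whatever name the first column of the condor_status output shows for your schedd.

```
# Execute-node config sketch. "submit.example.com" is a hypothetical
# schedd name; replace it with the name reported by condor_status.
DedicatedScheduler = "DedicatedScheduler@submit.example.com"

# Advertise the attribute in the machine ad so the schedd can claim the node.
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
```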
-greg