Hi,

I am trying to run a small example MPI program (the C program given in
http://www.cs.wisc.edu/condor/manual/v6.6/2_10MPI_Applications.html) on two
Linux nodes. I have configured both machines to run dedicated jobs. I submit the
job to the Condor central manager (which is also one of the dedicated
resources), but the job is running
only on this machine; it never starts on the other resource. The notification
email says something like this:

    Machine A exited normally with status 0.
    Machine B:9609> was never started.

In the Negotiator log on the central manager machine I am
seeing this:

    Phase 4.1: Negotiating with schedds ...
    4/3 08:57:36 Negotiating with senthil@Machine B at <xx.xxx.xxx.xx:9605>
    4/3 08:58:09 condor_read(): timeout reading buffer.
    4/3 08:58:09 Failed to get reply from schedd
    4/3 08:58:09 Error: Ignoring schedd for this cycle

Could you please help me get MPI jobs running in Condor?

Thanks,
Senthil