I am running a Rocks cluster with 8 compute nodes and a head node. I would like to use Condor to submit MPI jobs to the cluster, but I cannot get the jobs to run. Here is what my condor.job file looks like:
Now, when I submit this via condor_submit, the system accepts the job and puts it in the queue. The problem is that it stays there and never runs. The same happens if I change the universe to MPI.
Here is the kicker: if I change the universe to Vanilla, the job executes but ONLY on one of the compute nodes.
Any ideas?
Thanks for your time.
==
Vasil Lalov
Department Of Computer Science
Bowling Green State University
Bowling Green, OH 43403
lalovv@xxxxxxxx