Hi,
May I ask why a simple mpihello job is stuck in the idle state? The mpi.ht submit script and the command outputs are shown below:
[mahmood@rocks7 ~]$ cat mpi.ht
universe = parallel
executable = /opt/openmpi/bin/mpirun
arguments = ./hellompi
log = hellompi.log
output = hellompi.out
error = hellompi.err
machine_count = 2
queue
[mahmood@rocks7 ~]$ condor_q
-- Schedd: rocks7.vbtestcluster.com : <10.0.3.15:9618?... @ 01/17/18 02:45:50
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
mahmood CMD: /opt/openmpi/bin/mpirun 1/17 02:41 _ _ 1 1 4.0
1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
[mahmood@rocks7 ~]$ condor_q -analyze
-- Schedd: rocks7.vbtestcluster.com : <10.0.3.15:9618?...
004.000: Job has not yet been considered by the matchmaker.
004.000: Run analysis summary ignoring user priority. Of 2 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match and are already running your jobs
0 match but are serving other users
2 are available to run your job
[mahmood@rocks7 ~]$ ls -l mpihello.*
-rw-rw-r-- 1 mahmood mahmood 833 Jan 16 12:48 mpihello.c
[mahmood@rocks7 ~]$ ls -l hello*
-rw-rw-r-- 1 mahmood mahmood 0 Jan 17 02:41 hellompi.err
-rw-rw-r-- 1 mahmood mahmood 134 Jan 17 02:41 hellompi.log
-rw-rw-r-- 1 mahmood mahmood 0 Jan 17 02:41 hellompi.out
[mahmood@rocks7 ~]$ cat hellompi.log
000 (004.000.000) 01/17 02:41:30 Job submitted from host: <10.0.3.15:9618?addrs=10.0.3.15-9618+[--1]-9618&noUDP&sock=2329_79d6_3>
...
[mahmood@rocks7 ~]$ rocks list host
HOST MEMBERSHIP CPUS RACK RANK RUNACTION INSTALLACTION
rocks7: Frontend 2 0 0 os install
compute-0-0: Compute 2 0 0 os install
[mahmood@rocks7 ~]$
Regards,
Mahmood