Re: [HTCondor-users] mpi job stuck as idle
- Date: Wed, 17 Jan 2018 08:19:00 -0600
- From: Jason Patton <jpatton@xxxxxxxxxxx>
Mahmood,
Is condor configured to use a DedicatedScheduler? See:
https://research.cs.wisc.edu/htcondor/manual/current/2_9Parallel_Applications.html#SECTION00392000000000000000
and
https://research.cs.wisc.edu/htcondor/manual/current/3_14Setting_Up.html#SECTION004148000000000000000
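
For reference, here is a minimal sketch of what the dedicated-scheduler settings in each execute node's local configuration might look like, following the pattern described in the manual section linked above. It assumes that rocks7.vbtestcluster.com (the schedd shown in the condor_q output below) is the host that should act as the dedicated scheduler, and that the nodes should always accept dedicated jobs. Both are assumptions, so adjust the hostname and the policy for your site:

```
# Assumption: the submit host rocks7.vbtestcluster.com acts as the dedicated scheduler.
DedicatedScheduler = "DedicatedScheduler@rocks7.vbtestcluster.com"
# Advertise the attribute in each machine ClassAd.
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

# Example policy (an assumption): always start jobs, never suspend or preempt.
START        = True
SUSPEND      = False
CONTINUE     = True
PREEMPT      = False
KILL         = False
WANT_SUSPEND = False
WANT_VACATE  = False
# Prefer jobs that come from the dedicated scheduler.
RANK         = Scheduler =?= $(DedicatedScheduler)
```

After changing the configuration, reconfigure or restart the daemons on the execute nodes (for example with condor_reconfig). A parallel universe job will likely stay idle until the machines advertise the DedicatedScheduler attribute.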
Jason Patton
On Wed, Jan 17, 2018 at 1:48 AM, Mahmood Naderan <nt_mahmood@xxxxxxxxx> wrote:
> Hi,
> May I ask why a simple mpihello job is stuck in the idle state? The ht script and
> the outputs are shown below:
>
>
> [mahmood@rocks7 ~]$ cat mpi.ht
> universe = parallel
> executable = /opt/openmpi/bin/mpirun
> arguments = ./hellompi
> log = hellompi.log
> output = hellompi.out
> error = hellompi.err
> machine_count = 2
> queue
> [mahmood@rocks7 ~]$ condor_q
>
>
> -- Schedd: rocks7.vbtestcluster.com : <10.0.3.15:9618?... @ 01/17/18 02:45:50
> OWNER   BATCH_NAME                    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
> mahmood CMD: /opt/openmpi/bin/mpirun  1/17 02:41      _      _      1      1 4.0
>
> 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
> [mahmood@rocks7 ~]$ condor_q -analyze
>
>
> -- Schedd: rocks7.vbtestcluster.com : <10.0.3.15:9618?...
>
> 004.000: Job has not yet been considered by the matchmaker.
>
>
> 004.000: Run analysis summary ignoring user priority. Of 2 machines,
> 0 are rejected by your job's requirements
> 0 reject your job because of their own requirements
> 0 match and are already running your jobs
> 0 match but are serving other users
> 2 are available to run your job
> [mahmood@rocks7 ~]$ ls -l mpihello.*
> -rw-rw-r-- 1 mahmood mahmood 833 Jan 16 12:48 mpihello.c
> [mahmood@rocks7 ~]$ ls -l hello*
> -rw-rw-r-- 1 mahmood mahmood 0 Jan 17 02:41 hellompi.err
> -rw-rw-r-- 1 mahmood mahmood 134 Jan 17 02:41 hellompi.log
> -rw-rw-r-- 1 mahmood mahmood 0 Jan 17 02:41 hellompi.out
> [mahmood@rocks7 ~]$ cat hellompi.log
> 000 (004.000.000) 01/17 02:41:30 Job submitted from host: <10.0.3.15:9618?addrs=10.0.3.15-9618+[--1]-9618&noUDP&sock=2329_79d6_3>
> ...
> [mahmood@rocks7 ~]$ rocks list host
> HOST MEMBERSHIP CPUS RACK RANK RUNACTION INSTALLACTION
> rocks7: Frontend 2 0 0 os install
> compute-0-0: Compute 2 0 0 os install
> [mahmood@rocks7 ~]$
>
> Regards,
> Mahmood