Hi Imre,
Thanks for sharing your solution with me. I use Windows, and I found
that the environment variable %_CONDOR_JOB_AD% refers to a .job.ad file.
That file provides some useful information, including the machines
allocated to the job. I wrote a C program to obtain the machine list
from that file.
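For readers who want to try the same idea, here is a minimal sketch in Python (not the C program mentioned above). It assumes the .job.ad file holds one "Name = value" pair per line and that the machine list is stored in a comma-separated attribute such as AllRemoteHosts; the attribute name and the helper names parse_job_ad and machine_list are assumptions for illustration, so check the contents of your own .job.ad file first.

```python
import os


def parse_job_ad(text):
    """Parse 'Name = value' lines into a dict keyed by lower-cased name.

    Surrounding double quotes on string values are stripped. This is a
    simplified reader, not a full ClassAd parser.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines without an '=' sign
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        ad[name.strip().lower()] = value
    return ad


def machine_list(ad, attr="AllRemoteHosts"):
    """Return host names from a comma-separated attribute (assumed name).

    Entries such as 'slot1@node1' are reduced to 'node1'.
    """
    hosts = []
    for entry in ad.get(attr.lower(), "").split(","):
        entry = entry.strip()
        if entry:
            hosts.append(entry.rsplit("@", 1)[-1])
    return hosts


if __name__ == "__main__":
    # Condor sets _CONDOR_JOB_AD to the path of the .job.ad file.
    path = os.environ.get("_CONDOR_JOB_AD")
    if path:
        with open(path) as f:
            for host in machine_list(parse_job_ad(f.read())):
                print(host)
```

Run inside the job, this prints one host per line, which can be redirected into a machine file for mpiexec or mpirun.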
Chunbao Miao
From: Imre Szeberenyi
Date: 2012-08-27 19:36
Subject: Re: [Condor-users] How to get the machine file in parallel jobs
Hi Chunbao,
I had the same problem before.
I have not found proper scripts for submitting jobs in an Open MPI
environment. I found one in the share/doc/condor-7.8.1/etc/examples/
directory which installs ssh daemons on the remote machines, but I
cannot use it in an SMP environment.
Finally I found a solution; maybe it will help you.
- I created a shell script which collects the host info from the job
status and creates a host file containing job IDs and slot numbers for
starting mpirun.
- I force mpirun to use condor_ssh_to_job. The only problem is that
mpirun checks the format of the host file, and if an entry starts with
numbers it assumes it is an IP address. So I added a constant string to
the job IDs, and a wrapper, which removes the constant string, starts
condor_ssh_to_job.
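The prefix trick in the two points above can be sketched as follows. This is an illustration in Python, not the enclosed shell scripts: the prefix "condorjob-", the cluster.proc.node form of the host entries, and the function names are all assumptions, as is the idea that mpirun calls its rsh agent with the host name followed by the remote command.

```python
PREFIX = "condorjob-"  # hypothetical constant string; any non-numeric text works


def host_entries(cluster, proc, node_count, prefix=PREFIX):
    """Build host-file entries from job IDs (assumed cluster.proc.node form).

    The prefix keeps mpirun from treating an entry that starts with
    digits as an IP address.
    """
    return [f"{prefix}{cluster}.{proc}.{node}" for node in range(node_count)]


def strip_prefix(host, prefix=PREFIX):
    """Recover the plain job ID from a prefixed host-file entry."""
    return host[len(prefix):] if host.startswith(prefix) else host


def wrapper_command(agent_args, prefix=PREFIX):
    """Turn the arguments given to the rsh agent into a condor_ssh_to_job call.

    Assumes agent_args is [host, remote_command, ...]; any argument that
    carries the prefix is rewritten back to a job ID.
    """
    return ["condor_ssh_to_job"] + [strip_prefix(a, prefix) for a in agent_args]
```

For example, host_entries(123, 0, 2) yields "condorjob-123.0.0" and "condorjob-123.0.1" for the host file, and the wrapper would turn ["condorjob-123.0.1", "orted"] into a condor_ssh_to_job call for job 123.0.1.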
I have enclosed my scripts; I hope you find them useful as well.
If you are using openmpi-1.4, change the last command of the
condor_openmpi.sh script to
exec $MPIRUN --prefix $MPI_HOME --mca plm_rsh_agent $_CONDOR_SSH_TO_JOB_WRAPPER \
    --hostfile $_CONDOR_PARALLEL_HOSTS_FILE $@
Best,
Imre
On 2012-08-26 at 15:00, miaocb@xxxxxxx wrote:
> Hi All,
> I successfully configured condor to run parallel jobs, but I can't figure out how to get a machine file that can be used by mpiexec or mpirun to start MPI jobs. Is there an environment variable that refers to the machine file?
>
> thanks
>
> Chunbao Miao
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/