Hi Jason,
I got the HTCondor openmpiscript working in the parallel universe with a very simple example, using OpenMPI 4.1.8.
However, between OpenMPI 4.x and OpenMPI 5.x the runtime migrated from ORTE to PRRTE (https://docs.open-mpi.org/en/v5.0.x/launching-apps/pmix-and-prrte.html#label-running-role-of-pmix-and-prte)
and some environment variables also changed (https://docs.open-mpi.org/en/v5.0.x/mca.html#mca-parameter-changes-between-open-mpi-4-x-and-5-x).
I was not able to adapt the script, as I am not a system expert (nor an HTCondor or OpenMPI expert...), and I think it may need to change a lot.
So I would like to know two things:
- Is HTCondor completely agnostic to the version of OpenMPI, so that
only the openmpiscript needs to be adapted?
- If yes, has anyone already adapted it? ;)
Regards,
Fabian
Fabian Lambert
Laboratoire de Physique Subatomique et de Cosmologie
Chef de service - Service Informatique
53 Avenue des Martyrs. 38026 Grenoble cedex
Tel 33 (0)4 76 28 41 97 Fax 33 (0)4 76 28 40 04
fabian.lambert@xxxxxxxxxxxxx
http://lpsc.in2p3.fr
Hi Fabian,
Indeed, the openmpiscript example for wrapping mpirun hasn't been updated in some time; I think it was last made to work with OpenMPI 3.x. Whether or not OpenMPI 5.x's mpirun is compatible with the parallel universe depends on how much control we have over launching the worker processes. In the current example, this works as follows: mpirun is invoked on node 0 with the SSH launcher specified; the script intercepts the SSH commands and grabs the arguments for the worker processes that mpirun expects to run; condor_chirp is used to pass those arguments to the other nodes; and finally the worker nodes execute those worker processes with the proper arguments to connect back to the parent mpirun process on node 0.
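To make that handoff a bit more concrete, here is a minimal Python sketch of the idea. It is not the actual openmpiscript (which is a shell script), and everything in it is an assumption for illustration: the simplified `ssh [-flag ...] <host> <command...>` shape, the `WorkerCmd<N>` attribute name, the made-up `daemon --parent-uri ...` worker command, and the in-memory store that stands in for the job-ad attributes condor_chirp would read and write.

```python
import shlex


def parse_intercepted_ssh(argv):
    """Split an intercepted launcher call into (host, remote_command_words).

    Assumes the simplified shape: ssh [-flag ...] <host> <command words...>,
    where no flag takes a separate argument.
    """
    args = list(argv[1:])  # drop the program name
    while args and args[0].startswith("-"):
        args.pop(0)  # skip leading flags
    if len(args) < 2:
        raise ValueError("expected a host followed by a remote command")
    return args[0], args[1:]


class InMemoryAttrs:
    """Stand-in for the job-ad attributes that condor_chirp would read and write."""

    def __init__(self):
        self._attrs = {}

    def set_attr(self, name, value):
        self._attrs[name] = value

    def get_attr(self, name):
        return self._attrs.get(name)


def publish_worker_command(store, node, argv):
    """Node 0 side: record what mpirun wanted to start on a remote node."""
    host, remote = parse_intercepted_ssh(argv)
    # "WorkerCmd<N>" is a hypothetical attribute name.
    store.set_attr(f"WorkerCmd{node}", shlex.join(remote))
    return host


def fetch_worker_command(store, node):
    """Worker side: return the command to run, or None if not published yet."""
    value = store.get_attr(f"WorkerCmd{node}")
    return shlex.split(value) if value is not None else None


if __name__ == "__main__":
    store = InMemoryAttrs()
    # Hypothetical intercepted call; real launcher arguments will differ.
    intercepted = ["ssh", "-x", "node1", "daemon", "--parent-uri", "tcp://node0:5000"]
    publish_worker_command(store, 1, intercepted)
    print(fetch_worker_command(store, 1))
```

In a real job the store would be replaced by condor_chirp calls, and each worker would poll until its command appears and then exec it. Whether OpenMPI 5.x still lets a wrapper capture the worker command this way is exactly the open question.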
Without doing a bunch of digging into the current OpenMPI docs, I'm not sure whether this method is still possible (hopefully with minor tweaking) in the current version, but I'm hoping there are other OpenMPI experts on this list who might be able to chime in.
Jason Patton
On Fri, Feb 28, 2025 at 6:50 AM Lambert Fabian <fabian.lambert@xxxxxxxxxxxxx> wrote:
Hi,
I would like to know whether HTCondor and OpenMPI 5.0.6 are compatible.
If so, I would like to know how to adapt the openmpiscript provided with HTCondor to make it work with OpenMPI in the parallel universe. Do you have an example?
Regards,
--
Fabian Lambert
Laboratoire de Physique Subatomique et de Cosmologie
Chef de service - Service Informatique
53 Avenue des Martyrs. 38026 Grenoble cedex
Tel 33 (0)4 76 28 41 97 Fax 33 (0)4 76 28 40 04
fabian.lambert@xxxxxxxxxxxxx
http://lpsc.in2p3.fr
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
Join us in June at Throughput Computing 25: https://osg-htc.org/htc25
The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/