Hello again,

After several tests over the past month, I've finally made it work.

First, OFI support must be enabled when configuring the Open MPI build. The configure summary should report:

    Transports
    -----------------------
    OpenFabrics OFI Libfabric: yes

For me, this meant installing the libfabric-devel package on my system (Rocky Linux 8.x). This made jobs using openib work again with Open MPI 4.1.x under condor, but ucx jobs still weren't working.

Second, and for reasons I do not know, you have to disable the ofi btl when launching ucx jobs. In other words, Open MPI must be compiled with OFI support, but the ofi btl must also be disabled at launch for UCX jobs to work correctly within condor. Simply excluding OFI support during Open MPI compilation does not make UCX jobs work under condor.

Here are the working mca parameters for both types of transport:

    --mca btl ^openib,ofi --mca pml ucx --mca plm rsh     # UCX
    --mca btl openib,self --mca pml ^ucx --mca plm rsh    # OpenIB

Some quick, unprofessional benchmarks using an SU2 example show some improvement between Open MPI versions and transport selections, at least proving that it wasn't all for nothing!

    Open MPI   Transport   Time
    4.1.4      openib      252s
    4.1.4      ucx         242s
    4.0.7      openib      250s
    4.0.7      ucx         250s
    3.1.6      openib      292s
    3.1.6      ucx         287s

Martin

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of Jason Patton via HTCondor-users

Hi Martin,

I helped develop part of the openmpiscript a few years ago, and it hasn't been tested since the days that OpenMPI 3.x was current, so I'm not too surprised that it's probably time to look at it again. I don't know what MCA parameters are available for UCX, but maybe you could try fiddling with MCA parameters to crank up the verbosity of messages (based on your email, I'm guessing one of these could be "--mca pml_ucx_verbose 100") and send along what you find.

Jason Patton

On Thu, Nov 17, 2022 at 12:07 PM Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx> wrote:
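
For anyone adapting a launch wrapper, the two working parameter sets reported in Martin's message could be selected with a small shell helper along these lines. This is only a sketch: the function name mca_args, the application ./my_app and the process count are hypothetical and not taken from openmpiscript. The parameter strings themselves are copied from the message.

```shell
#!/bin/sh
# Hypothetical helper: print the MCA parameters reported as working in this
# thread for the requested transport ("ucx" or "openib").
mca_args() {
    case "$1" in
        ucx)
            # Open MPI is built with OFI support, but the ofi btl is
            # excluded at launch time, together with openib.
            echo "--mca btl ^openib,ofi --mca pml ucx --mca plm rsh"
            ;;
        openib)
            # Use the openib btl and exclude the ucx pml.
            echo "--mca btl openib,self --mca pml ^ucx --mca plm rsh"
            ;;
        *)
            echo "unknown transport: $1" >&2
            return 1
            ;;
    esac
}

# Example use (./my_app and -np 4 are placeholders):
#   mpirun $(mca_args ucx) -np 4 ./my_app
```

The command substitution is left unquoted in the usage line on purpose, so that a POSIX shell splits the string into separate mpirun arguments.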