Martin, I greatly appreciate the details of your configuration. Along with reviewing the docs you provided, I'm exploring MPI-related links found at https://htcondor-wiki.cs.wisc.edu/index.cgi/wikitoc
Greg, This is our setup - I think this means we need a parallel universe, but please let me know if vanilla would work, too.
We will have a cluster of 8 compute nodes, each with dual 16-core HT CPUs (for a total of 256 slots) and an NVidia RTX A4000 GPU. We have two primary (Windows) applications to run on these systems: one uses the GPUs, the other uses MPI. Thanks, Sam

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx>
Sent: Monday, September 11, 2023 8:57:58 AM
To: HTCondor-Users Mail List
Subject: [EXTERNAL] Re: [HTCondor-users] MPI on Windows

Hello Sam,
I’ve got some experience with MPI jobs using HTCondor, but only with Linux. We haven’t had a requirement for Windows so far (thankfully).
If you haven’t already, you should probably read the documentation for MPI jobs:
Here’s how I personally configure my clusters. (most basic settings only)
Central manager server (all-in-one): /etc/condor/config.d/01-cm.config
# Common configuration (MASTER)
CONDOR_HOST = $(hostname --short)
ALLOW_DAEMON = $NET_INT_PREFIX.*
# Configure host for central management (COLLECTOR, NEGOTIATOR)
use ROLE: get_htcondor_central_manager
# Configure host for submission of jobs (SCHEDD)
use ROLE: get_htcondor_submit
# Enable partitionable slot preemption
ALLOW_PSLOT_PREEMPTION = True
# Speed up reclaiming of unused slots
UNUSED_CLAIM_TIMEOUT = 20
And for the execute nodes (compute servers): /etc/condor/config.d/02-role-execute.config
# Common configuration (MASTER)
CONDOR_HOST = $(hostname --short)
ALLOW_DAEMON = $NET_INT_PREFIX.*
# Configure host for jobs execution (STARTD)
use ROLE: get_htcondor_execute
# Link node to central manager
UID_DOMAIN = $(hostname --short)
TRUST_UID_DOMAIN = TRUE
# Prioritize parallel jobs over serial
DedicatedScheduler = "DedicatedScheduler@$(hostname --short)"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
START = True
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
RANK = Scheduler =?= $(DedicatedScheduler)
# Activate dynamic slot configuration and slot partitioning
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = auto
SLOT_TYPE_1_PARTITIONABLE = True
Replace $(hostname --short) with the network name of your central manager (CM). In my setup, $NET_INT_PREFIX is the first two octets of the IP range of the dedicated local network between the central manager and the execute nodes. I use IDTOKEN security. https://htcondor.readthedocs.io/en/latest/admin-manual/security.html#highlights-of-new-features-in-version-9-0-0
This way, both MPI and serial jobs can be submitted and executed across all nodes, with MPI jobs prioritized (as in, they can’t be bumped during preemption), and with the CM releasing claimed dynamic partitioned slots that stay idle for more than 20 seconds. There might be better ways to configure this, but it gets the job done. :)
As for submit files and wrappers, they are unique to each R&D application we use, and I’ve only used Open MPI so far. My wrappers are modified versions of the openmpiscript example. I haven’t tried the MPICH examples (mp1script, mp2script). I don’t think there’s an example file for MPI for Windows. If you don’t run jobs across multiple execute nodes, then as Greg mentioned, the vanilla universe might be simpler with MPI for Windows. (The vanilla universe does not accept machine_count in submit files.)
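For what it’s worth, here is a minimal sketch of what a parallel-universe submit file for an Open MPI job can look like, modeled on the openmpiscript example from the HTCondor docs. The application name, file names, and counts are placeholders, not anything from our actual setup:

```
# Sketch: parallel-universe submit file for an Open MPI job (Linux).
# "my_mpi_app" and its arguments are placeholders.
universe                = parallel
executable              = openmpiscript
arguments               = my_mpi_app arg1 arg2
machine_count           = 4
request_cpus            = 8
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = my_mpi_app
output                  = out.$(NODE)
error                   = err.$(NODE)
log                     = mpi.log
queue
```

With machine_count = 4 and request_cpus = 8, the dedicated scheduler claims four slots of eight cores each and openmpiscript bootstraps the MPI ranks across them.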
Martin
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Greg Thain via HTCondor-users
On 9/8/23 17:51, Sam.Dana@xxxxxxxxxxx wrote:
It does, but that's a small optimization. For running parallel/dedicated jobs, though, I'd leave UNUSED_CLAIM_TIMEOUT at the default value of 600 unless you have a good reason to change it.
Generally speaking, the most "High Throughput" way to run MPI jobs is to run a lot of independent MPI jobs that each run on one node in your pool, perhaps on many cores on one node. This can be done in the vanilla universe. If you absolutely must run MPI jobs across multiple nodes, then you will need to run the parallel universe.
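To illustrate the single-node approach, here is a hedged sketch of a vanilla-universe submit file for Windows; run_mpi.bat and my_mpi_app.exe are hypothetical names, and the wrapper is assumed to invoke the local MPI launcher (e.g. mpiexec from MS-MPI) itself:

```
# Sketch: vanilla-universe submit file for a single-node MPI job on Windows.
# run_mpi.bat is a placeholder wrapper that would run something like
#   mpiexec -n 32 my_mpi_app.exe
# with -n kept in sync with request_cpus below.
universe             = vanilla
executable           = run_mpi.bat
transfer_input_files = my_mpi_app.exe
request_cpus         = 32
output               = mpi.out
error                = mpi.err
log                  = mpi.log
queue
```

Because all ranks stay on one machine, no machine_count, dedicated scheduler, or cross-node MPI bootstrap is needed; each such job is just an ordinary vanilla job that happens to use many cores.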
To run MPI jobs in the parallel universe, you'll need scripts to bootstrap the MPI world. To be honest, I don't know of anyone who has done this on Windows in quite some time, and I don't know how up to date the examples we provide are with any modern version of MPI for Windows.
-greg