
Re: [HTCondor-users] Parallel Universe on Kubernetes

Hi Folks!

I’m the HPC monkey of which he speaks! I’ve been creating a Kubernetes Operator with HTCondor as the scheduler, and it has gone fairly well up until I needed to use the parallel universe. So as not to clutter your inboxes, here is a summary of where I currently am:

https://gist.github.com/vsoch/2073136f0833983efc92b4eeb52d49dd

TLDR: if the current setup using the Docker images at https://github.com/htcondor/htcondor/tree/main/build/docker/services could easily be adapted to support the parallel universe, that would likely be the example I need to get it working in Kubernetes. The current setup (working for basic jobs) is at https://github.com/converged-computing/htcondor-operator, and my (so far) failed attempts are in the single open PR that adds LAMMPS.

I’m happy to show you or debug anything you might be interested in. Thanks again for your help, and apologies for my noob-level expertise – I’m only about a day into using this beastie!

Best,

Vanessa

From: Matthew T West <m.t.west@xxxxxxxxxxxx>
Date: Monday, June 19, 2023 at 2:01 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Sochat, Vanessa <sochat1@xxxxxxxx>
Subject: Parallel Universe on Kubernetes

Good evening all,

I have someone from the cloud HPC community who is curious about running
multi-node MPI jobs with HTCondor on a pool hosted on a Kubernetes
cluster. Is it possible with just the grid universe, or does one need to
set up the parallel universe? This work leverages the existing container
images for the central manager, access point and worker nodes.

I grant this isn't a common use case for this community, but I feel it's
worth asking.

Cheers,
Matthew West