
Re: [HTCondor-users] Htcondor with python ray



Yeah, that's an option in Ray, but running ray start multiple times on the same machine is not recommended.
With dynamic slots, if two jobs from the same Ray cluster start on the same machine they might crash (not tested).
Also, ray stop has no option to target a specific worker; on downscale it will stop both workers, which might crash jobs running in the other slot.

On Tue, Nov 25, 2025, 23:51 Michael DiDomenico <mdidomenico4@xxxxxxxxx> wrote:
On Tue, Nov 25, 2025 at 6:06 PM Ram Ban <ramban046@xxxxxxxxx> wrote:
>
> Ray actually uses all VM resources regardless of the cgroup and CPU affinities set.
> With static slots I tend to configure only 1 slot so that the whole VM is used, but then my utilisation is too low because I can't share the resources.
> I am exploring the docker universe for isolation, but most of what I find on the internet says to use Kubernetes with Ray.

i'm not sure that's a fair statement. ray will consume as much as you
let it. if you choke down a cgroup, ray (even though it can detect
the box has more) won't be able to use more than the cgroup says. also
you can choke ray down when you start it by specifying how much
cpu/mem/gpu resources it can use
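
to make the "choke ray down when you start it" part concrete, here is a
minimal sketch (not a tested recipe) that builds a ray start command line
capped at a slot's resources. the helper name ray_start_args, the head
address and the slot sizes are made up for illustration; --address,
--num-cpus, --num-gpus and --memory (in bytes) are ray start options. note
that these values are what ray's scheduler advertises and packs tasks
against, so the cgroup is still what actually enforces the limit.

```python
import shlex

MIB = 1024 * 1024  # Ray's --memory flag takes bytes; slot memory is assumed to be in MiB


def ray_start_args(head_address, cpus, memory_mb, gpus=0):
    """Build a `ray start` argv that advertises only the slot's resources.

    Hypothetical helper: the caller supplies the slot's cpu/memory/gpu
    counts (for example, the values the job requested from HTCondor).
    """
    if cpus < 1 or memory_mb < 1 or gpus < 0:
        raise ValueError("cpus and memory_mb must be >= 1, gpus must be >= 0")
    return [
        "ray", "start",
        f"--address={head_address}",     # join an existing head node
        f"--num-cpus={cpus}",            # logical CPUs Ray may schedule on
        f"--num-gpus={gpus}",            # logical GPUs Ray may schedule on
        f"--memory={memory_mb * MIB}",   # memory available to workers, in bytes
    ]


if __name__ == "__main__":
    # Assumed example values: a 4-core, 8 GiB, 1-GPU slot and a made-up head address.
    args = ray_start_args("10.0.0.1:6379", cpus=4, memory_mb=8192, gpus=1)
    print(shlex.join(args))
```

a job wrapper could pass the resulting list to subprocess.run instead of
printing it, so the worker never advertises more than the slot it landed in.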

yes, it's true kuberay is probably the right solution and probably the
best supported, but you have to be willing to buy into that entire
ecosystem

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/