
Re: [HTCondor-users] Fractional GPU

Date: Fri, 23 Feb 2024 14:56:57 +0000
From: Matthew T West <m.t.west@xxxxxxxxxxxx>
Subject: Re: [HTCondor-users] Fractional GPU

Hi Larry,

Have you investigated NVIDIA's MIG (Multi-Instance GPU)? https://www.nvidia.com/en-gb/technologies/multi-instance-gpu/

AFAIK, if you partition the cards at boot into sub-units, HTCondor's GPU discovery will pick up each of those as a distinct entity on the compute node. Would you always want them divided into quarters, or does this need to be dynamic partitioning?


Cheers,
Matt

Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
http://www.exeter.ac.uk/research/researchcomputing/support/researchit
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom

On 22/02/2024 22:45, Larry Martell wrote:


Proceeding under the assumption that condor does not directly support
fractional GPUs, I am trying what I read here:
https://www-auth.cs.wisc.edu/lists/htcondor-users/2020-December/msg00018.shtml:

You can get HTCondor to do this just by having the same device show up more than once in the device enumeration.
For instance, if you have two GPUs and your configuration is
MACHINE_RESOURCE_GPUS = CUDA0, CUDA1
You can run two jobs on each GPU by configuring
MACHINE_RESOURCE_GPUS = CUDA0, CUDA1, CUDA0, CUDA1
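
A variant of the same idea, as a rough sketch: condor_gpu_discovery can generate the repeated list itself, so the device names do not have to be typed by hand. This assumes a version of HTCondor whose condor_gpu_discovery supports the -repeat and -divide options (check condor_gpu_discovery -help on the node); the count of 2 is only an example.

```
# Let GPU discovery build the device list instead of setting
# MACHINE_RESOURCE_GPUS by hand.
use feature : GPUs

# Assumption: this condor_gpu_discovery accepts -divide <n>.
# -divide 2 lists each GPU twice and reports half the device memory per
# entry; -repeat 2 lists each GPU twice without adjusting the memory.
GPU_DISCOVERY_EXTRA = -extra -divide 2
```

As far as I know, neither option enforces any memory or compute limit on the GPU itself; jobs that share a device have to coexist on their own.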

I have 1 GPU and this is what I have in my config file:

#use feature:GPUs
#GPU_DISCOVERY_EXTRA = -extra
MACHINE_RESOURCE_GPUs = CUDA0, CUDA0, CUDA0, CUDA0

and this env setting: CUDA_VISIBLE_DEVICES="0"

But when I run multiple jobs requesting a GPU they run serially, not
in parallel.
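
One thing that might be worth ruling out, though nothing in this thread confirms it is the cause: each job also needs CPU and memory from a slot, so four GPU entries only help if the slot layout can start four jobs at once. A minimal sketch of a single partitionable slot that owns all of the machine's resources, assuming the node has at least four cores:

```
# Advertise the one physical GPU four times, as above.
MACHINE_RESOURCE_GPUs = CUDA0, CUDA0, CUDA0, CUDA0

# One partitionable slot holding 100% of CPUs, memory, disk and GPUs.
# Jobs carve dynamic slots out of it, one GPU entry each.
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
```

After a condor_reconfig or restart of the startd, condor_status -af Name Cpus GPUs AssignedGPUs should show whether the slot really advertises four GPUs and enough CPUs to run the jobs side by side.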

Has anyone been able to get something like this working?

On Thu, Feb 22, 2024 at 3:53 PM Larry Martell <larry.martell@xxxxxxxxx> wrote:

Does condor support fractional GPUs? I am setting request_GPUs = 0.25
and it is matching (I can see that with -better-analyze and in the
StartLog) but the job never runs, it stays in idle state.
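
For comparison, a minimal submit description that asks for a whole GPU entry rather than a fraction. If the execute node advertises the same device several times, each entry counts as one GPU, so an integer request is enough. The executable name, log name and resource amounts here are hypothetical.

```
# Hypothetical job script; replace with the real executable.
executable     = gpu_job.sh

# Request whole units; a repeated entry such as CUDA0 counts as one GPU.
request_GPUs   = 1
request_cpus   = 1
request_memory = 1GB

log            = gpu_job.log
queue 4
```

Whether the four jobs actually run at the same time still depends on the execute node having enough CPUs and memory free for all of them.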

