Re: [HTCondor-users] oversubscribing gpus
Hey David,
The trick is wrapping the condor_gpu_discovery program so that it generates a double-list of the DetectedGPUs. You can change the MACHINE_RESOURCE_INVENTORY_GPUs configuration setting to call your wrapper instead of the actual binary.
The wrapper would look at the output of the real program. Instead of:
DetectedGPUs = "CUDA0, CUDA1, CUDA2, CUDA3"
You want to have:
DetectedGPUs = "CUDA0, CUDA1, CUDA2, CUDA3, CUDA0, CUDA1, CUDA2, CUDA3"
Or, if you want to depth-first fill the GPUs with jobs:
DetectedGPUs = "CUDA0, CUDA0, CUDA1, CUDA1, CUDA2, CUDA2, CUDA3, CUDA3"
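As a rough sketch of the list-doubling step, here is how a wrapper might rewrite that one line in Python. It assumes the DetectedGPUs line has exactly the quoted form shown above. The function name oversubscribe and its parameters are made up for illustration and are not part of HTCondor. A real wrapper would also need to run the actual condor_gpu_discovery binary with the same arguments and pass every other output line through unchanged.

```python
import re

# Assumed format of the line printed by condor_gpu_discovery, as quoted above:
#   DetectedGPUs = "CUDA0, CUDA1, CUDA2, CUDA3"
DETECTED_RE = re.compile(r'^(\s*DetectedGPUs\s*=\s*)"([^"]*)"\s*$')


def oversubscribe(line, factor=2, depth_first=False):
    """Return `line` with each GPU name listed `factor` times.

    `line` is one line of discovery output without its trailing newline.
    Lines that are not the DetectedGPUs line are returned unchanged.
    """
    match = DETECTED_RE.match(line)
    if match is None:
        return line
    prefix, body = match.groups()
    gpus = [name.strip() for name in body.split(",") if name.strip()]
    if depth_first:
        # CUDA0, CUDA0, CUDA1, CUDA1, ...
        repeated = [gpu for gpu in gpus for _ in range(factor)]
    else:
        # CUDA0, CUDA1, ..., CUDA0, CUDA1, ...
        repeated = gpus * factor
    return '{}"{}"'.format(prefix, ", ".join(repeated))
```

Calling oversubscribe(line) gives the first ordering above, and oversubscribe(line, depth_first=True) gives the second.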
You might also want to have your wrapper modify the CUDAGlobalMemoryMB value (from -properties option) to half of the actual value, just in case any jobs set up requirements based on the CUDA memory.
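The memory adjustment could be sketched along these lines, again in Python. It assumes the -properties output carries the value on a single line of the form CUDAGlobalMemoryMB=<integer>, so check what your version of condor_gpu_discovery actually prints before relying on it. The name scale_memory is hypothetical.

```python
import re

# Assumption: with -properties, the discovery output contains a line such as
#   CUDAGlobalMemoryMB=16160
# Check the real output on your machines; the layout may differ.
MEMORY_RE = re.compile(r'^(\s*CUDAGlobalMemoryMB\s*=\s*)(\d+)\s*$')


def scale_memory(line, factor=2):
    """Divide the advertised CUDAGlobalMemoryMB value by `factor`.

    `line` is one line of discovery output without its trailing newline.
    Lines that do not match the assumed format are returned unchanged.
    """
    match = MEMORY_RE.match(line)
    if match is None:
        return line
    prefix, value = match.groups()
    # Integer division keeps the attribute a whole number of megabytes.
    return "{}{}".format(prefix, int(value) // factor)
```

With the default factor of 2 this advertises half of the real memory per slot, matching the 2:1 oversubscription.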
Michael V Pelletier
Principal Engineer
Raytheon Technologies
Information Technology
Digital Transformation & Innovation
-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of David Schultz
Sent: Monday, October 5, 2020 11:12 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] [HTCondor-users] oversubscribing gpus
Hi all,
Does anyone have a recipe for oversubscribing GPU resources 2:1, so each GPU would have two slots? I think I can figure out how to do it completely manually, but was wondering if there was a nice way to hook into HTCondor's GPU detection.
Thanks,
David Schultz