Hi Eric,
Sure, I'm happy to. Like I said, our machines have one partitionable job slot for each GPU, so the worker job slot configuration looks something like this:

>> NUM_SLOTS = 2
>> NUM_SLOTS_TYPE_1 = 1
>> SLOT_TYPE_1 = cpus=3,mem=30122
>> SLOT_TYPE_1_PARTITIONABLE = true
>> SLOT_TYPE_1_GPU_NUM = 0
>> NUM_SLOTS_TYPE_2 = 1
>> SLOT_TYPE_2 = cpus=3,mem=30122
>> SLOT_TYPE_2_PARTITIONABLE = true
>> SLOT_TYPE_2_GPU_NUM = 1
>> GPU_MEMORY = 8000
>> MACHINE_RESOURCE_GPUMEMORY = 16000
>>
>> STARTD_ATTRS = GPU_NUM, $(STARTD_ATTRS)

The default usage of full GPUs is handled by Condor with

>> use feature : gpus

The key parts here are that we set a STARTD_ATTR called GPU_NUM for each slot, which is later used to set CUDA_VISIBLE_DEVICES, and that we add a new resource, GPUMEMORY (in this instance, we have two identical GPUs with 8 GB VRAM each). A user can then request a certain amount of GPU memory in their submit file the same way they would request other machine resources:

>> Request_GpuMemory = 2000

Since we allow both requests for full GPUs and requests for only a part of the memory, we make sure that they don't collide: if some GPU memory is already used, no full GPU can be requested, and vice versa. This is done in the START expression:

>> START = (IfThenElse(target.RequestGpuMemory =?= UNDEFINED, 0, target.RequestGpuMemory) == 0 || my.GPUs == my.TotalSlotGPUs) && \
>>         (IfThenElse(target.RequestGPUs =?= UNDEFINED, 0, target.RequestGPUs) == 0 || my.GpuMemory == my.TotalSlotGpuMemory)

Setting the CUDA_VISIBLE_DEVICES environment variable is done in a user job wrapper [1], which is defined in the worker config:

>> USER_JOB_WRAPPER = /etc/condor/set_cuda_env
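For reference, a complete submit file using this could look something like the sketch below; the executable name and the numbers are just placeholders, and such a job would not also request a full GPU:

>> universe          = vanilla
>> executable        = run_training.sh
>> request_cpus      = 1
>> request_memory    = 4000
>> # request 2000 MB of GPU memory instead of a whole device
>> Request_GpuMemory = 2000
>> queue

A job that needs a whole GPU to itself would instead just set request_gpus = 1 and leave Request_GpuMemory out.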
To monitor the GPU memory usage separately for each Condor job, we replace the default GPU monitoring script with our own [2]:

>> # GPU Memory monitor
>> STARTD_CRON_GPUsMEMORY_MONITOR_EXECUTABLE = /etc/condor/monitor_gpus.py
>> STARTD_CRON_GPUsMEMORY_MONITOR_METRICS = PEAK:GPUsMemory
>> STARTD_CRON_GPUsMEMORY_MONITOR_MODE = Periodic
>> STARTD_CRON_GPUsMEMORY_MONITOR_PERIOD = 30
>>
>> STARTD_CRON_GPUs_MONITOR_EXECUTABLE = /bin/false
>> STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST),GPUsMEMORY_MONITOR
>>
>> STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS),GPUsMemory
>> UPDATE_INTERVAL = 30

Finally, if the used GPU memory is reported to be larger than the requested amount, the job is killed by the SYSTEM_PERIODIC_REMOVE macro:

>> SYSTEM_PERIODIC_REMOVE = $(SYSTEM_PERIODIC_REMOVE) || ((GPUsMemoryUsage > RequestGpuMemory) && (RequestGPUs == 0))

Best regards,
Yannik

------------------------------------

[1]

#!/bin/bash

if [ "$_CONDOR_MACHINE_AD" != "" ]; then
    GPU_NUM="$(egrep '^GPU_NUM' "$_CONDOR_MACHINE_AD" | cut -d ' ' -f 3)"
    SLOT_GPUS="$(egrep '^TotalSlotGPUs' "$_CONDOR_MACHINE_AD" | cut -d ' ' -f 3)"
    SLOT_GPUMEM="$(egrep '^TotalSlotGPUMEMORY' "$_CONDOR_MACHINE_AD" | cut -d ' ' -f 3)"

    # If GPU number is defined (on the partitionable slot) and the job is a GPU job, set visible device
    if [[ "$GPU_NUM" != "" ]] && ( [[ "$SLOT_GPUS" != "0" ]] || [[ "$SLOT_GPUMEM" != "0" ]] ); then
        export CUDA_VISIBLE_DEVICES="$GPU_NUM"
    else
        export CUDA_VISIBLE_DEVICES="-1"
    fi
fi

exec "$@"

[2]

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from subprocess import check_output
from collections import defaultdict

from psutil import Process

# command-line prefix that identifies a condor_starter process
starter_signature = ["condor_starter", "-f", "-a"]


def query(kind, values):
    """Run 'nvidia-smi --query-<kind>' and return one dict per output line."""
    if not isinstance(values, dict):
        values = {v: str for v in values}
    gpu_query = check_output(
        ["nvidia-smi",
         "--query-{}={}".format(kind, ",".join(values.keys())),
         "--format=csv,nounits,noheader"],
        universal_newlines=True)
    query_results = []
    for line in gpu_query.splitlines():
        line_results = {}
        for (key, type_converter), value in zip(values.items(), line.strip().split(", ")):
            line_results[key] = type_converter(value)
        query_results.append(line_results)
    return query_results


def get_slot(pid):
    """Walk up the process tree until the condor_starter is found and return its slot name."""
    process = Process(pid)
    while process:
        cmdline = process.cmdline()
        if cmdline[:3] == starter_signature:
            return cmdline[3]
        process = process.parent()
    return None


def get_slot_updates(slot_name, values):
    """Format the peak-usage update lines for one slot."""
    slot_info = []
    for attr, value in values.items():
        slot_info.append("Uptime{}PeakUsage = {}".format(attr, value))
    slot_info.append('SlotName = "{}@"'.format(slot_name))
    slot_info.append("- {}".format(slot_name))
    return slot_info


gpu_info = query("gpu", {"index": int, "utilization.gpu": int})
application_info = query("compute-apps", {"pid": int, "used_gpu_memory": int})

# get total memory usage for each (partitioned) job
slot_gpu_memories = defaultdict(int)
for line in application_info:
    slot_id = get_slot(line["pid"])
    if slot_id:
        slot_gpu_memories[slot_id] += line["used_gpu_memory"]

updates = []
for slot_id, memory in slot_gpu_memories.items():
    updates.extend(get_slot_updates(slot_id, {"GPUsMemory": memory}))

if updates:
    updates.append("- update:true")

print("\n".join(updates))
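In case it helps, the output this monitor prints for a single busy slot would look roughly like the following (the slot name and the memory value are just examples):

UptimeGPUsMemoryPeakUsage = 1530
SlotName = "slot1_1@"
- slot1_1
- update:true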
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Eric Sedore via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Monday, November 30, 2020 04:26
To: HTCondor-Users Mail List
Cc: Eric Sedore
Subject: Re: [HTCondor-users] Running multiple jobs simultaneously on a single GPU

Thanks Yannik – yes, if you have time and are willing, that would be very helpful.
-Eric
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx]
On Behalf Of Rath, Yannik
Hi Eric,
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
on behalf of John M Knoeller <johnkn@xxxxxxxxxxx>
Hi Eric.
NVIDIA is adding the ability to share a GPU between processes on newer hardware, with hardware enforcement of memory isolation between processes. HTCondor does plan to support that, but it does not yet, and I don't think the NVIDIA devices that support this are very common yet. This is work in progress…
However, you can share a GPU between processes *without* any kind of protection between them simply by having more than a single process set the environment variable CUDA_VISIBLE_DEVICES to the same value.
You can get HTCondor to do this just by having the same device show up more than once in the device enumeration.
For instance, if you have two GPUs and your configuration is
MACHINE_RESOURCE_GPUS = CUDA0, CUDA1
You can run two jobs on each GPU by configuring
MACHINE_RESOURCE_GPUS = CUDA0, CUDA1, CUDA0, CUDA1
If you don't use the MACHINE_RESOURCE_GPUS knob and instead use HTCondor's GPU detection, you can use the same trick; it's just a little more work.
# enable GPU discovery
use FEATURE : GPUs

# then override the GPU device enumeration with a wrapper script that duplicates the detected GPUs
MACHINE_RESOURCE_INVENTORY_GPUs = $(ETC)/bin/condor_gpu_discovery.sh $(1) -properties $(GPU_DISCOVERY_EXTRA)
The wrapper script $(ETC)/bin/condor_gpu_discovery.sh is something that you need to write.
condor_gpu_discovery produces output like this
DetectedGPUs="CUDA0, CUDA1" CUDACapability=6.0 CUDADeviceName="Tesla P100-PCIE-16GB" CUDADriverVersion=11.0 CUDAECCEnabled=true CUDAGlobalMemoryMb=16281 CUDAMaxSupportedVersion=11000 CUDA0DevicePciBusId="0000:3B:00.0" CUDA0DeviceUuid="dddddddd-dddd-dddd-dddd-dddddddddddd" CUDA1DevicePciBusId="0000:D8:00.0" CUDA1DeviceUuid="cccccccc-cccc-cccc-cccc-cccccccccccc"
Your wrapper script should produce the same output, but with a modified value for DetectedGPUs like this
DetectedGPUs="CUDA0, CUDA1, CUDA0, CUDA1" CUDACapability=6.0 CUDADeviceName="Tesla P100-PCIE-16GB" CUDADriverVersion=11.0 CUDAECCEnabled=true CUDAGlobalMemoryMb=16281 CUDAMaxSupportedVersion=11000 CUDA0DevicePciBusId="0000:3B:00.0" CUDA0DeviceUuid="dddddddd-dddd-dddd-dddd-dddddddddddd" CUDA1DevicePciBusId="0000:D8:00.0" CUDA1DeviceUuid="cccccccc-cccc-cccc-cccc-cccccccccccc"
-tj
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of Eric Sedore via HTCondor-users
Good evening everyone,
I’ve listened to a few presentations that mentioned there is a way (either ready now or planned) to allow multiple jobs to utilize a single GPU. This would be helpful as we have a number of workloads/jobs that do not consume the entire GPU (memory or processing). Is there documentation (apologies if I missed it) that would assist with how to set up this configuration?
Happy to provide more of a description if my question is not clear.
Thanks,
-Eric