Hi Masaj,
The CUDAComputeUnits figure is reported based on the card or cards installed in the system. There’s actually no attribute CUDA0ComputeUnits, since that’s
expected to be the same across all cards.
Here’s what is generated in per-card attributes with the “-extra -dynamic” options:
CUDA0DevicePciBusId = "0000:06:00.0"
CUDA0DeviceUuid = "520c5858-f08d-0e24-83b6-47e072996f2b"
CUDA0DieTempC = 32
CUDA0EccErrorsDoubleBit = 0
CUDA0EccErrorsSingleBit = 0
CUDA0FreeGlobalMemory = 8518
CUDA0PowerUsage_mw = 41538
CUDA0UtilizationPct = 77
You can write expressions to incorporate these values, but it won’t have any impact on which card is chosen for the job. The startd simply takes the next
unclaimed device in sequence from the AssignedGPUs list.
One way you can tweak that mechanism is to alter the order of the DetectedGPUs list as the inventory is being taken, perhaps with a wrapper around condor_gpu_discovery.
For example, if condor_gpu_discovery lists all the cards in one cooling region followed by all the cards in another cooling region, you could balance the heating across both regions by changing the order to "CUDA0,CUDA2,CUDA1,CUDA3",
so that GPU assignments alternate between cooling regions.
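The reordering step a wrapper would perform can be sketched in a few lines. This is a minimal sketch, assuming the first half of the detected list sits in one cooling region and the second half in the other; verify how device IDs actually map to regions on your hardware before using anything like it.

```python
# Hypothetical post-processing for condor_gpu_discovery output.
# Assumes DetectedGPUs came back as "CUDA0, CUDA1, CUDA2, CUDA3" and that
# the first half of the list is in one cooling region and the second half
# in the other (an assumption -- check your chassis layout).

def interleave_regions(detected):
    """Reorder a DetectedGPUs value so assignments alternate regions."""
    ids = [g.strip() for g in detected.split(",")]
    half = len(ids) // 2
    region_a, region_b = ids[:half], ids[half:]
    out = []
    for a, b in zip(region_a, region_b):
        out.extend([a, b])
    # Append any leftover card if the count is odd.
    out.extend(region_a[len(region_b):] or region_b[len(region_a):])
    return ", ".join(out)

print(interleave_regions("CUDA0, CUDA1, CUDA2, CUDA3"))
# -> CUDA0, CUDA2, CUDA1, CUDA3
```

A wrapper script would run the real condor_gpu_discovery, rewrite the DetectedGPUs line this way, and print the result, with the startd's GPU inventory configuration pointed at the wrapper instead of the stock tool.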
Michael V Pelletier
Principal Engineer
Raytheon Technologies
Digital Technology
HPC Support Team
Thank you Michael!
The formula below looks like a good idea. I have one additional question: is it okay to use ClassAd references in the form TARGET.CUDAComputeUnits when the real slot ClassAd attribute is CUDA0ComputeUnits or CUDA1ComputeUnits? Is Condor able to automatically translate
to the correct value using AssignedGPUs?
Regards,
Masaj
On 5/20/2021 10:14 PM, Michael Pelletier via HTCondor-users wrote:
For my GPU jobs, I set up a ranking based on the number of compute units, times the number of cores per CU. You might also add the global memory. I do like
the idea of factoring in the CUDA capability level as well, if your cluster has more than one type of card in it.
So for example, in a submit description:
rank = TARGET.CUDAComputeUnits * TARGET.CUDACoresPerCU + CUDAFreeGlobalMemory
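To make the arithmetic concrete, here is a small sketch of how that rank expression would score two hypothetical cards. The attribute values below are made up for illustration; real values come from each slot's ClassAd.

```python
# Hypothetical slot ClassAd values, for illustration only.
slots = {
    "gtx1080": {"CUDAComputeUnits": 20, "CUDACoresPerCU": 128,
                "CUDAFreeGlobalMemory": 8518},
    "v100":    {"CUDAComputeUnits": 80, "CUDACoresPerCU": 64,
                "CUDAFreeGlobalMemory": 16160},
}

def rank(ad):
    # Mirrors: TARGET.CUDAComputeUnits * TARGET.CUDACoresPerCU + CUDAFreeGlobalMemory
    return ad["CUDAComputeUnits"] * ad["CUDACoresPerCU"] + ad["CUDAFreeGlobalMemory"]

best = max(slots, key=lambda name: rank(slots[name]))
print(best)  # -> v100
```

The negotiator prefers the slot whose ad yields the larger rank value, so the job lands on the bigger card when both are unclaimed.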
Michael V Pelletier
Principal Engineer
Raytheon Technologies
Digital Technology
HPC Support Team
On 5/20/2021 8:56 AM, Martin Sajdl wrote:
Hi!
We have a cluster of nodes with GPUs, and we need to set a benchmark number for each slot with a GPU so that we can correctly rank jobs and start each job on the most powerful GPU available.
Does anyone use or know of a GPU benchmark tool? Ideally multi-platform (Linux, Windows)...
Hi Martin,
Just a quick thought:
While it is not strictly a benchmark, a decent proxy might be the CUDACapability attribute that is likely already present in each slot with a GPU (assuming they are NVIDIA GPUs, that is).
You could enter the following condor_status command to see if you feel that CUDACapability makes intuitive sense as a performance metric on your pool:
condor_status -cons 'Gpus > 0' -sort CUDACapability -af Name CUDACapability CUDADeviceName
Hope the above helps
Todd
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/