[HTCondor-users] Condor to manage GPUs only
- Date: Wed, 26 Jul 2023 20:34:39 +0000
- From: Russell Smithies <Russell.Smithies@xxxxxxxxxx>
- Subject: [HTCondor-users] Condor to manage GPUs only
Hi all,
I used Condor 20 years ago and am now trying to transition back from Slurm.
Initially, I want to use Condor only for managing the GPUs on three servers: two have 2 x A100s and one has 2 x V100s.
I'm not sure of the best way to do this, or whether it's even possible. Surely, given the number of products that are "powered by GPUs", it must be.
When I run "condor_gpu_discovery", I can see the GPUs:
muthur# /usr/libexec/condor/condor_gpu_discovery -extra -nested
DetectedGPUs="GPU-5f846c33, GPU-c60861f1"
Common=[ Capability=8.0; ClockMhz=1410.00; ComputeUnits=108; CoresPerCU=64; DeviceName="NVIDIA A100 80GB PCIe"; DriverVersion=12.20; ECCEnabled=true; GlobalMemoryMb=81051; MaxSupportedVersion=12020; ]
GPU_5f846c33=[ id="GPU-5f846c33"; DevicePciBusId="0000:41:00.0"; DeviceUuid="5f846c33-4dd5-ad62-eb12-c3813915d819"; ]
GPU_c60861f1=[ id="GPU-c60861f1"; DevicePciBusId="0000:A1:00.0"; DeviceUuid="c60861f1-85ee-082a-6211-8564787ede57"; ]
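The discovery output above is a set of ClassAd-style assignments: "DetectedGPUs" lists the short GPU IDs, "Common" holds properties shared by all devices, and each "GPU_<id>" line holds per-device details. A minimal Python sketch that pulls those pieces apart might look like this. The parser is hypothetical and only handles the simple quoted-string, boolean, and numeric values that appear in this output; it is not a general ClassAd parser (the "classad" module in the HTCondor Python bindings is the proper tool for that).

```python
import re

# Sample output copied from the condor_gpu_discovery run above.
DISCOVERY_OUTPUT = '''\
DetectedGPUs="GPU-5f846c33, GPU-c60861f1"
Common=[ Capability=8.0; ClockMhz=1410.00; ComputeUnits=108; CoresPerCU=64; DeviceName="NVIDIA A100 80GB PCIe"; DriverVersion=12.20; ECCEnabled=true; GlobalMemoryMb=81051; MaxSupportedVersion=12020; ]
GPU_5f846c33=[ id="GPU-5f846c33"; DevicePciBusId="0000:41:00.0"; DeviceUuid="5f846c33-4dd5-ad62-eb12-c3813915d819"; ]
GPU_c60861f1=[ id="GPU-c60861f1"; DevicePciBusId="0000:A1:00.0"; DeviceUuid="c60861f1-85ee-082a-6211-8564787ede57"; ]
'''

# One top-level "Name=value" line; nested records are wrapped in [ ... ].
LINE_RE = re.compile(r'^(\w+)=(.*)$')
# One "key=value;" pair inside a nested [ ... ] record.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|[^;]+);')


def parse_value(raw):
    """Convert a simple scalar: quoted string, true/false, int, or float."""
    raw = raw.strip()
    if raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    if raw in ("true", "false"):
        return raw == "true"
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw  # leave anything else (e.g. an expression) as text


def parse_discovery(text):
    """Return (list of GPU ids, dict mapping attribute name -> value or nested dict)."""
    attrs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        name, value = m.groups()
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            attrs[name] = {k: parse_value(v) for k, v in FIELD_RE.findall(value)}
        else:
            attrs[name] = parse_value(value)
    gpu_ids = [g.strip() for g in attrs.get("DetectedGPUs", "").split(",") if g.strip()]
    return gpu_ids, attrs


if __name__ == "__main__":
    gpu_ids, attrs = parse_discovery(DISCOVERY_OUTPUT)
    for gpu_id in gpu_ids:
        # Per-device records are named after the ID with "-" replaced by "_".
        record = attrs[gpu_id.replace("-", "_")]
        print(gpu_id, record["DevicePciBusId"], attrs["Common"]["DeviceName"])
```

Run as a script, this prints one line per detected GPU with its PCI bus ID and the shared device name.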
But when I run "condor_status", I see only the CPU resources, not the GPUs. On this server, with a pair of AMD EPYC 75F3 processors, that's 128 slots to scroll through.
What I really want to see is no CPU slots, only the GPUs.
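The GPU-only view described above amounts to filtering slot ads on their GPU count. A small Python sketch of that filter follows, under stated assumptions: GPU detection is enabled on the execute nodes (in HTCondor this is typically done with the "use feature : GPUs" configuration knob), so each slot ad carries a "GPUs" count that is 0 or absent for CPU-only slots, plus an "AssignedGPUs" device ID. The slot records below are hypothetical stand-ins, not real "condor_status" output, and the code does not contact a collector. On the command line, the equivalent filter would be something like condor_status -constraint 'GPUs > 0' -af Name AssignedGPUs.

```python
# Hypothetical slot records standing in for condor_status output; real slot
# ads have many more attributes. Assumes GPU detection is enabled so that a
# slot's GPU count appears under "GPUs" (absent or 0 for CPU-only slots).
SLOTS = [
    {"Name": "slot1@muthur", "Cpus": 1, "GPUs": 1, "AssignedGPUs": "GPU-5f846c33"},
    {"Name": "slot2@muthur", "Cpus": 1, "GPUs": 1, "AssignedGPUs": "GPU-c60861f1"},
    {"Name": "slot3@muthur", "Cpus": 1, "GPUs": 0},
    {"Name": "slot4@muthur", "Cpus": 1},
]


def gpu_slots(slots):
    """Keep only slots that own at least one GPU (mirrors a 'GPUs > 0' constraint)."""
    return [s for s in slots if s.get("GPUs", 0) > 0]


def summarize(slots):
    """One 'name  gpu-id' line per GPU slot, similar to an '-af Name AssignedGPUs' listing."""
    return [f'{s["Name"]}  {s.get("AssignedGPUs", "?")}' for s in gpu_slots(slots)]


if __name__ == "__main__":
    for line in summarize(SLOTS):
        print(line)
```

Slots with a missing "GPUs" attribute are treated as having none, which matches how an undefined attribute makes a "GPUs > 0" constraint fail in ClassAd evaluation.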
Is this possible, or am I asking too much?
Is there a better way of job scheduling for GPUs?
Thanx,
Russell Smithies