On Jul 26, 2023, at 9:56 PM, Russell Smithies <Russell.Smithies@xxxxxxxxxx> wrote:
My next issue is sorting out munge authentication, if anyone can point me to some useful docs? I can't get it to use anything but the default tokens ;-(
We've used munge on slurm so I don't see any great need to change.
--Russell
-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of John M Knoeller via HTCondor-users
Sent: Thursday, July 27, 2023 1:08 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: John M Knoeller <johnkn@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Condor to manage GPUs only
Add
  use FEATURE : GPUs
to the configuration of your STARTD to have it run condor_gpu_discovery on startup and treat the GPUs as slot resources.
-tj
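[A minimal sketch of the suggestion above, written as a local configuration fragment. The file name is hypothetical, and the second line is an assumption that goes beyond the reply: it uses the PartitionableSlot template so that the machine advertises one large slot rather than one slot per core. Newer HTCondor versions may already do this by default.]

```
# Hypothetical file: /etc/condor/config.d/50-gpus.conf
# Run GPU discovery at startd startup and advertise GPUs as slot resources.
use FEATURE : GPUs

# Assumption, not part of the reply above: collapse the per-core static
# slots into a single partitionable slot that is carved up per job.
use FEATURE : PartitionableSlot
```

[The startd most likely needs a restart to pick up a resource change like this. Afterwards, something like condor_status -af Name Cpus GPUs should show whether the GPUs are being advertised.]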
-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Russell Smithies
Sent: Wednesday, July 26, 2023 3:35 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Condor to manage GPUs only
Hi all,
I used Condor 20 years ago and am trying to transition back from slurm.
I want to initially only use Condor for managing the GPUs on 3 servers: two servers have 2 x A100s and one server has 2 x V100s.
I'm not sure of the best way to do this, or if it's even possible? Surely given the number of products that are "powered by GPUs" it must be.
When I run "condor_gpu_discovery" I can see the GPUs:
muthur# /usr/libexec/condor/condor_gpu_discovery -extra -nested
DetectedGPUs="GPU-5f846c33, GPU-c60861f1"
Common=[ Capability=8.0; ClockMhz=1410.00; ComputeUnits=108; CoresPerCU=64; DeviceName="NVIDIA A100 80GB PCIe"; DriverVersion=12.20; ECCEnabled=true; GlobalMemoryMb=81051; MaxSupportedVersion=12020; ]
GPU_5f846c33=[ id="GPU-5f846c33"; DevicePciBusId="0000:41:00.0"; DeviceUuid="5f846c33-4dd5-ad62-eb12-c3813915d819"; ]
GPU_c60861f1=[ id="GPU-c60861f1"; DevicePciBusId="0000:A1:00.0"; DeviceUuid="c60861f1-85ee-082a-6211-8564787ede57"; ]
But when I run "condor_status" I don't see the GPUs, only the CPU resources. And on this server, with a pair of AMD EPYC 75F3 processors, that's 128 slots to scroll through.
What I really want to see is no CPU slots, only the GPUs.
Is this possible, or am I asking too much?
Is there a better way of job scheduling for GPUs?
Thanx,
Russell Smithies
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/