Re: [HTCondor-users] CUDA_VISIBLE_DEVICES not in the environment
- Date: Thu, 12 Dec 2019 20:06:16 +0100 (CET)
- From: "Beyer, Christoph" <christoph.beyer@xxxxxxx>
- Subject: Re: [HTCondor-users] CUDA_VISIBLE_DEVICES not in the environment
Hi Tj,
it is all undefined:
[root@bird-htc-sched13 ~]# condor_status batchg010.desy.de -af AssignedGPUs
undefined
undefined
undefined
undefined
undefined
undefined
undefined
undefined
undefined
undefined
undefined
undefined
undefined
There is one partitionable slot with 4 GPUs and a couple of static Jupyter slots that make some use of the machine's CPU power:
[root@bird-htc-sched13 ~]# condor_status batchg010.desy.de
Name                      OpSys  Arch    State      Activity  LoadAv  Mem     ActvtyTime
slot1@xxxxxxxxxxxxxxxxx   LINUX  X86_64  Unclaimed  Idle      0.000   342351  1+05:25:05
slot1_2@xxxxxxxxxxxxxxxxx LINUX  X86_64  Claimed    Busy      1.150   1536    0+02:12:03
slot1_3@xxxxxxxxxxxxxxxxx LINUX  X86_64  Claimed    Busy      1.160   1536    0+01:56:02
slot2@xxxxxxxxxxxxxxxxx   LINUX  X86_64  Unclaimed  Idle      0.000   4000    1+05:25:15
slot3@xxxxxxxxxxxxxxxxx   LINUX  X86_64  Unclaimed  Idle      0.000   4000    1+05:25:15
slot4@xxxxxxxxxxxxxxxxx   LINUX  X86_64  Unclaimed  Idle      0.000   4000    1+05:25:15
slot5@xxxxxxxxxxxxxxxxx   LINUX  X86_64  Unclaimed  Idle      0.000   4000    1+05:25:15
slot6@xxxxxxxxxxxxxxxxx   LINUX  X86_64  Unclaimed  Idle      0.000   4000    1+05:25:15
slot7@xxxxxxxxxxxxxxxxx   LINUX  X86_64  Unclaimed  Idle      0.000   4000    1+05:25:15
slot8@xxxxxxxxxxxxxxxxx   LINUX  X86_64  Unclaimed  Idle      0.000   4000    1+05:25:15
slot9@xxxxxxxxxxxxxxxxx   LINUX  X86_64  Unclaimed  Idle      0.000   4000    1+05:25:15
slot10@xxxxxxxxxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle      0.000   4000    1+05:25:15
slot11@xxxxxxxxxxxxxxxxx  LINUX  X86_64  Unclaimed  Idle      0.000   4000    1+05:25:15

              Machines  Owner  Claimed  Unclaimed  Matched  Preempting  Drain
X86_64/LINUX  13        0      2        11         0        0           0
Total         13        0      2        11         0        0           0
Best
christoph
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
----- Original Message -----
From: "johnkn" <johnkn@xxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, December 12, 2019 17:20:03
Subject: Re: [HTCondor-users] CUDA_VISIBLE_DEVICES not in the environment
What GPUs are getting assigned to the slot?
condor_status -af Name AssignedGPUs
Does CUDA_VISIBLE_DEVICES get set in the environment when you don't use the job wrapper?
-tj
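One way to check the second question is to run a trivial job, without the job wrapper, whose executable only reports the GPU-related variables it sees. The following is a minimal sketch under that assumption; the script, the helper name `gpu_env_report`, and the `<not set>` placeholder are hypothetical and not part of HTCondor.

```python
import os

# Variables named in the FEATURE:GPUs expansion quoted in this thread.
GPU_ENV_VARS = ("CUDA_VISIBLE_DEVICES", "GPU_DEVICE_ORDINAL")

def gpu_env_report(environ=None):
    """Return each GPU variable with its value, or '<not set>' if absent."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, "<not set>") for name in GPU_ENV_VARS}

if __name__ == "__main__":
    # Print one NAME=value line per variable so the job's stdout shows the result.
    for name, value in gpu_env_report().items():
        print(f"{name}={value}")
```

Running the same script with and without the wrapper would show whether the wrapper is what drops the variables.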
-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Beyer, Christoph
Sent: Thursday, December 12, 2019 6:58 AM
To: htcondor-users <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] CUDA_VISIBLE_DEVICES not in the environment
Hi,
I am struggling a bit with the parallel usage of GPUs, as I mentioned earlier. As a matter of fact, part of my problem results from
CUDA_VISIBLE_DEVICES not being set in the job environment.
I use the GPUs feature, which expands as expected to:
[root@batchg010 condor]# condor_config_val use feature:gpus
use FEATURE:GPUs is
MACHINE_RESOURCE_INVENTORY_GPUs=$(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
ENVIRONMENT_FOR_AssignedGPUs=GPU_DEVICE_ORDINAL=/(CUDA|OCL)// CUDA_VISIBLE_DEVICES
ENVIRONMENT_VALUE_FOR_UnAssignedGPUs=10000
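As a rough illustration of what these two knobs are meant to do, the sketch below mimics the documented behaviour as I understand it: each variable listed in ENVIRONMENT_FOR_AssignedGPUs receives the slot's AssignedGPUs list, optionally rewritten by a Perl-style substitution, and ENVIRONMENT_VALUE_FOR_UnAssignedGPUs is used when no GPU is assigned. The function `env_for_assigned_gpus` and its parsing are hypothetical, not HTCondor code, and the automatic prefix stripping for CUDA_VISIBLE_DEVICES is an assumption taken from the manual.

```python
import re

def env_for_assigned_gpus(spec, assigned_gpus, unassigned_value="10000"):
    """Illustrative only: build GPU environment variables from a knob value."""
    env = {}
    for item in spec.split():
        # "NAME=/pattern/replacement/" or just "NAME"
        name, _, rule = item.partition("=")
        if not assigned_gpus:
            # No GPU assigned to the slot: use the fallback value.
            env[name] = unassigned_value
            continue
        # Assumption: CUDA_VISIBLE_DEVICES gets the prefix stripping implicitly.
        if not rule and name == "CUDA_VISIBLE_DEVICES":
            rule = "/(CUDA|OCL)//"
        ids = list(assigned_gpus)
        if rule:
            # Simplified parsing: assumes the pattern itself contains no "/".
            _, pattern, replacement, _ = rule.split("/")
            ids = [re.sub(pattern, replacement, gpu) for gpu in ids]
        env[name] = ",".join(ids)
    return env

spec = "GPU_DEVICE_ORDINAL=/(CUDA|OCL)// CUDA_VISIBLE_DEVICES"
print(env_for_assigned_gpus(spec, ["CUDA0", "CUDA1"]))
print(env_for_assigned_gpus(spec, []))
```

Under these assumptions, a slot with AssignedGPUs of CUDA0 and CUDA1 would yield "0,1" for both variables, while a slot with no assigned GPU would yield the fallback value.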
I am running a job wrapper, but I see no sign of CUDA_VISIBLE_DEVICES being set in the job wrapper environment either; the same is true of the environment once the job is running.
As a result, I get all 4 GPUs in a single GPU slot:
/usr/libexec/condor/condor_gpu_discovery
DetectedGPUs="CUDA0, CUDA1, CUDA2, CUDA3"
Is there an additional trick that I missed?
This is on
$CondorVersion: 8.9.1 Apr 17 2019 BuildID: 466671 PackageID: 8.9.1-1 $
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/