Re: [HTCondor-users] Adding GPUs to machine resources
- Date: Wed, 16 Apr 2014 15:25:55 +0200
- From: Steffen Grunewald <Steffen.Grunewald@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] Adding GPUs to machine resources
On Wed, Apr 16, 2014 at 02:41:59PM +0200, Steffen Grunewald wrote:
> But: If the user "forgets" to specify request_gpus (or sets it to 0),
> then CUDA_VISIBLE_DEVICES isn't set *which apparently leaves full access
> to _all_ GPU resources of the machine*. Is this intended? I'd expect
> something like CUDA_VISIBLE_DEVICES=-1 ...
(see below)
> Still running 8.1.4
Here's another quirk (at least I think it's one), from the output
(printenv, machine ad, ...) of a job scheduled to the second of two
GPUs:
$ grep CUDA 3.out
_CONDOR_AssignedGPUS=CUDA1
CUDA_VISIBLE_DEVICES=1
CUDARuntimeVersion = 5.5
CUDAGlobalMemoryMb = 4800
CUDACapability = 3.5
CUDAECCEnabled = false
CUDADriverVersion = 6.0
CUDADeviceName = "Tesla K20c"
AssignedGPUS = "CUDA1"
Since $CUDA_VISIBLE_DEVICES equals 1, _only_ the second GPU should be visible,
which can be verified as follows:
$ CUDA_VISIBLE_DEVICES=1 /usr/lib/condor/libexec/condor_gpu_discovery
DetectedGPUs="CUDA0"
Note that in the CUDA_VISIBLE_DEVICES context, the device "name" differs from
the one announced in the machine ad; enumeration apparently restarts at CUDA0:
$ CUDA_VISIBLE_DEVICES=0 /usr/lib/condor/libexec/condor_gpu_discovery
DetectedGPUs="CUDA0"
- same result, different GPU.
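To make the renumbering explicit, here's a small shell sketch that merely
simulates what the two transcripts above show (it is not condor_gpu_discovery
itself, and the function name is made up): whatever physical devices the mask
exposes, the visible ones are always relabelled CUDA0, CUDA1, ...

```shell
#!/bin/sh
# Simulate the observed behaviour: devices visible through a
# CUDA_VISIBLE_DEVICES mask are renumbered starting from CUDA0,
# regardless of their physical index; "-1" or "" hides everything.
simulate_discovery() {
    mask="$1"
    if [ -z "$mask" ] || [ "$mask" = "-1" ]; then
        echo 'DetectedGPUs=0'
        return
    fi
    i=0
    out=""
    # Each entry in the comma-separated mask becomes CUDA$i
    for _ in $(echo "$mask" | tr ',' ' '); do
        [ -n "$out" ] && out="$out, "
        out="${out}CUDA$i"
        i=$((i + 1))
    done
    echo "DetectedGPUs=\"$out\""
}

simulate_discovery 1    # physical GPU 1, reported as CUDA0
simulate_discovery 0    # physical GPU 0, also reported as CUDA0
simulate_discovery -1   # nothing visible: DetectedGPUs=0
```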
Maybe I'm misinterpreting stuff?
BTW,
$ CUDA_VISIBLE_DEVICES=-1 /usr/lib/condor/libexec/condor_gpu_discovery
DetectedGPUs=0
$ CUDA_VISIBLE_DEVICES="" /usr/lib/condor/libexec/condor_gpu_discovery
DetectedGPUs=0
but (and this is the behaviour when request_gpus=0 or is omitted):
$ unset CUDA_VISIBLE_DEVICES; /usr/lib/condor/libexec/condor_gpu_discovery
DetectedGPUs="CUDA0, CUDA1"
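Until this is clarified, one could work around it with a defensive job
wrapper that masks all GPUs whenever the variable wasn't set at all (a
sketch only; the "-1 hides everything" semantics come from the transcripts
above, and guard_gpus is a name I just made up):

```shell
#!/bin/sh
# guard_gpus: if CUDA_VISIBLE_DEVICES is not set at all (the request_gpus=0
# case shown above), export it as "-1" so CUDA applications see no GPUs,
# instead of inheriting full access to every device on the machine.
# A value set by the startd (even an empty one) is left untouched.
guard_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES+set}" ]; then
        export CUDA_VISIBLE_DEVICES=-1
    fi
}
```

A wrapper script would call guard_gpus before exec'ing the real payload.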
Moreover, $$(AssignedGPUS) in the submit file's "arguments" line apparently
isn't replaced by a CUDA* string, as the HowToManageGPUs wiki page suggests
it should be...
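For reference, the usage I'm reading the wiki as suggesting would look
roughly like this in the submit file (a sketch; the --device option is just
a placeholder for whatever the job binary accepts, and the attribute
spelling may differ in case from the AssignedGPUS shown in the ad above):

```
# hypothetical submit description following HowToManageGPUs
request_gpus = 1
executable   = my_cuda_job
arguments    = --device=$$(AssignedGPUs)
queue
```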
- S