Re: [HTCondor-users] CUDA_VISIBLE_DEVICES not in the environment
- Date: Thu, 12 Dec 2019 15:13:01 +0000
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] CUDA_VISIBLE_DEVICES not in the environment
On 12/12/2019 6:58 AM, Beyer, Christoph wrote:
> Hi,
>
> I am struggling a bit with the parallel usage of GPUs as I mentioned earlier. As a matter of fact part of my problems result from
> CUDA_VISIBLE_DEVICES not being set in the job environment
>
> I use the gpu-feature which expands as expected to:
>
> [root@batchg010 condor]# condor_config_val use feature:gpus
> use FEATURE:GPUs is
> MACHINE_RESOURCE_INVENTORY_GPUs=$(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
> ENVIRONMENT_FOR_AssignedGPUs=GPU_DEVICE_ORDINAL=/(CUDA|OCL)// CUDA_VISIBLE_DEVICES
> ENVIRONMENT_VALUE_FOR_UnAssignedGPUs=10000
>
> I am running a jobwrapper but also in the jobwrapper environment I do not see a sign of CUDA_VISIBLE_DEVICES being set, same thing in the environment once the job is running.
>
> Subsequently I get all 4 GPUs in a single gpu-slot:
>
> /usr/libexec/condor/condor_gpu_discovery
> DetectedGPUs="CUDA0, CUDA1, CUDA2, CUDA3"
>
>
> Is there an additional trick that I missed ?
>
> This on
>
> $CondorVersion: 8.9.1 Apr 17 2019 BuildID: 466671 PackageID: 8.9.1-1 $
>
Strange....
If your startd has just one partitionable slot, or only static slots,
all you should need to do is add "use feature:gpus", and everything
should work, including CUDA_VISIBLE_DEVICES being set appropriately.
This covers the vast majority of the setups we see.
Does your startd config have a combination of both static and
partitionable slots, and/or does it have more than one partitionable
slot? In that case perhaps additional tricks are needed...
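To see what the job itself receives, a small script run as the job (or called from the job wrapper) can print the GPU-related variables. The following is only a minimal sketch: the two variable names come from the ENVIRONMENT_FOR_AssignedGPUs line quoted above, and the function names report_gpu_env and format_report are made up for illustration.

```python
import os

# Variable names taken from the ENVIRONMENT_FOR_AssignedGPUs line quoted above.
GPU_ENV_VARS = ("CUDA_VISIBLE_DEVICES", "GPU_DEVICE_ORDINAL")


def report_gpu_env(environ=None):
    """Map each GPU-related variable to its value, or None if it is unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in GPU_ENV_VARS}


def format_report(report):
    """Render the report as one line per variable."""
    lines = []
    for name, value in report.items():
        if value is None:
            lines.append(f"{name} is not set")
        else:
            lines.append(f"{name}={value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Run inside the job so the output reflects the job's own environment.
    print(format_report(report_gpu_env()))
```

If both variables show up as not set in the job's output, the problem is on the startd side and not in the job wrapper.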
regards
Todd