Re: [HTCondor-users] Translating GPU device assignments?
- Date: Thu, 06 Jul 2017 21:03:59 +0000
- From: John M Knoeller <johnkn@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Translating GPU device assignments?
GPU_DEVICE_ORDINAL is the equivalent of CUDA_VISIBLE_DEVICES for OpenCL. It would be incorrect for us to renumber it.
It sounds like you are saying that the job shouldn't look at CUDA_VISIBLE_DEVICES at all; it should just look at the number of GPUs it has been assigned and then start from 0.
-tj
-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael Pelletier
Sent: Thursday, July 6, 2017 10:04 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Translating GPU device assignments?
A little bit of follow-up as I worked on this over the long weekend.
[Michael Pelletier]
So it turns out that setting CUDA_VISIBLE_DEVICES=2,3 in the environment prompts the CUDA library to renumber those devices as ordinals 0,1.
Thus, to get the correct ordinals, you can't just use the values in CUDA_VISIBLE_DEVICES or GPU_DEVICE_ORDINAL.
So it seems that the GPU_DEVICE_ORDINAL variable is being set incorrectly: when used in combination with CUDA_VISIBLE_DEVICES, it should be set to 0 through N-1, where N is the number of GPUs requested.
I've worked around this with:
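A minimal Python sketch of the renumbering described above, assuming CUDA_VISIBLE_DEVICES holds a plain comma-separated list (CUDA also accepts GPU UUIDs, which this sketch treats as opaque strings). The helper name cuda_ordinal_map is hypothetical, not part of CUDA or HTCondor. It maps each listed device to the ordinal the job would see, in list order, so "2,3" becomes {"2": 0, "3": 1}.

```python
import os


def cuda_ordinal_map(visible_devices: str) -> dict[str, int]:
    """Hypothetical helper: map each device named in CUDA_VISIBLE_DEVICES
    to the ordinal CUDA exposes inside the process (0, 1, ... in list order)."""
    # Drop surrounding whitespace and empty entries; an empty string means no devices.
    devices = [d.strip() for d in visible_devices.split(",") if d.strip()]
    return {dev: ordinal for ordinal, dev in enumerate(devices)}


if __name__ == "__main__":
    # Fall back to the example from this thread when the variable is unset.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "2,3")
    print(cuda_ordinal_map(visible))
```

The job-side ordinals are therefore always 0 through N-1, regardless of which physical devices the slot was given.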
GPU_ORDINAL = $CHOICE(REQGPU_INT, "error", "0", "0,1", "0,1,2", \
"0,1,2,3", "0,1,2,3,4", "0,1,2,3,4,5", "0,1,2,3,4,5,6", \
"0,1,2,3,4,5,6,7", "0,1,2,3,4,5,6,7,8", "too_many_gpus_requested")
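For anyone who wants the same lookup outside the config file, here is a hedged Python equivalent of the $CHOICE table. It assumes REQGPU_INT holds the job's integer GPU request (its definition isn't shown in this message), and the function name gpu_ordinal_list is hypothetical. Index 0 yields "error" and 1 through 9 yield "0" through "0,1,...,8", as in the table; unlike $CHOICE, which has no entry past index 10, this sketch returns the overflow string for any request above 9.

```python
def gpu_ordinal_list(request_gpus: int, max_gpus: int = 9) -> str:
    """Hypothetical helper mirroring the $CHOICE table: return "0,...,n-1"
    for n requested GPUs, or one of the table's sentinel strings."""
    if request_gpus < 1:
        # Matches index 0 ("error"); negative values are also treated as errors.
        return "error"
    if request_gpus > max_gpus:
        # Matches the table's last entry, extended to every larger request.
        return "too_many_gpus_requested"
    return ",".join(str(i) for i in range(request_gpus))


if __name__ == "__main__":
    for n in (0, 1, 2, 4, 10):
        print(n, gpu_ordinal_list(n))
```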
And as I mentioned before, it'd be great to have this as a job attribute as well.
-Michael Pelletier.