Hello Experts,
I need some help understanding how GPUsMemoryUsage and GPUsAverageUsage are calculated.
I have a machine with 10 GPUs partitioned into multiple dynamic slots. Two running jobs on this machine are using 3 GPUs from slot4. As I understand it, metrics starting with GPUs* are advertised in the job definition because they are per-job metrics, while DeviceGPUs* metrics are not aware of jobs.
Version: 9.0.17
I have read about the improvements in newer versions, but I am not sure whether they are related to my query.
Questions:
- Why is the value of GPUsAverageUsage 1.16 for slot4, when slot4 itself doesn't run any job? It is also not a combination of the GPUsAverageUsage values on slot4_1 and slot4_2.
- Why is GPUsMemoryUsage equal to DeviceGPUsMemoryPeakUsage for the slots that are not in use? Couldn't it be undefined, like GPUsAverageUsage?
- What is UptimeGPUsSecondsAverageUsage? I couldn't find any information about this parameter.
Thanks & Regards,
Vikrant Aggarwal