
Re: [HTCondor-users] GPUsAverageUsage not set/wrong value

Date: Fri, 12 Jun 2026 09:21:29 -0500 (CDT)
From: Todd L Miller <tlmiller@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] GPUsAverageUsage not set/wrong value

> We are sometimes seeing strange behavior with the GPUsAverageUsage of a job. It ends up not being set and stays "undefined" when queried, while querying the slot itself returns the correct usage value.

Interesting. If I recall correctly, HTCondor detects GPU usage by device and then accumulates that usage in the slot using that device. The last step is to assign the slot's usage (from when the job began) to the job. I can't presently imagine a reason for the last step not to happen.
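To make the three steps above concrete, here is a minimal Python model of them: per-device utilization is credited to the slot that owns the device, and the job-level value is the average of the slot's samples taken since the job began. This is an illustrative sketch under stated assumptions, not HTCondor's implementation. `SlotGpuUsage`, `record`, and `average_since` are hypothetical names, and the simple sample-averaging scheme is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SlotGpuUsage:
    """Hypothetical model of one slot's GPU usage; not HTCondor's actual code."""

    devices: List[str]  # GPU device IDs assigned to this slot
    # (timestamp, summed utilization of this slot's devices) samples
    samples: List[Tuple[float, float]] = field(default_factory=list)

    def record(self, timestamp: float, per_device: Dict[str, float]) -> None:
        """Steps 1 and 2: take per-device utilization and credit it to this slot."""
        # Devices not assigned to this slot are ignored; a missing reading counts as 0.
        total = sum(per_device.get(dev, 0.0) for dev in self.devices)
        self.samples.append((timestamp, total))

    def average_since(self, job_start: float) -> Optional[float]:
        """Step 3: average the slot's usage over samples taken since the job began."""
        window = [usage for (ts, usage) in self.samples if ts >= job_start]
        if not window:
            # No samples in the job's window: the job-level value stays unset,
            # roughly what an "undefined" attribute would look like.
            return None
        return sum(window) / len(window)
```

In this model, a slot with two fully busy GPUs records 2.0 per sample, so the average is in units of GPUs rather than a fraction between 0 and 1. It also shows one hypothetical way a job-level value could end up unset while the slot still has data: if no samples fall inside the job's window.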

Could you describe the jobs for which you're seeing this problem in some detail? (Do they use more than one GPU? Are they container-universe jobs? How long do they run for? Are they running on glide-ins or on EPs started with root privileges?)

> It also seems that the GPUsAverageUsage value is sometimes completely off from the actual expected usage value.


	In these cases, do the per-slot numbers look sane?

> I have the feeling that the calculations for this value are somehow not fully correct. Any ideas or pointers on where to look to debug this behavior more clearly?

Unfortunately, the implementation is rather more complicated than has proved to be worthwhile; I don't know that a copy of a representative job ad would help, but it certainly wouldn't hurt.

It may also be instructive to check, in the job event log (if any), what the GPUs usage reported at the end of the run looks like.
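A minimal sketch for pulling that end-of-run GPUs usage out of a plain-text job event log follows. It assumes the classic text log format: each event starts with a header such as `005 (42.000.000) ...` (code 005 being "Job terminated"), ends with a line containing only `...`, and a terminated event carries a resource table with a `GPUs : <Usage> <Request> <Allocated>` row. Those format details and the helper name `gpu_usage_at_termination` are assumptions for illustration; check them against a real log from your pool.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed event header, e.g. "005 (42.000.000) 2026-06-12 09:00:00 Job terminated."
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\)")
# Assumed resource-table row, e.g. "   GPUs   :   0.45   1   1" (Usage, Request, Allocated).
GPUS_ROW = re.compile(r"^\s*GPUs\s*:(.*)$")


def gpu_usage_at_termination(log_text: str) -> List[Dict[str, object]]:
    """Hypothetical helper: return the GPUs row of every terminated event in a text log."""
    results: List[Dict[str, object]] = []
    current_job: Optional[Tuple[int, int]] = None  # (ClusterId, ProcId) of a 005 event
    for line in log_text.splitlines():
        header = EVENT_HEADER.match(line)
        if header:
            # Track the job only inside terminated events; ignore other event types.
            if header.group(1) == "005":
                current_job = (int(header.group(2)), int(header.group(3)))
            else:
                current_job = None
            continue
        if line.strip() == "...":
            current_job = None  # end of the current event
            continue
        if current_job is not None:
            row = GPUS_ROW.match(line)
            if row:
                # Keep the raw fields; fewer than three may mean a column was blank.
                results.append({"job": current_job, "fields": row.group(1).split()})
    return results
```

Called on the text of the file named by the `log` submit command, it returns one entry per terminated job, which can then be compared with the job's GPUsAverageUsage and the per-slot numbers.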


-- ToddM

Follow-Ups:
- Re: [HTCondor-users] GPUsAverageUsage not set/wrong value
  - From: Emily Kooistra

References:
- [HTCondor-users] GPUsAverageUsage not set/wrong value
  - From: Emily Kooistra
