
Re: [HTCondor-users] Job log not reporting Memory usage



I'm very sorry to hear that, as our users are suffering from both a lack of memory reporting and jobs unexpectedly going over their memory limit.
It has reached the point where they find it hard to get any work done, and I'm getting all the complaints and the users' frustration.
When can we expect the cgroups v2 issues to be fixed?





On Wed, Sep 4, 2024 at 12:10 AM Tim Theisen <tim@xxxxxxxxxxx> wrote:

Hello David,

I am sorry to report that the backport is not feasible. It depends on many of the changes made for cgroups v2.

There is no good workaround other than upgrading to a newer version.

...Tim

On 8/27/24 12:18, Tim Theisen wrote:

Hello David,

I have backported Greg's fix for the upcoming 23.0.15 release.

...Tim

On 8/21/24 07:57, David Cohen wrote:
Hi Greg,
As I don't see the resolution planned for the 23.0.15 release, is there a workaround I can implement in our system?

-David

On Wed, Aug 7, 2024 at 5:03 PM Greg Thain via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
On 8/6/24 22:27, David Cohen wrote:

I think there is something wrong with how the cluster is tracking memory usage. I had a batch of jobs, many of which were held with the error "Job Is Wasting Memory using less than 20 percent of requested Memory". But I am also printing memory usage from within the job, and I see that, at least at certain points during the simulation, it was using more than half of the memory I requested.


Hi David:

I definitely see there are cases where HTCondor is not correctly reporting memory usage, and I have a fix that will be going into the next 23.10 release, and will backport it to 23.0.x. Sorry for the inconvenience.


-greg


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736