Great find, Jaime; the difference is quite dramatic!
http://i.imgur.com/M0tmFwb.png http://i.imgur.com/TPzBeU9.png
However, I had to configure TOOL_LOG to write to /dev/null; even with it unset in the condor config, the logging seemed to be trying to write to a default log,
/var/log/condor/ToolLog. I found that I can just set htcondor.param['TOOL_LOG'] = '/dev/null' in the script instead of changing the system config.
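For anyone else hitting this, a minimal sketch of the workaround as I applied it (the Schedd() line is just a stand-in for whatever the probe actually does):

import htcondor

# Point the buffered debug output at /dev/null instead of the default
# /var/log/condor/ToolLog, without touching the system condor config.
htcondor.param['TOOL_LOG'] = '/dev/null'
htcondor.enable_log()  # flush/route the in-memory debug buffer

schedd = htcondor.Schedd()  # ... long-running probe work continues as before ...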
Thanks again for the hard work!
Kevin
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Jaime Frey <jfrey@xxxxxxxxxxx>
Sent: Tuesday, April 18, 2017 1:09 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Memory leak in python binding issue still occurring (Ticket #5727)
I’ve identified the source of the memory usage. It’s HTCondor’s debug logging system. It buffers all log messages in memory until the program makes a call saying where, if anywhere, the data should be written. The Python bindings don’t perform
this call automatically. That may be OK for a short-lived program, but not for something that will run for a long period of time. The bindings do expose the logging configuration function: you can eliminate the memory leak by calling htcondor.enable_log()
from Python. After this call, if TOOL_LOG is set in the HTCondor configuration file, the log messages will be written there. If TOOL_LOG isn’t set, they are discarded.
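In a long-running script, the call sequence would look roughly like this (a sketch; the query loop and sleep interval are only illustrative):

import time
import htcondor

htcondor.enable_log()  # after this, debug messages go to TOOL_LOG or are discarded

schedd = htcondor.Schedd()
while True:
    ads = schedd.query()  # debug output from each call is no longer held in memory
    time.sleep(60)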
We’ll work on a fix to eliminate the increased memory usage without having to make any changes to your python code.
- Jaime
Thank you Jaime!
-Jitendra
I am investigating this issue. I can reproduce it on my machine. Once I know a little more about the problem, I’ll open a new ticket.
Can we reopen the issue, so that it gets worked on and fixed in an upcoming version?
After looking into it a bit, it looks like the schedd.query() calls aren't the source of our leaks now. It appears to be coming from the Schedd() initialization and/or the collector.locateAll() call. I discovered
one of our test probes wasn't actually doing anything due to an authentication issue; it would just try to obtain daemon stats via locateAll() several times, then return with an error. Memory usage went up about 10 KB per call.
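Roughly what that probe reduces to (a sketch; the iteration count and the broad exception handling are just for the measurement):

import resource
import htcondor

coll = htcondor.Collector()
for _ in range(100):
    try:
        coll.locateAll(htcondor.DaemonTypes.Schedd)
    except Exception:
        pass  # the probe's auth problem made the call fail, but memory still grew
    # ru_maxrss climbs roughly 10 KB per iteration
    print('mem use: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)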
From: Kevin Retzke
Sent: Friday, April 7, 2017 10:55 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Memory leak in python binding issue still occurring (Ticket #5727)
The memory usage trends for a number of our long-running fifemon probes also seem to suggest a leak still exists (8.6.1).
Kevin
I just tested again with both schedd.query and schedd.xquery (tested on versions 8.6.1 and 8.4.11). In both cases the memory increases by 128 KB, exactly the same behavior
as mentioned in the original ticket.
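The xquery side of the test was essentially the same measurement with the iterator drained (a sketch, since xquery returns an iterator rather than a list):

import resource
import htcondor

schedd = htcondor.Schedd()
for _ in range(10):
    list(schedd.xquery())  # consume the iterator fully
    print('mem use: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)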
schedd.query and schedd.xquery have completely different implementations, so there may be a leak with xquery, but it's not the SAME leak.
But I found that the memory leak is still occurring. I have verified that this issue is still present on the latest stable version, 8.6.1, and also on 8.4.11.
I used the sample code provided in the ticket to verify the issue:
import gc, resource
import htcondor
schedd = htcondor.Schedd()
ads = schedd.query()  # memory grows by roughly 128 KB on each call
gc.collect()  # explicitly run the garbage collector to free up mem not leaked
print('mem use: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
Can someone please check whether I'm missing something, or confirm that the fix is not available in the latest stable version?
Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project