I have a long-running Python script that queries the scheduler periodically, and I noticed that the process keeps growing in memory over time.
The following snippet is enough to reproduce the issue on our systems (running HTCondor v8.4.6):

import resource
import htcondor

schedd = htcondor.Schedd()
while True:
    schedd.query()
    print('mem use: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

With no jobs running, I typically see the memory use increase by 128 kB once every 45 or so iterations of the loop.
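
To put a number on how often the growth happens, the loop can record only the iterations where the peak grows. The following is a minimal sketch, not part of the original report: track_peak_rss is a hypothetical helper name, and it accepts any zero-argument callable so it can be tried without a scheduler. Note that ru_maxrss is the peak resident set size (reported in kilobytes on Linux), so it can only show growth, never a release.

```python
import resource


def track_peak_rss(fn, iterations):
    """Call fn() repeatedly and return a list of (iteration, growth)
    pairs, one for every call after which the peak RSS increased.

    Hypothetical helper for illustration. Growth is in the units of
    ru_maxrss (kilobytes on Linux).
    """
    jumps = []
    last = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    for i in range(iterations):
        fn()
        now = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        if now > last:
            jumps.append((i, now - last))
            last = now
    return jumps


# Assumed usage against the scheduler from the snippet above:
#   schedd = htcondor.Schedd()
#   for iteration, growth in track_peak_rss(schedd.query, 500):
#       print('iteration %d: +%d kb' % (iteration, growth))
```

A steady stream of similar-sized jumps at a roughly constant interval would be consistent with a fixed-size allocation leaking on every call.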
pympler didn't show any new Python objects being created, but when I ran the script under valgrind's leak checker, it reported the following lost blocks:
==2473759== 136,000 (92,000 direct, 44,000 indirect) bytes in 500 blocks are definitely lost in loss record 820 of 821
==2473759== at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2473759== by 0x6B42B29: CondorQ::getFilterAndProcessAds(char const*, StringList&, int, bool (*)(void*, compat_classad::ClassAd*), void*, bool) (in /usr/lib/condor/libcondor_utils_8_4_6.so)
==2473759== by 0x6B43EB6: CondorQ::fetchQueueFromHostAndProcess(char const*, StringList&, int, int, bool (*)(void*, compat_classad::ClassAd*), void*, int, CondorError*) (in /usr/lib/condor/libcondor_utils_8_4_6.so)
==2473759== by 0x65750CA: Schedd::query(boost::python::api::object, boost::python::list, boost::python::api::object, int, CondorQ::QueryFetchOpts) (in /usr/lib/python2.7/dist-packages/htcondor.so)
==2473759== by 0x6575C4E: query_overloads::non_void_return_type::gen<boost::mpl::vector7<boost::python::api::object, Schedd&, boost::python::api::object, boost::python::list, boost::python::api::object, int, CondorQ::QueryFetchOpts> >::func_0(Schedd&) (in /usr/lib/python2.7/dist-packages/htcondor.so)
==2473759== by 0x656B0FA: boost::python::objects::caller_py_function_impl<boost::python::detail::caller<boost::python::api::object (*)(Schedd&), boost::python::default_call_policies, boost::mpl::vector2<boost::python::api::object, Schedd&> > >::operator()(_object*, _object*) (in /usr/lib/python2.7/dist-packages/htcondor.so)
==2473759== by 0x67ED9E9: boost::python::objects::function::call(_object*, _object*) const (in /usr/lib/condor/libpyclassad2.7_8_4_6.so)
==2473759== by 0x67EDD57: ??? (in /usr/lib/condor/libpyclassad2.7_8_4_6.so)
==2473759== by 0x67E8852: boost::python::handle_exception_impl(boost::function0<void>) (in /usr/lib/condor/libpyclassad2.7_8_4_6.so)
==2473759== by 0x67EC662: ??? (in /usr/lib/condor/libpyclassad2.7_8_4_6.so)
==2473759== by 0x499BE4: PyEval_EvalFrameEx (in /usr/bin/python2.7)
==2473759== by 0x4A1633: ??? (in /usr/bin/python2.7)
I can work around this issue by restarting the process periodically, but it looks like there is an allocation in CondorQ::getFilterAndProcessAds that is never freed?
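
A lighter variant of the restart workaround is to run each query in a short-lived child process, so that whatever the call leaks is returned to the OS when the child exits. The following is a minimal sketch under stated assumptions, not a tested fix: call_in_child and _worker are hypothetical names, it assumes Python 3 on a Unix system where the "fork" start method is available, and the function's return value must be picklable.

```python
import multiprocessing

# Assumption: Unix with the "fork" start method (Python 3.4+).
_ctx = multiprocessing.get_context("fork")


def _worker(fn, args, conn):
    """Run fn(*args) in the child and send the outcome to the parent."""
    try:
        conn.send((True, fn(*args)))
    except Exception as exc:
        # Send a text description, since the exception may not pickle.
        conn.send((False, repr(exc)))
    finally:
        conn.close()


def call_in_child(fn, *args):
    """Call fn(*args) in a forked child and return its result.

    Memory leaked by fn lives in the child and goes away when the
    child exits. Raises RuntimeError if fn raised in the child.
    """
    parent_conn, child_conn = _ctx.Pipe(duplex=False)
    proc = _ctx.Process(target=_worker, args=(fn, args, child_conn))
    proc.start()
    child_conn.close()  # keep only the child's copy of the write end open
    try:
        ok, payload = parent_conn.recv()
    finally:
        proc.join()
    if not ok:
        raise RuntimeError(payload)
    return payload
```

For the scheduler case, the function passed in would create the Schedd, run the query, and return plain data such as a list of strings. Whether ClassAd objects themselves can be pickled in 8.4.6 is not something this sketch assumes. The cost is one fork per query, which should be acceptable for a script that only polls periodically.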
-Scott