[HTCondor-users] memory leak in htcondor2 job event log?



Hi,

I switched from htcondor to htcondor2 two weeks ago and have since noticed significantly higher memory usage (orders of magnitude) when using job event logs. The usage keeps growing until I restart my Python script.

So I wrote a test script [1] and ran it with both the old htcondor Python bindings and the new htcondor2 bindings. With the old bindings memory usage stays flat, while with htcondor2 it keeps increasing.

Can someone from the HTCondor team take a look at the job event log code and see if there's a bug in there leaking memory?

Best,
David


[1] test_job_event_log.py
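import os
import tracemalloc
#import htcondor  # old bindings: swap with the line below to compare
import htcondor2 as htcondor

def get_memory_from_proc_statm():
    """Return the resident set size (RSS) of this process in bytes."""
    with open('/proc/self/statm') as f:
        # The second value is the Resident Set Size (RSS) in pages
        pages = int(f.read().split()[1])
    # Ask the OS for the page size rather than assuming 4096
    page_size = os.sysconf('SC_PAGE_SIZE')
    return pages * page_size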

def mem_report():
    current, _ = tracemalloc.get_traced_memory()
    print(f"Python memory usage: {current / 1024:.2f} KB")
    print(f"Proc memory usage: {get_memory_from_proc_statm() / 1024:.2f} KB")

# use a job event log that already exists somewhere (this one was 100MB)
jel_filename = 'submit/2025-12-19T13/jobs.jel'

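def run():
    jel = htcondor.JobEventLog(jel_filename)
    # stop_after=0: return the events already in the log without waiting
    events = jel.events(0)

    print('start')
    mem_report()

    # Read the log in batches of 10000 events, reporting memory after each
    while True:
        count = 0
        for count, _ in enumerate(events, start=1):
            if count >= 10000:
                break
        print(f'loaded {count} events')
        mem_report()
        if count == 0:
            break

    jel.close()
    print('close log')
    mem_report()

    print('end')
    mem_report()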

tracemalloc.start()
run()
tracemalloc.stop()
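
A note on reading the two numbers printed by mem_report(): tracemalloc only tracks memory allocated through Python's own allocators, so if the /proc RSS figure grows while the tracemalloc figure stays flat, the growth is probably happening in native code rather than in Python objects. The sketch below wraps that comparison in a helper. measure_growth() and its step argument are hypothetical names used for illustration, not part of htcondor or htcondor2, and the /proc path makes it Linux-only.

```python
import os
import tracemalloc


def rss_bytes():
    """Current resident set size in bytes, read from /proc (Linux only)."""
    with open("/proc/self/statm") as f:
        # The second field of statm is the RSS, counted in pages
        pages = int(f.read().split()[1])
    return pages * os.sysconf("SC_PAGE_SIZE")


def measure_growth(step, rounds=5):
    """Call step() `rounds` times and return (python_delta, rss_delta) in bytes.

    python_delta is the change in memory traced by tracemalloc;
    rss_delta is the change in the whole process's resident set size.
    """
    # Start tracing only if the caller has not already done so
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()

    py_before, _ = tracemalloc.get_traced_memory()
    rss_before = rss_bytes()

    for _ in range(rounds):
        step()

    py_after, _ = tracemalloc.get_traced_memory()
    rss_after = rss_bytes()

    if started_here:
        tracemalloc.stop()
    return py_after - py_before, rss_after - rss_before
```

For example, step could be a function that reads one batch of events from the JobEventLog iterator. Running it once with each binding would give a pair of deltas to compare.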