Hi all,

since yesterday we have been observing a sudden increase in memory usage on one of our Condor schedulers [1]. Several condor_schedd processes each start allocating ~35% of memory (16 GB overall). Judging from the coarse-grained monitoring (1 min resolution), memory usage oscillates with a period of ~5 min between 50% and >100%, with the node going into swap.

After restarting the service/node, memory consumption went down significantly, but it peaked again later during the day.

So far we have not found a cause for the sudden hunger for memory. On Feb. 01 we upgraded the node from 8.4.1 to 8.6.1 and kept its sibling node on 8.4.1. So far, the 8.4.1 sibling has not shown this behaviour. On the other hand, it took the 8.6.1 node more than a week to get into the current state, so IMHO it is not conclusive that the update is the cause [2].

Maybe somebody has an idea where to look for clues?

Cheers and thanks for ideas,
  Thomas

[1]
grid-arcce1.desy.de - NorduGrid ARC CE submitting to Condor

> free
             total       used       free     shared    buffers     cached
Mem:      16268636    8605212    7663424        308     423620     551716
-/+ buffers/cache:    7629876    8638760
Swap:      4095996    1308184    2787812

[2]
In the release notes I found nothing that matches our observations:
http://research.cs.wisc.edu/htcondor/manual/v8.6/10_3Stable_Release.html
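
For anyone who wants to put the `free` snapshot from [1] into the same percentage terms as the monitoring, here is a minimal sketch in Python. It assumes the older procps layout shown above (with a "-/+ buffers/cache" line); the function name `summarise_free` and the dictionary keys are made up for illustration and are not part of any Condor or procps tool.

```python
# Sketch only: summarise `free` output in the old procps layout
# (the one that still prints a "-/+ buffers/cache" line).
# The sample text is the snapshot quoted in [1]; values are in KiB.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:      16268636    8605212    7663424        308     423620     551716
-/+ buffers/cache:    7629876    8638760
Swap:      4095996    1308184    2787812
"""


def summarise_free(text):
    """Return percent of RAM used by applications and percent of swap used.

    "Used by applications" means the 'used' column of the
    '-/+ buffers/cache' line, i.e. used minus buffers and page cache.
    """
    mem_total = app_used = swap_total = swap_used = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            mem_total = int(fields[1])
        elif fields[0] == "-/+":
            # fields: ['-/+', 'buffers/cache:', <used>, <free>]
            app_used = int(fields[2])
        elif fields[0] == "Swap:":
            swap_total, swap_used = int(fields[1]), int(fields[2])
    if None in (mem_total, app_used, swap_total, swap_used):
        raise ValueError("unexpected `free` output layout")
    return {
        "app_used_pct": 100.0 * app_used / mem_total,
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }


if __name__ == "__main__":
    summary = summarise_free(FREE_OUTPUT)
    print("RAM used by applications: %.1f%%" % summary["app_used_pct"])
    print("Swap used: %.1f%%" % summary["swap_used_pct"])
```

For the snapshot above this works out to roughly 47% of RAM in use by applications and roughly 32% of swap in use at the moment `free` was run.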
Attachment:
grid-arcce1_mem_3d_201702091735.png
Description: PNG image