## Tested with HTCondor 8.8.5 (stable release) on CentOS Linux release 7.9.2009
Test 1 & Test 2: In both cases the message below was reported in the slot log file, but the job stayed in the running state indefinitely until manual action was taken:
Spurious OOM event, usage is 2, slot size is 5399 megabytes, ignoring OOM (read 8 bytes)
Hi Vikram:
I believe there were some bugs in cgroup OOM handling in older HTCondor versions. Can you try with the setting `IGNORE_LEAF_OOM = false`?
-greg
Questions:
- If we really need to use SYSTEM_PERIODIC_HOLD together with the cgroup hard-limit setting, what would be the right expression for partitionable slots?
- Why is the job on the CentOS 7 node not marked as completed or held despite breaching the memory limit, as it was in the RHEL 6 setup?
- Why are the hold reason codes sometimes completely empty, while in other cases they are returned successfully from the execute node?
- Is it okay to use WANT_HOLD and SYSTEM_PERIODIC_HOLD together? We currently use WANT_HOLD to hold jobs that run longer than the stipulated time.
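For the partitionable-slot question, here is a sketch of what such a policy might look like in the pool configuration. It uses the standard `MemoryUsage` and `RequestMemory` job ClassAd attributes; the exact expression and subcode value are assumptions for illustration, not a tested recommendation:

```
# condor_config fragment (sketch): hold a running job whose measured
# memory usage exceeds what it requested. MemoryUsage and RequestMemory
# are per-job attributes, so the same expression applies to dynamic
# slots carved out of a partitionable slot.
SYSTEM_PERIODIC_HOLD = (JobStatus == 2) && (MemoryUsage > RequestMemory)
SYSTEM_PERIODIC_HOLD_REASON = "Job exceeded its requested memory"
SYSTEM_PERIODIC_HOLD_SUBCODE = 34
```

Since SYSTEM_PERIODIC_HOLD and WANT_HOLD are evaluated independently, in principle both can be configured at once; after a hold, `condor_q <jobid> -af HoldReason HoldReasonCode HoldReasonSubCode` should show which policy fired.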
Thanks & Regards,
Vikrant Aggarwal