
Re: [HTCondor-users] condor cgroup hard setting related queries




On 3/2/21 3:52 AM, ervikrant06@xxxxxxxxx wrote:



## Test with condor 8.8.5 (Stable release) and CentOS Linux release 7.9.2009

Test 1 & Test 2: In both cases the message below was reported in the slot log file, but the job stayed in the running state indefinitely until manual action was taken.

Spurious OOM event, usage is 2, slot size is 5399 megabytes, ignoring OOM (read 8 bytes)


Hi Vikram:

I believe there were some bugs in cgroup OOM handling in older condor versions. Can you try with the setting

IGNORE_LEAF_OOM = false?
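
For concreteness, a minimal sketch of how that might look in the execute node's configuration; the CGROUP_MEMORY_LIMIT_POLICY line is only my assumption about what you already have, given the hard-limit setup you describe:

    # Presumably already set on your execute nodes (the "hard" cgroup policy from the subject)
    CGROUP_MEMORY_LIMIT_POLICY = hard
    # The setting suggested above
    IGNORE_LEAF_OOM = false

and then restart (or at least reconfig) condor on that node so the change takes effect.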

-greg



Questions:

- If we really do need to use SYSTEM_PERIODIC_HOLD alongside the cgroup hard setting, what would the right expression be for partitionable slots? (See the sketch after this list.)
- Why is the job on the CentOS 7 node either marked as completed or held despite breaching the memory limit in the same way it did in the RHEL 6 setup?
- Why are the hold reason codes completely empty in some cases, while in others they are returned successfully from the execute node?
- Is it okay to use WANT_HOLD and SYSTEM_PERIODIC_HOLD together? We are currently using WANT_HOLD to hold jobs that run longer than the stipulated time.
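
For concreteness, here is a rough sketch of how we currently picture combining the two; the specific values (memory compared in MB via MemoryUsage/RequestMemory, and a 24-hour run-time cap) are only placeholders for our real policy, not something we have validated:

    # Schedd side: hold running jobs whose measured memory exceeds what they requested
    # (MemoryUsage and RequestMemory are both expressed in megabytes)
    SYSTEM_PERIODIC_HOLD = (JobStatus == 2) && (MemoryUsage > RequestMemory)
    SYSTEM_PERIODIC_HOLD_REASON = "Job exceeded its requested memory"

    # Startd side: what we already use today to cap run time (24 hours here as a placeholder)
    WANT_HOLD = (TotalJobRunTime > 24 * 60 * 60)
    WANT_HOLD_REASON = "Job exceeded the maximum allowed run time"

The doubt is mainly whether something of this shape behaves sensibly for jobs running in dynamic slots carved out of partitionable slots, and whether the two mechanisms interfere with each other or with the cgroup hard limit.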



Thanks & Regards,
Vikrant Aggarwal

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/