Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Periodic Hold for jobs exceeding memory and CPU requests

Date: Wed, 24 Feb 2021 14:10:29 -0600
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Periodic Hold for jobs exceeding memory and CPU requests

On 2/24/2021 8:49 AM, David Cohen wrote:

Hi Todd,
~$ condor_config_val BASE_CGROUP
htcondor
~$ condor_config_val CGROUP_MEMORY_LIMIT_POLICY
HARD

And still I recall at least two occasions when users were running over the requested memory.

Hi David,

Note that CGROUP_MEMORY_LIMIT_POLICY does not hold jobs that run over their requested memory; it holds jobs that use more resident memory than is allocated to the execute slot. For instance, if a job requests 6 GB and is matched to a slot containing 12 GB of memory, the job will not be halted unless the total resident set size of all processes running in that slot exceeds 12 GB.
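To make the distinction concrete, here is a minimal Python sketch of the two comparisons involved. It is not HTCondor code: the function names and the sample resident-set sizes are made up for illustration, and it only mirrors the 6 GB request / 12 GB slot example above.

```python
def over_request(rss_mb: int, request_memory_mb: int) -> bool:
    """True if the job uses more resident memory than it requested."""
    return rss_mb > request_memory_mb


def over_slot_limit(rss_mb: int, slot_memory_mb: int) -> bool:
    """True if the job uses more resident memory than the slot provides.

    This is the comparison that matters for the cgroup limit described
    above; the job's own request plays no part in it.
    """
    return rss_mb > slot_memory_mb


if __name__ == "__main__":
    request_mb = 6 * 1024   # job requested 6 GB
    slot_mb = 12 * 1024     # slot it was matched to holds 12 GB

    # Hypothetical resident-set sizes, in MB.
    for rss_mb in (4096, 8192, 13312):
        print(
            f"rss={rss_mb} MB "
            f"over_request={over_request(rss_mb, request_mb)} "
            f"over_slot_limit={over_slot_limit(rss_mb, slot_mb)}"
        )
```

A job sitting at 8192 MB is over its request but still under the slot's limit, which would be consistent with users being seen running over their requested memory.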

By default, HTCondor will match a job to a slot when the job's requested memory <= the slot's memory. The job's requested memory will not always be exactly equal to the slot's memory. Reasons they may differ include the use of static slots, or the use of partitionable slots together with a) the config setting MODIFY_REQUEST_EXPR_REQUESTMEMORY, which rounds the memory of the slot upwards so that it matches more jobs in the future (see https://tinyurl.com/ydev6mka), and/or b) slot preemption, where a dynamic slot is created with 20 GB for a job requesting 20 GB, but that slot is then preempted by a higher-priority user for a job that requested less than 20 GB.
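As a rough illustration of how rounding opens a gap between the request and the slot, the sketch below rounds a request up to a multiple of a quantum and then applies the "slot memory at least the request" match test. It is plain Python, not HTCondor code: the helper names are hypothetical, and the 128 MB quantum is an assumption about the MODIFY_REQUEST_EXPR_REQUESTMEMORY setting in use, so check your own configuration.

```python
def quantize_up(value_mb: int, quantum_mb: int) -> int:
    """Round value_mb up to the next multiple of quantum_mb."""
    return -(-value_mb // quantum_mb) * quantum_mb


def matches(request_memory_mb: int, slot_memory_mb: int) -> bool:
    """Match test assumed here: the slot must hold at least the request."""
    return slot_memory_mb >= request_memory_mb


if __name__ == "__main__":
    request_mb = 6000   # hypothetical job request
    quantum_mb = 128    # assumed rounding quantum

    # Size the dynamic slot would get after rounding the request upwards.
    slot_mb = quantize_up(request_mb, quantum_mb)
    headroom_mb = slot_mb - request_mb

    print(f"request={request_mb} MB slot={slot_mb} MB headroom={headroom_mb} MB")
    print(f"matches={matches(request_mb, slot_mb)}")
```

The headroom is the amount by which the job can exceed its request without ever reaching the slot's limit. Preemption can make it much larger, as in case b) above.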

Hope the above helps,
Todd
