Subject: Re: [HTCondor-users] Periodic Hold for jobs exceeding memory and CPU requests
On 2/24/2021 8:49 AM, David Cohen wrote:
> Hi Todd,
> ~$ condor_config_val BASE_CGROUP
> htcondor
> ~$ condor_config_val CGROUP_MEMORY_LIMIT_POLICY
> HARD
> And still I recall at least two occasions when users were
> running over the requested memory.
Hi David,
Note that CGROUP_MEMORY_LIMIT_POLICY does not hold jobs that run over
their requested memory; it holds jobs that use more resident memory
than is allocated to the execute slot. For instance, if a job requests
6 GB and is matched to a slot containing 12 GB of memory, the job
will not be held unless the resident set size of all processes
running in that slot exceeds 12 GB.
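
As a rough illustration of that distinction, here is a small Python
sketch. It is not HTCondor code: the function name and the 8 GB usage
figure are made up for the example, and it simply assumes the limit is
the slot's memory rather than the job's request.

```python
def exceeds_hard_limit(rss_mb: int, slot_memory_mb: int) -> bool:
    """Hypothetical helper: True when resident memory is over the slot's memory.

    Assumption for this sketch: the enforced limit is the memory of the
    slot the job landed in, not the amount the job requested.
    """
    return rss_mb > slot_memory_mb


request_mb = 6 * 1024   # what the job asked for (6 GB)
slot_mb = 12 * 1024     # what the matched slot contains (12 GB)
usage_mb = 8 * 1024     # illustrative resident set size (8 GB)

# The job is over its own request...
over_request = usage_mb > request_mb
# ...but still under the slot's memory, so the limit is not hit.
over_limit = exceeds_hard_limit(usage_mb, slot_mb)

print("over request:", over_request)
print("over slot limit:", over_limit)
```

So a user can be seen "running over the requested memory" without the
job ever reaching the limit that applies to the slot.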
By default, HTCondor will match a job to a slot where the job's
requested memory <= the slot's memory, so the memory requested by a
job will not always be exactly equal to the slot's memory. Reasons
they may differ include the use of static slots, or the use of
partitionable slots combined with a) the config setting
MODIFY_REQUEST_EXPR_REQUESTMEMORY, which rounds up the memory of the
dynamic slot so that it can match more jobs in the future (see
https://tinyurl.com/ydev6mka), and/or b) slot preemption, where a
dynamic slot is created with 20 GB for a job requesting 20 GB, but
that slot is then preempted by a higher-priority user for a job that
requested less than 20 GB.
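
To make the rounding in case a) concrete, here is a minimal Python
sketch of "round the request up to the next multiple of a step". The
helper names are hypothetical, and the 128 MB step is only my
assumption about a typical setting; run
condor_config_val MODIFY_REQUEST_EXPR_REQUESTMEMORY on your pool to
see what is actually in effect.

```python
def round_up_to_step(request_mb: int, step_mb: int) -> int:
    """Hypothetical helper: round request_mb up to the next multiple of step_mb."""
    # Integer ceiling division, so no floating point rounding is involved.
    return -(-request_mb // step_mb) * step_mb


def slot_matches(request_mb: int, slot_memory_mb: int) -> bool:
    """Sketch of the matching rule: the slot needs at least the requested memory."""
    return request_mb <= slot_memory_mb


STEP_MB = 128  # assumed step size, check your own configuration

request_mb = 6000
dynamic_slot_mb = round_up_to_step(request_mb, STEP_MB)

# The dynamic slot ends up slightly larger than the request, so the job
# has some headroom above what it asked for.
headroom_mb = dynamic_slot_mb - request_mb
print(dynamic_slot_mb, headroom_mb, slot_matches(request_mb, dynamic_slot_mb))
```

In case b) the gap can be much larger, since a smaller job simply
inherits the 20 GB slot created for the earlier one.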