Mailing List Archives
[HTCondor-users] Limiting memory used on the worker node with c-groups
- Date: Fri, 24 Apr 2020 08:43:55 +0200
- From: Jean-Michel Barbet <jean-michel.barbet@xxxxxxxxxxxxxxxxx>
- Subject: [HTCondor-users] Limiting memory used on the worker node with c-groups
Hello,
We have repeatedly had worker nodes hang because of memory exhaustion,
and I am trying to figure out how to prevent this. I believe the
exhaustion is caused by some pathological job using far more memory
than it should.
My first question: does it make sense to use SYSTEM_PERIODIC_REMOVE
in the configuration of a worker node (startd), or does it only work
on the scheduler (and therefore react with some delay)?
Next, I tried different settings of CGROUP_MEMORY_LIMIT_POLICY.
I understand that the default setting is "none". In that case, in
/sys/fs/cgroup/memory/htcondor/condor_dlocal_htcondor_slot1\@worker,
"memory.limit_in_bytes" is set to the node's detected memory divided by
the number of cores, and "memory.soft_limit_in_bytes" is 0.
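To check which limits were actually applied to a slot, the two cgroup v1 control files named above can be read directly. The following is a minimal sketch, assuming a cgroup v1 memory hierarchy as in the path above; the script name `show_limits.py` and the helpers `read_memory_limits` and `to_megabytes` are hypothetical, and treating an HTCondor "megabyte" as 1024 * 1024 bytes is an assumption.

```python
import sys
from pathlib import Path

# cgroup v1 memory-controller files mentioned in the message above.
LIMIT_FILES = ("memory.limit_in_bytes", "memory.soft_limit_in_bytes")

# Assumption: HTCondor "megabytes" are treated here as 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def read_memory_limits(cgroup_dir):
    """Return {file name: limit in bytes} for one cgroup v1 memory group."""
    limits = {}
    for name in LIMIT_FILES:
        text = (Path(cgroup_dir) / name).read_text().strip()
        limits[name] = int(text)
    return limits


def to_megabytes(nbytes):
    """Convert a byte count to whole megabytes (rounded down)."""
    return nbytes // BYTES_PER_MB


if __name__ == "__main__":
    # Usage: python3 show_limits.py <slot cgroup dir> [<slot cgroup dir> ...]
    for cgroup_dir in sys.argv[1:]:
        for name, value in read_memory_limits(cgroup_dir).items():
            print(f"{cgroup_dir}: {name} = {value} bytes "
                  f"({to_megabytes(value)} MB)")
```

Running it on each slot directory before and after changing CGROUP_MEMORY_LIMIT_POLICY would show whether the hard limit, the soft limit, or both changed.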
I tried setting CGROUP_MEMORY_LIMIT_POLICY to "soft". It seems to do its
job, with jobs being removed with "Job has gone over memory limit of 6000
megabytes. Peak usage: 5926 megabytes." BUT: the worker nodes end up
with a number of processes in the "D" state (uninterruptible sleep), which
gives a high Unix load even though no CPU is being consumed, and no new
jobs are scheduled. It looks like the jobs are not being killed cleanly.
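On Linux, processes in the "D" state (uninterruptible sleep, the letter shown by ps and top) are counted in the load average even though they use no CPU, which would match a high load on an otherwise idle node. A minimal, Linux-only sketch for listing such processes, assuming the /proc/[pid]/stat layout described in proc(5); the helper names `parse_stat` and `stuck_processes` are hypothetical.

```python
import os


def parse_stat(line):
    """Split a /proc/[pid]/stat line into (pid, comm, state, ppid).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' instead of on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition(" (")
    fields = tail.split()
    return int(pid_text), comm, fields[0], int(fields[1])


def stuck_processes(proc_root="/proc"):
    """Yield (pid, comm, ppid) for every process in state 'D'."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory (e.g. /proc/meminfo)
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm, ppid


if __name__ == "__main__":
    for pid, comm, ppid in stuck_processes():
        print(f"D state: pid={pid} ppid={ppid} comm={comm}")
    # 1-, 5- and 15-minute load averages, as shown by uptime.
    print("load average:", os.getloadavg())
```

Following the parent PID of each stuck process may help tell whether they belong to jobs that were removed for exceeding their memory limit.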
I am now trying "hard". Let's see...
I have read this presentation:
https://research.cs.wisc.edu/htcondor/HTCondorWeek2017/presentations/WedDownes_cgroups.pdf
but I do not understand all of it, sorry.
This is HTCondor version 8.6.13. Also, please note that I have configured
things so that the memory threshold is higher than the detected memory:
MEMORY = 1.5 * quantize( $(DETECTED_MEMORY), 1000 )
MODIFY_REQUEST_EXPR_REQUESTMEMORY = quantize(RequestMemory,100)
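The arithmetic behind these two lines can be sketched as follows. This assumes that the ClassAd function quantize(a, b) with a numeric b rounds a up to the next multiple of b, and that memory is split evenly across single-core slots as described above for the default policy; the node size and core count are made-up example values, not measurements from this cluster.

```python
import math


def quantize(a, b):
    """Round a up to the next multiple of b (ClassAd quantize, numeric b)."""
    return math.ceil(a / b) * b


def advertised_memory(detected_mb, factor=1.5, step=1000):
    # Mirrors: MEMORY = 1.5 * quantize( $(DETECTED_MEMORY), 1000 )
    return factor * quantize(detected_mb, step)


def per_slot_memory(total_mb, cores):
    # Even split across single-core slots (assumption for this sketch).
    return total_mb / cores


if __name__ == "__main__":
    detected, cores = 64237, 16  # hypothetical worker node
    total = advertised_memory(detected)
    print("advertised MEMORY:", total, "MB")
    print("physical share per slot:", per_slot_memory(detected, cores), "MB")
    print("advertised share per slot:", per_slot_memory(total, cores), "MB")
    # Mirrors: MODIFY_REQUEST_EXPR_REQUESTMEMORY = quantize(RequestMemory,100)
    print("RequestMemory 2048 becomes", quantize(2048, 100), "MB")
```

With these example numbers each slot is offered about 6094 MB while its share of physical memory is only about 4015 MB, so per-slot limits derived from MEMORY can add up to more than the node actually has.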
Thank you in advance.
JM
--
------------------------------------------------------------------------
Jean-michel BARBET | Tel: +33 (0)2 51 85 84 86
Laboratoire SUBATECH Nantes France | Fax: +33 (0)2 51 85 84 79
CNRS-IN2P3/Ecole des Mines/Universite | E-Mail: barbet@xxxxxxxxxxxxxxxxx
------------------------------------------------------------------------