On 10/12/2017 12:29 PM, Michael Di Domenico wrote:
> On Thu, Oct 12, 2017 at 11:11 AM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
>> Assuming you are on Linux... One thing that changed between v8.4.x and
>> v8.6.x is that in v8.6.x cgroup support is enabled by default, which
>> allows HTCondor to more accurately track the amount of memory your job
>> uses during its lifetime. On an execute node that put your job on hold,
>> what does condor_config_val -dump CGROUP PREEMPT say? I am interested
>> in the values of CGROUP_MEMORY_LIMIT_POLICY and BASE_CGROUP (see the
>> Manual for details on these knobs), and in whether your machines are
>> configured to PREEMPT jobs that use more memory than provisioned in
>> the slot. These settings could tell HTCondor to put jobs on hold that
>> use more memory than allocated in the slot.
>
> yes this is linux, rhel 7.4 to be specific
>
> cgroup_preempt doesn't show up
> base_cgroup = htcondor
> cgroup_memory_limit_policy = none
>
> we actually set preempt to false everywhere. we don't want jobs to
> preempt for any reason unless a person specifically says to.
Hmmm... Well, given that your cgroup_memory_limit_policy is none and preempt is false, my guess is that HTCondor is not killing the jobs for using more memory than the slot provides; instead, the operating system itself is killing these jobs because the system as a whole is running out of memory.
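If you want to confirm that, the kernel logs every OOM kill, so on an execute node that held a job you could grep the kernel log for it. A generic Linux check, nothing HTCondor-specific:

    # look for OOM-killer activity in the kernel log
    # (on systemd hosts, "journalctl -k" shows the same messages)
    dmesg | grep -i -E 'out of memory|oom-killer'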
Are these execute nodes also running other services that consume memory besides HTCondor? For instance, here at UW we have execute nodes set up with no swap space that are also running squid proxies, gluster clients, and cvmfs... all services that consume memory. By default, HTCondor assumes it can allocate all of the system memory into slots for use by jobs (not such a great default, IMHO). So, for instance, if a server with 16 GB is also running the gluster client service, which is using 2 GB, then obviously you could run into trouble if HTCondor also tries to start a 16 GB job on the machine. Also be aware that HTCondor tells the system OOM killer to favor killing jobs over other processes if the system starts running out of memory (the idea here is that we don't want the OOM killer to decide to kill the condor_startd instead of a job!).
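You can see that bias for yourself via the per-process knob the kernel uses for OOM victim selection; on an execute node, something like the following (where <job_pid> is a placeholder for a real job PID taken from ps or condor_q -run) should show the job process carrying a higher adjustment than the startd:

    # compare the OOM score adjustment of a job process vs. the startd;
    # range is -1000..1000, and higher means "kill this one first"
    cat /proc/<job_pid>/oom_score_adj
    cat /proc/$(pidof condor_startd)/oom_score_adj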
So you may want to set the RESERVED_MEMORY knob in your condor config file on your execute machines (cut-n-pasted description from the manual below). Here at UW we have RESERVED_MEMORY=2048 or so.
RESERVED_MEMORY
    How much memory (in MB) would you like reserved from HTCondor? By default, HTCondor considers all the physical memory of your machine as available to be used by HTCondor jobs. If RESERVED_MEMORY is defined, HTCondor subtracts it from the amount of memory it advertises as available.
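In practice that's a one-line addition to the local config on each execute node. The 2048 below is just what we use here; size it to whatever your squid/gluster/cvmfs footprint actually is:

    # reserve 2 GB of physical memory for non-HTCondor services
    RESERVED_MEMORY = 2048

Then run condor_reconfig (or restart the startd) on the node so the new value takes effect.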
Hope the above helps,
Todd