Subject: [HTCondor-users] Memory accounting issue with cgroups
The following issue started occurring with one of the 10.x releases
(I am not certain which, but it is still present in 10.4.3),
installed from .debs on nodes running Ubuntu 22.04.
My config used to have "CGROUP_MEMORY_LIMIT_POLICY = hard" and "use
POLICY : Hold_If_Memory_Exceeded". Jobs were correctly put on hold
when they exceeded their request_memory.
Now, with the same config and the same jobs, they eventually all go
on hold with "memory usage exceeded request_memory", while their
actual consumption (USS, PSS and RSS as reported by smem) never
exceeds request_memory.
Their job log shows image size updates every 5 minutes, with RSS
steadily increasing by 1 GB per 5-minute interval. Once this exceeds
request_memory, they (correctly) go into hold - except that their
actual RSS never went beyond 2 GB. When I remove the 'use POLICY'
line, the jobs continue unbounded.
Looking in the cgroup context of the job's (dynamic) slot, it seems
that Condor takes 'memory.current' to be the RSS. Since
memory.current also counts page cache, this would only be accurate
if the job were under (severe) memory pressure, which forces the
cache to be reclaimed. But (and this seems to be the crux of the
issue) both 'memory.high' and 'memory.max' are set to "max", and the
machine has loads of memory.
The Condor docs suggest that memory.high and memory.max should be at
90% and 100% of request_memory.
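If that is the intended behaviour, the expected per-slot limits are
easy to compute. A minimal sketch (the 90%/100% split is taken from
the docs; a request_memory of 2048 MB is just an illustrative value):

```shell
# Hypothetical example: expected cgroup v2 limits for a job with
# request_memory = 2048 (MB), per the documented 90%/100% policy.
request_memory_mb=2048
mem_max=$(( request_memory_mb * 1024 * 1024 ))   # 100% of request_memory -> memory.max
mem_high=$(( mem_max * 90 / 100 ))               # 90% of request_memory  -> memory.high
echo "memory.high=$mem_high memory.max=$mem_max"
```

On my nodes, however, both files simply contain "max".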
In fact, after I "cat memory.current | sudo tee memory.high",
memory.current and the RSS reported by Condor stay at that same
level throughout, which presumably is precisely how this was
supposed to work. (Very elegant mechanism!)
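For reference, this is the manual intervention I applied (the slot
cgroup path below is a placeholder; the actual path under
/sys/fs/cgroup depends on the slot and must be run as root on the
execute node):

```shell
# Sketch of the workaround: clamp memory.high to the job's current
# usage so the kernel starts reclaiming page cache from the cgroup.
# CG is a placeholder; substitute the job's actual slot cgroup path.
CG=/sys/fs/cgroup/htcondor/...       # per-slot path varies
cat "$CG/memory.current" | sudo tee "$CG/memory.high"
```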
Not sure where to look for diagnostics, but I see one ominous
message in the slot's StarterLog: "Error while locating memcg
controller for starter: 50014 Cgroup not initialized". This is the
tail of that log:
 05/18/23 09:50:58 (pid:1120439) Starting a VANILLA universe job with ID: 4574.0
 05/18/23 09:50:58 (pid:1120439) Checking to see if htcondor is a writeable cgroup
 05/18/23 09:50:58 (pid:1120439) Cgroup /htcondor is useable
 05/18/23 09:50:58 (pid:1120439) Current mount, /tmp, is shared.
 05/18/23 09:50:58 (pid:1120439) Current mount, /, is shared.
 05/18/23 09:50:58 (pid:1120439) IWD: /var/lib/condor/execute/dir_1120439
 05/18/23 09:50:58 (pid:1120439) Output file: /var/lib/condor/execute/dir_1120439/_condor_stdout
 05/18/23 09:50:58 (pid:1120439) Error file: /var/lib/condor/execute/dir_1120439/_condor_stderr
 05/18/23 09:50:58 (pid:1120439) Renice expr "0" evaluated to 0
 05/18/23 09:50:58 (pid:1120439) Running job as user zwets
 05/18/23 09:50:58 (pid:1120439) About to exec [... omitted ...]
 05/18/23 09:50:58 (pid:1120439) Create_Process succeeded, pid=1120441
 05/18/23 09:50:58 (pid:1120439) Error while locating memcg controller for starter: 50014 Cgroup not initialized
 05/18/23 09:51:06 (pid:1120439) Failed to open '.update.ad' to read update ad: No such file or directory (2).
 05/18/23 09:51:06 (pid:1120439) Failed to open '.update.ad' to read update ad: No such file or directory (2).
Any suggestions on where to look or what could be the issue here?