[HTCondor-users] Advertised Memory of an execute point running in Apptainer.
- Date: Fri, 21 Mar 2025 16:35:00 +0100
- From: Matthias Schnepf <matthias.schnepf@xxxxxxx>
- Subject: [HTCondor-users] Advertised Memory of an execute point running in Apptainer.
Hi all,
We are trying to run an HTCondor execute point (master, startd) in an Apptainer
container within a Slurm job.
HTCondor starts and is able to run jobs. However, the advertised memory
is always the memory of the host system, even when I limit the memory via
Apptainer (apptainer run --memory ....).
I tried setting the "Memory" ClassAd in the config of the execute
point, but it has no effect.
I also tried it with Memory = "foo". The execute point still advertised
the memory of the host system and accepted jobs. The start of the job
then failed because HTCondor tried to evaluate Memory = "foo" and
complained that it is not an integer.
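For reference, this is roughly what I tried in the execute point's configuration; as far as I understand, the MEMORY knob (in MiB) is supposed to override the detected physical memory, and the value here is only illustrative:

```
# condor_config on the execute point (value is illustrative)
# MEMORY is given in MiB and should override the detected host memory
MEMORY = 512000
```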
Do I need to set something?
The Slurm worker node runs Rocky Linux 9.5. I tried HTCondor versions 23.10.18
and 24.0.5. The Slurm cluster uses the cgroup v2 plugin in slurmstepd.
Could it be a problem with how the cgroups are set up?
The memory limit of the job itself is set to what the job requests:
cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_3679/memory.max
536870912000
But the HTCondor process runs in a sub-cgroup where the limit is set
to "max":
cat /sys/fs/cgroup/system.slice/slurmstepd.scope/job_3679/step_batch/user/task_0/memory.max
max
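My understanding of cgroup v2 is that the effective limit of a process is the smallest numeric memory.max along its cgroup path, so the "max" in the task sub-cgroup should still be bounded by the job-level 536870912000. A small Python sketch of that rule (the helper names are mine, not from HTCondor or Slurm):

```python
from pathlib import Path

def effective_limit(values):
    """cgroup v2 semantics: the effective memory limit is the smallest
    numeric memory.max among a cgroup and its ancestors; 'max' means
    'no bound at this level'. Returns None if nothing bounds it."""
    nums = [int(v) for v in values if v.strip() != "max"]
    return min(nums) if nums else None

def collect_memory_max(cgroup_dir):
    """Collect memory.max contents from cgroup_dir up to the cgroup root."""
    root = Path("/sys/fs/cgroup")
    p = Path(cgroup_dir)
    out = []
    while True:
        f = p / "memory.max"
        if f.is_file():
            out.append(f.read_text().strip())
        if p == root or p == p.parent:  # stop at cgroup root (or fs root)
            break
        p = p.parent
    return out

# The two values from this job: the task level says "max", but the
# job level says 536870912000, so that is what should apply.
print(effective_limit(["max", "536870912000"]))  # -> 536870912000
```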
Best regards,
Matthias