
Re: [HTCondor-users] Bug: cgroup limits not enforced with Singularity containers and condor_ssh_to_job / interactive jobs



On 03.10.22 at 20:31, Greg Thain via HTCondor-users wrote:
On 9/27/22 07:01, Oliver Freyermuth wrote:
Dear HTCondor devs,



Subsequently, limit enforcement (CPUs, Memory) does not take place, neither for interactive jobs nor for processes spawned after using "condor_ssh_to_job".

Ideas for good workarounds (or of course a fix) welcome ;-).


Hi Oliver:

I see what's going on in the code, and we'll work on a fix.

Hi Greg,

many thanks, that's much appreciated :-). I'm still astonished that it took our users so long to significantly exceed their requests for interactive jobs,
which is why it took me so long to notice.

I also see that [0] has now been resolved via HTCONDOR-1354 for 10.0.0, which is great news for enforcing disk usage limits at some later point.


One issue which has crept up on me again is a race in interactive container jobs in the starter here:
 https://github.com/htcondor/htcondor/blob/5e1c909f59372e029e3c6019f57c4688737e3b2f/src/condor_starter.V6.1/os_proc.cpp#L1190-L1195
Our users sometimes manage to hit the "hope for the best" branch there (i.e. using the PID of singularity itself as the process to attach to) and then end up in the wrong namespaces (e.g. the wrong mount namespace)
if the filesystem on which the container is located is slow and "condor_submit -interactive" is quick to execute "condor_ssh_to_job".

I'll see if I can squeeze in some development time on this. Probably the best-effort approach is to delay and retry in case there is no child of the Singularity process (yet);
if I manage, I can contribute a PR :-).
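
To make the "delay and retry" idea concrete, here is a rough sketch in Python. It is only an illustration of the approach and not the starter's code (which is C++ in os_proc.cpp): the names children_of and wait_for_child, the attempt count and the delay are all made up for this example, and it assumes a Linux /proc filesystem.

```python
import os
import time
from typing import Callable, List, Optional


def children_of(parent_pid: int) -> List[int]:
    """Return PIDs whose parent is parent_pid by scanning /proc (Linux only)."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # process went away while we were scanning
        # The command name may contain spaces or ')', so parse after the last ')'.
        fields = stat[stat.rfind(")") + 2:].split()
        # fields[0] is the state, fields[1] is the parent PID.
        if len(fields) > 1 and int(fields[1]) == parent_pid:
            found.append(int(entry))
    return sorted(found)


def wait_for_child(
    parent_pid: int,
    attempts: int = 10,      # hypothetical default, not taken from HTCondor
    delay: float = 0.5,      # hypothetical default, not taken from HTCondor
    list_children: Callable[[int], List[int]] = children_of,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll until parent_pid has a child; return its PID, or None on timeout."""
    for attempt in range(attempts):
        kids = list_children(parent_pid)
        if kids:
            return kids[0]
        if attempt < attempts - 1:
            sleep(delay)  # give the container runtime time to fork
    return None
```

A caller would attach to the returned PID and fall back to the Singularity PID (the current "hope for the best" branch) only if wait_for_child returns None. Passing list_children and sleep as parameters is just there so the retry logic can be exercised without a real container.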

Cheers from Bonn (and indeed hope I can join in person sometime soon),
	Oliver

[0] https://lists.cs.wisc.edu/archive/htcondor-users/2021-August/msg00132.shtml


We will miss you this year in Italy, but hope that you can join us in person soon!


-greg



--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--
