Am 27.02.19 um 10:14 schrieb Steffen Grunewald:
On Wed, 2019-02-27 at 09:57:12 +0100, Oliver Freyermuth wrote:Good morning, Am 27.02.19 um 09:43 schrieb Steffen Grunewald:- The argument "-a" to nsenter not being present on CentOS 7also not in Debian Stretch (and now that I check it with Jessie, it hasn't been there too - why dod nobody notice?)It's present in Ubuntu 18.04 LTS, maybe all container users (apart from us) have that on their servers, or have not upgraded yet.I've still got to find a man page that lists -a as a valid option; https://www.systutorials.com/docs/linux/man/1-nsenter/ doesn't have it.
This one has it: http://manpages.ubuntu.com/manpages/bionic/man1/nsenter.1.html Here's the commit which added that feature to nsenter: https://github.com/karelzak/util-linux/commit/974cc006f122f36e2187cedb9d3e58dc2d24814c
The problem was introduced with 8.8.0, as I could find by comparing src/condor_starter.V6.1/os_proc.cpp of 8.6 and 8.8 versions. Since some of the biggest Condor users refuse to run x.y.0 this wasn't discover earlier...
The switch to nsenter in general is something I strongly congratulate on (and actually asked for). Without nsenter, the only way to have interactive jobs at all was to run sshd inside the container - with all the problems this causes (SELinux, you need setuid root for Singularity due to permission issues when not all uids are mapped [ https://bugzilla.mindrot.org/show_bug.cgi?id=2813 ], dirty bind mounts are needed). So nsenter is definitely the way to go, but I had hoped it was tested better beforehand. But I'm positivie it will start to work in one of the next versions if we provide sufficient input ;-).
I'm lacking a test case at the moment, but I'm fearing the worst. Still nobody has attempted to run containers, it seems (or they failed and failed to report it?)I think Greg is correct and "-U" only fails with setuid root Singularity (the code "only" affects Singularity users in any case). Probably "-a" would do the correct thing, since Singularity with setuid root does not create a new user namespace so there's no need to attach.According to both the man page referred to above, and the Debian one, replacing "-a" with "-m -u -i -n -p -U" should do the trick (this leaves to be discussed whether all of those are needed - but if "-a" is supposed to work in the given context, then its "full expansion" should also do).
The man page does not match the code, apparently. The diff I linked shows that for "-U" a special handling is done, and it is dropped if the to-be-attached-to-PID is in the very same user namespace.
Here's the diff: condor-8.8.1# diff -u src/condor_starter.V6.1/os_proc.cpp{.ORIG,} --- src/condor_starter.V6.1/os_proc.cpp.ORIG 2019-02-19 05:08:49.000000000 +0100 +++ src/condor_starter.V6.1/os_proc.cpp 2019-02-27 10:09:43.513715435 +0100 @@ -1106,7 +1106,13 @@ } ArgList args; args.AppendArg("/usr/bin/nsenter"); - args.AppendArg("-a"); // all namespaces + #args.AppendArg("-a"); // all namespaces + args.AppendArg("-m"); + args.AppendArg("-u"); + args.AppendArg("-i"); + args.AppendArg("-n"); + args.AppendArg("-p"); + args.AppendArg("-U"); args.AppendArg("-t"); // target pid char buf[32]; sprintf(buf,"%d", pid); Greg, did I overlook something? I'll make this a Debian patch, and rebuild, if there's no veto...
It will break (at least) singularity with setuid root, since "-U" does not work there if /proc/<pid>/user is the same as the host's namespace.
I'll try to find out the best working combination in the next days. Potentially, disabling the automatic killing of the "sleep" job before nsenter attaches, wrappering nsenter correctly for CentOS 7 and setting some environment variables to have a well-defined PATH and working bash initialization when attaching could work around all discovered issues.Since adding the reaper seems to be the second change made to the same source file, perhaps we can learn about the rationale behind that?
That would indeed be interesting.
At least, now our users are out of the game and there's less stress from people flooding me with mails since their interactive jobs don't start, and we have a good way to test out such things before rolling them out to the full cluster ;-).Thanks (to you and them) for being the Guinea pigs...
;-) I'm now down to seeing: ------------------------------------- bash: cannot set terminal process group (-1): Inappropriate ioctl for device bash: no job control in this shell ------------------------------------- when attaching to a job, after injecting an nsenter-wrapper (not using -U) and patching /usr/libexec/condor/condor_ssh_to_job_shell_setup to not kill the "sleep" process, and doing those exports in the wrapper: export SHELL=/bin/bash export PATH=$PATH:/bin I probably need to revisit what exactly is happening to the TTYs now. Cheers, Oliver
- S
Attachment:
smime.p7s
Description: S/MIME Cryptographic Signature