
Re: [HTCondor-users] Incompatibility of HTCondor "condor_ssh_to_job" with Apptainer 1.3.0?



Hi all,

trying "SINGULARITY_IS_SETUID = False", I get the following

  condor_starter[53767]: Accepted new connection from ssh client for container job
  condor_starter[53767]: Create_Process(/usr/bin/condor_nsenter): child failed because PRIV_USER process was still root before exec()
  condor_starter[53767]: singularity enter_ns returned pid 0

in the starter log, and the condor_ssh_to_job session hangs. It seems "PRIV_USER" just means "the process should run as the user", but since condor_nsenter is forked off from the starter and only switches IDs afterwards,
it appears to inherit "root" when the starter runs as root (if I understand that correctly).
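The log line says the child was still root when it reached exec(). A minimal sketch of the intended order, written in Python rather than HTCondor's C++ and not taken from HTCondor's code: the forked child drops its supplementary groups, GID and UID before exec(), and the helper refuses to launch anything as root. The name `run_as_job_user` is hypothetical.

```python
import os
import subprocess


def run_as_job_user(argv, uid, gid):
    """Run argv with IDs switched in the forked child before exec().

    Hypothetical helper illustrating the PRIV_USER idea; not HTCondor code.
    """
    if uid == 0:
        raise ValueError("refusing to launch the job-side helper as root")
    kwargs = {"user": uid, "group": gid}
    if os.geteuid() == 0:
        # Equivalent of setgroups(0, NULL); only permitted for root.
        kwargs["extra_groups"] = []
    # Popen applies setgroups, setregid and setreuid in the child, before exec().
    result = subprocess.run(argv, capture_output=True, text=True, check=True, **kwargs)
    return result.stdout.strip()
```

Running `run_as_job_user(["id", "-u"], uid, gid)` from a root process should print the target UID, never 0, which is the property the starter's "still root before exec()" check is meant to enforce.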

The corresponding change was written when nsenter was still used:
 https://github.com/htcondor/htcondor/commit/105c359c1e937cc9248b7999dc68ebe18940d4a9
so maybe that changed at some point afterwards?

So I think we probably have an HTCondor issue ("SINGULARITY_IS_SETUID = False" does not work, unless I am using it the wrong way), which means the knob can't serve as a workaround for the problem with fuse-overlayfs on CentOS 7.
While the days of CentOS 7 are indeed numbered, this (independent?) HTCondor issue may still be worth investigating: calling condor_nsenter as the user instead of as root should be safer, even though it of course switches UID/GID afterwards.

Cheers,
	Oliver


On 26.03.24 at 18:39, Oliver Freyermuth wrote:
Dear Dave,

the workaround for overlayfs does indeed defy explanation ;-).

I found that one key ingredient to reproduce this problem is that the container / apptainer is called with user privileges by HTCondor, i.e. after switching UID / GID,
while condor_nsenter is called as root user (if I am not mistaken), and then does setuid / setgid inside.

Indeed, as "root" on a CentOS 7 system, I can run:

  sudo -u my_user_name /cvmfs/oasis.opensciencegrid.org/mis/apptainer/1.3.0/bin/apptainer exec -C /cvmfs/atlas.cern.ch/repo/containers/fs/singularity/x86_64-centos7 bash

and then attach just fine with:

  sudo -u my_user_name condor_nsenter -t 78626 -S $(id -u my_user_name) -G $(id -g my_user_name)

But I cannot attach to the same container (running as the user with unprivileged Apptainer) when executing "condor_nsenter" as root:

  condor_nsenter -t 78626 -S $(id -u my_user_name) -G $(id -g my_user_name)
  Can't open master pty Bad file descriptor

Now I found the HTCondor knob SINGULARITY_IS_SETUID, which switches the code path HTCondor takes and tells it to call "condor_nsenter" after dropping privileges:
  https://github.com/htcondor/htcondor/blob/076559e4c63a46428da7e5db6299f96ed2d8e3c8/src/condor_starter.V6.1/os_proc.cpp#L1239
I thought this was autodetected, but it seems it is not (we have Apptainer explicitly configured with "allow setuid = no", but the HTCondor knob SINGULARITY_IS_SETUID seems to be left to manual configuration).
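Since the knob is not derived automatically, a site could derive it from Apptainer's own configuration. A minimal sketch, assuming the `key = value` line format of apptainer.conf with `#` comments; the function names are hypothetical and this is not something HTCondor does. It looks only at `allow setuid` and ignores whether a setuid-capable Apptainer installation is present at all.

```python
def parse_apptainer_conf(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip().lower()
    return settings


def suggested_is_setuid(text):
    """Return a value for SINGULARITY_IS_SETUID ("True" or "False")."""
    # Assumption: treat a missing key as "yes", the stock apptainer.conf value.
    allow = parse_apptainer_conf(text).get("allow setuid", "yes")
    return "True" if allow == "yes" else "False"
```

With `allow setuid = no`, as on this cluster, the sketch suggests `SINGULARITY_IS_SETUID = False`.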

I'll reinstall the node with CentOS 7 and check whether toggling that knob fixes things.

Cheers and thanks,
    Oliver

On 26.03.24 at 17:57, Dave Dykstra wrote:
It's interesting that you found that. I wouldn't have thought of that as a connection, since that problem happened only with kernel overlayfs. It does however have a similar symptom with fuse-overlayfs.

Unfortunately, unlike in the previous case, I haven't been able to reproduce it with unshare commands, where it's easier to experiment with workarounds. The workaround for overlayfs was to mount it, stat a file, unmount it, and remount it. It defies explanation, but it consistently works. Doing the same with fuse-overlayfs would be considerably more trouble, and I don't know whether it would work since I haven't tried it.

condor_nsenter does need to allocate a pseudo-tty. Given the short remaining lifetime of EL7, however, this problem may just be best off left to expire.
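For context on what allocating a pseudo-tty involves: a minimal Python sketch, not condor_nsenter's code, that allocates a pty pair with `os.openpty()` and passes bytes from the slave (job) side to the master side. On glibc systems openpty(3) typically obtains the master by opening /dev/ptmx, the open that fails with EACCES in the strace quoted in this thread.

```python
import os


def pty_roundtrip(payload: bytes) -> bytes:
    """Allocate a pseudo-terminal pair and pass payload through it."""
    master_fd, slave_fd = os.openpty()
    try:
        os.write(slave_fd, payload)      # what a shell inside the job would print
        return os.read(master_fd, 1024)  # what the ssh side would receive
    finally:
        os.close(slave_fd)
        os.close(master_fd)
```

With default terminal settings the line discipline may translate "\n" into "\r\n", so only the prefix of the returned bytes is guaranteed to match the payload.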

Curiously, if I just run apptainer-1.3.0 interactively on an EL7 machine, I cannot reproduce the error from condor_nsenter that you got. It must be something about the way condor_starter starts apptainer.

Dave

On Tue, Mar 26, 2024 at 04:44:22PM +0100, Oliver Freyermuth wrote:
Hi Dave,

indeed, I can confirm:

* the machine was running CentOS 7.

* nsenter with "-r" added also fails for me. So I conclude, as you did, that the problem appears to be a "permission denied" when the "nsentering" process itself accesses things inside the mount namespace (after it has done setgid / setuid).
  That would match the failure of both nsenter (accessing "root" with "-r") and condor_nsenter (accessing /dev/ptmx).

* after reinstalling the machine with Rocky Linux 8, both "nsenter -r" and "condor_nsenter" work, and "condor_ssh_to_job" also works fine
  (now we just need to finish a filesystem upgrade so we can move beyond CentOS 7...).

So fuse-overlayfs apparently does not play well with attaching to running containers. That probably matches what you mentioned in:
  https://github.com/apptainer/apptainer/commit/f99c239235eb85dbbd43a324701353b38dca9997
which I only found now, correct?

Cheers and thanks,
    Oliver

On 26.03.24 at 15:33, Dave Dykstra wrote:
Never mind, I was glossing over the deeper debugging you had done in condor, where you found the error to be with /dev/ptmx. According to strace, nsenter does nothing with /dev/ptmx. So I'd still like to know whether condor_ssh_to_job works with apptainer-1.3.0 on EL9.

Dave

On Tue, Mar 26, 2024 at 09:26:50AM -0500, Dave Dykstra wrote:
I have an nsenter script that includes using the "-r" option. That gets an error on EL7 with apptainer-1.3.0 using fuse-overlayfs:
      nsenter: cannot open /proc/1978437/root: Permission denied

If I remove the "-r" option, it works, and it works even with the -r option on EL9 with apptainer-1.3.0 using kernel overlayfs.

Do you see the same, Oliver? Perhaps condor_ssh_to_job is using the "-r" option or its equivalent, and the error isn't getting passed along. It's not clear to me that the option is doing anything useful; I'm not sure why I had it included in my script.
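The `nsenter -r` failure above is an open() of `/proc/<pid>/root` returning "Permission denied". A minimal Python probe for that single call, assuming Linux procfs; the name `probe_proc_root` is hypothetical. It returns the errno name instead of raising, so it can be run as the job user against the container's PID to check whether that open fails outside of nsenter as well.

```python
import errno
import os


def probe_proc_root(pid):
    """Open /proc/<pid>/root read-only; return "ok" or the errno name."""
    path = f"/proc/{pid}/root"
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # e.g. "EACCES" for the failure reported above, "ENOENT" for a missing PID
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "ok"
```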

Dave

On Tue, Mar 26, 2024 at 09:05:07AM -0500, Dave Dykstra wrote:
Hi Oliver,

Does it happen also with an EL8 or EL9 host? That uses kernel overlayfs instead of fuse-overlayfs. I wonder if that makes a difference.

Dave

On Mon, Mar 25, 2024 at 10:31:20PM +0100, Oliver Freyermuth wrote:
Dear HTCondor experts (probably Greg, hello from Bonn! ;-) ),

I finally got around to upgrading a first system to Apptainer 1.3.0, which now uses fuse-overlayfs by default instead of the previous "underlay" approach, which is going to be deprecated in a future release.

Trying to start an interactive job (or connecting to an existing job) now reveals (note: we run Apptainer unprivileged):
...
    Your condor job is running with pid(s) 34563.
    Can't open master pty Bad file descriptor
    read returned, exiting
...


I can pin this down to the following problem:


1) Process tree:

condor   34472 1.7 0.0 91048 8624 ? Ss 22:18 0:00 \_ condor_starter -f -local-name slot_type_1 -a slot1_2 exp196.physik.uni-bonn.de
freyermu 34563 2.5 0.0 963708 19548 ? SNsl 22:18 0:00 \_ Apptainer runtime parent
freyermu 34587 0.0 0.0 888016 17256 ? SNl 22:18 0:00 \_ appinit
freyermu 34622 0.0 0.0 3800 1376 ? SN 22:18 0:00 | \_ /bin/sh -c sleep 180 && while test -d ${_CONDOR_SCRATCH_DIR}/.condor_ssh_to_job_1; do /bin/sleep 3; done
freyermu 34623 0.0 0.0 2376 364 ? SN 22:18 0:00 | \_ sleep 180
freyermu 34606 1.5 0.0 16200 3092 ? SN 22:18 0:00 \_ /usr/libexec/apptainer/bin/fuse-overlayfs -f -o allow_other,lowerdir=/var/lib/apptainer/mnt/session/overlay-lowerdir:/var/lib/apptainer/mnt/session/rootfs...


2) Running the following (using any PID "deeper" down, e.g. 34622 or 34623, does the same)
      strace -f condor_nsenter -t 34587 -S <my_id> -G <my_gid>
    reveals:
      open("/proc/34587/ns/uts", O_RDONLY)    = 3
      setns(3, 0)                             = 0
      close(3)                                = 0
      open("/proc/34587/ns/pid", O_RDONLY)    = 3
      setns(3, 0)                             = 0
      close(3)                                = 0
      open("/proc/34587/ns/mnt", O_RDONLY)    = 3
      setns(3, 0)                             = 0
      close(3)                                = 0
      setgroups(0, NULL)                      = 0
      setgid(513)                             = 0
      setuid(67803)                           = 0
      ioctl(0, TIOCGWINSZ, {ws_row=58, ws_col=236, ws_xpixel=1891, ws_ypixel=988}) = 0
      open("/dev/ptmx", O_RDWR)               = -1 EACCES (Permission denied)
      ioctl(-1, TIOCSPTLCK, [0])              = -1 EBADF (Bad file descriptor)
      write(2, "Can't open master pty Bad file d"..., 42Can't open master pty Bad file descriptor
      ) = 42
      exit_group(1)                           = ?
      +++ exited with 1 +++
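The trace boils down to a fixed order of operations: enter the uts, pid and mnt namespaces, drop groups, GID and UID, then open /dev/ptmx. A minimal Python replay of that order, assuming Linux and Python 3.12+ for `os.setns`; the function names are hypothetical and this is not condor_nsenter's source. Building the plan needs no privileges, while executing it requires root.

```python
import os


def nsenter_plan(pid, uid, gid):
    """Steps in the order the strace shows condor_nsenter taking them."""
    steps = [("setns", f"/proc/{pid}/ns/{ns}") for ns in ("uts", "pid", "mnt")]
    steps += [
        ("setgroups", []),      # setgroups(0, NULL)
        ("setgid", gid),
        ("setuid", uid),
        ("open", "/dev/ptmx"),  # the call that fails with EACCES in the trace
    ]
    return steps


def run_plan(steps):
    """Execute a plan; needs root, Linux and Python 3.12+ (os.setns)."""
    master_fd = None
    for op, arg in steps:
        if op == "setns":
            fd = os.open(arg, os.O_RDONLY)
            try:
                os.setns(fd, 0)  # nstype 0: accept any namespace type
            finally:
                os.close(fd)
        elif op == "setgroups":
            os.setgroups(arg)
        elif op == "setgid":
            os.setgid(arg)
        elif op == "setuid":
            os.setuid(arg)
        elif op == "open":
            # Mirrors open("/dev/ptmx", O_RDWR); raises PermissionError
            # where the trace shows -1 EACCES.
            master_fd = os.open(arg, os.O_RDWR)
    return master_fd
```

On the affected CentOS 7 node, `run_plan(nsenter_plan(34587, 67803, 513))` run as root would be expected to raise PermissionError at the final step if it reproduces the trace.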

I'm not sure what exactly makes the difference, but:
    nsenter -t 34587 -U -m -p -S <my_id> -G <my_gid>
"works" and I can access /dev/ptmx inside.

SELinux is not at fault, no denials, and disabling it changes nothing.

Any ideas? Do others also see this issue?

Disabling fuse-overlayfs usage via the Apptainer configuration and forcing it back to use Underlay seems to fix the problem (enable overlay = no, enable underlay = yes),
but the Apptainer guys want to remove that implementation at some point.

Cheers,
          Oliver

--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax: +49 228 73 7869
--



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

--
Oliver Freyermuth
UniversitÃt Bonn
Physikalisches Institut, Raum 1.047
NuÃallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:Â +49 228 73 7869
--





_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

--
Oliver Freyermuth
UniversitÃt Bonn
Physikalisches Institut, Raum 1.047
NuÃallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--

Attachment: smime.p7s
Description: Kryptografische S/MIME-Signatur