Hi Dave,

indeed, I can confirm:

* The machine was running CentOS 7.
* nsenter with "-r" added also fails for me. So I conclude, as you did, that the problem appears to be a "permission denied" whenever the "nsentering" process itself accesses things inside the mount namespace (after it has done setgid / setuid). That would match the failure of both nsenter (accessing "root" with "-r") and condor_nsenter (accessing /dev/ptmx).
* After reinstalling the machine with RockyLinux 8, both "nsenter -r" and "condor_nsenter" work, and "condor_ssh_to_job" also works fine (now we just need to finish a filesystem upgrade so we can jump beyond CentOS 7...).

So that means that fuse-overlayfs does not play well with attaching to containers. That probably matches what you mentioned in:
https://github.com/apptainer/apptainer/commit/f99c239235eb85dbbd43a324701353b38dca9997
which I found only now, correct?

Cheers and thanks,
Oliver

On 26.03.24 at 15:33, Dave Dykstra wrote:
Never mind, I was glossing over the deeper debugging you had done in condor, where you found the error to be with /dev/ptmx. According to strace, nsenter does nothing with /dev/ptmx. So I'd still like to know if condor_ssh_to_job works with apptainer-1.3.0 on EL9.

Dave

On Tue, Mar 26, 2024 at 09:26:50AM -0500, Dave Dykstra wrote:

I have an nsenter script that includes using the "-r" option. That gets an error on EL7 with apptainer-1.3.0 using fuse-overlayfs:

    nsenter: cannot open /proc/1978437/root: Permission denied

If I remove the "-r" option, it works, and it works even with the -r option on EL9 with apptainer-1.3.0 using kernel overlayfs. Do you see the same, Oliver? Perhaps condor_ssh_to_job is using the "-r" option or its equivalent, and the error isn't getting passed along.

It's not clear to me that the option is doing anything useful; I'm not sure why I had it included in my script.

Dave

On Tue, Mar 26, 2024 at 09:05:07AM -0500, Dave Dykstra wrote:

Hi Oliver,

Does it happen also with an EL8 or EL9 host? That uses kernel overlayfs instead of fuse-overlayfs. I wonder if that makes a difference.

Dave

On Mon, Mar 25, 2024 at 10:31:20PM +0100, Oliver Freyermuth wrote:

Dear HTCondor experts (probably Greg - hello from Bonn! ;-) ),

I finally came around to upgrading a first system to Apptainer 1.3.0, which now uses fuse-overlayfs by default instead of the previous "underlay" approach, which is going to be deprecated in a future release.

Trying to start an interactive job (or connecting to an existing job) now reveals (note: we run Apptainer unprivileged):

    ...
    Your condor job is running with pid(s) 34563.
    Can't open master pty Bad file descriptor
    read returned, exiting
    ...

I can pin this down to the following problem:

1) Process tree:

    condor   34472 1.7 0.0  91048  8624 ? Ss   22:18 0:00 \_ condor_starter -f -local-name slot_type_1 -a slot1_2 exp196.physik.uni-bonn.de
    freyermu 34563 2.5 0.0 963708 19548 ? SNsl 22:18 0:00     \_ Apptainer runtime parent
    freyermu 34587 0.0 0.0 888016 17256 ? SNl  22:18 0:00         \_ appinit
    freyermu 34622 0.0 0.0   3800  1376 ? SN   22:18 0:00         |   \_ /bin/sh -c sleep 180 && while test -d ${_CONDOR_SCRATCH_DIR}/.condor_ssh_to_job_1; do /bin/sleep 3; done
    freyermu 34623 0.0 0.0   2376   364 ? SN   22:18 0:00         |       \_ sleep 180
    freyermu 34606 1.5 0.0  16200  3092 ? SN   22:18 0:00         \_ /usr/libexec/apptainer/bin/fuse-overlayfs -f -o allow_other,lowerdir=/var/lib/apptainer/mnt/session/overlay-lowerdir:/var/lib/apptainer/mnt/session/rootfs...

2) Running the following (using any PID "deeper" down, e.g. 34622 or 34623, does the same)

    strace -f condor_nsenter -t 34587 -S <my_id> -G <_my_gid>

reveals:

    open("/proc/34587/ns/uts", O_RDONLY) = 3
    setns(3, 0) = 0
    close(3) = 0
    open("/proc/34587/ns/pid", O_RDONLY) = 3
    setns(3, 0) = 0
    close(3) = 0
    open("/proc/34587/ns/mnt", O_RDONLY) = 3
    setns(3, 0) = 0
    close(3) = 0
    setgroups(0, NULL) = 0
    setgid(513) = 0
    setuid(67803) = 0
    ioctl(0, TIOCGWINSZ, {ws_row=58, ws_col=236, ws_xpixel=1891, ws_ypixel=988}) = 0
    open("/dev/ptmx", O_RDWR) = -1 EACCES (Permission denied)
    ioctl(-1, TIOCSPTLCK, [0]) = -1 EBADF (Bad file descriptor)
    write(2, "Can't open master pty Bad file d"..., 42Can't open master pty Bad file descriptor
    ) = 42
    exit_group(1) = ?
    +++ exited with 1 +++

I'm not sure what exactly makes the difference, but:

    nsenter -t 34587 -U -m -p -S <my_id> -G <_my_gid>

"works" and I can access /dev/ptmx inside.

SELinux is not at fault: there are no denials, and disabling it changes nothing.

Any ideas? Do others also see this issue?

Disabling fuse-overlayfs usage via the Apptainer configuration and forcing it back to use underlay seems to fix the problem (enable overlay = no, enable underlay = yes), but the Apptainer guys want to remove that implementation at some point.
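
The strace excerpt above shows condor_nsenter entering only the uts, pid and mnt namespaces, while the working nsenter call also passes -U, so it may help to check which namespaces actually differ between the starter and the container processes. Below is a minimal sketch in Python, assuming a Linux /proc; the helper names namespace_ids and differing_namespaces are made up for illustration and are not part of HTCondor or Apptainer:

```python
import os

# Namespace entries to compare; these names exist under /proc/<pid>/ns on Linux.
NS_NAMES = ("user", "mnt", "pid", "uts")


def namespace_ids(pid, proc_root="/proc"):
    """Return {namespace name: symlink target} for one process.

    Each entry under <proc_root>/<pid>/ns is a symlink whose target looks
    like 'mnt:[4026531840]'. Entries that cannot be read (for example
    because of missing permissions) map to None.
    """
    ids = {}
    for name in NS_NAMES:
        path = os.path.join(proc_root, str(pid), "ns", name)
        try:
            ids[name] = os.readlink(path)
        except OSError:
            ids[name] = None
    return ids


def differing_namespaces(pid_a, pid_b, proc_root="/proc"):
    """List the namespaces whose link targets differ between two processes.

    Note: an entry that is unreadable for both processes compares equal
    and is therefore not reported.
    """
    a = namespace_ids(pid_a, proc_root)
    b = namespace_ids(pid_b, proc_root)
    return [name for name in NS_NAMES if a[name] != b[name]]
```

Calling differing_namespaces(34472, 34587) with the PIDs from the process tree above would list the namespaces that separate condor_starter from appinit; reading the links of another user's process will probably require root.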

Cheers,
Oliver

--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax: +49 228 73 7869
--