Re: [HTCondor-users] Incompatibility of HTCondor "condor_ssh_to_job" with Apptainer 1.3.0?
- Date: Tue, 26 Mar 2024 14:05:07 +0000
- From: Dave Dykstra <dwd@xxxxxxxx>
- Subject: Re: [HTCondor-users] Incompatibility of HTCondor "condor_ssh_to_job" with Apptainer 1.3.0?
Hi Oliver,
Does it also happen with an EL8 or EL9 host? Those use kernel overlayfs instead of fuse-overlayfs, and I wonder if that makes a difference.
Dave
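One way to check which overlay implementation actually backs a container is to read `/proc/<pid>/mounts`, which lists the mounts as seen from that process's mount namespace, and look at the filesystem type of `/`. Kernel overlayfs reports the type `overlay`, while a FUSE filesystem reports `fuse` or `fuse.<subtype>`. The following is a minimal Python sketch of that check. The helper names `mount_fstype` and `overlay_kind` are hypothetical and not part of Apptainer or HTCondor, and reading another user's `/proc/<pid>/mounts` may require root.

```python
import sys


def mount_fstype(pid="self", mountpoint="/"):
    """Return the filesystem type mounted at `mountpoint` as seen by `pid`.

    Each line of /proc/<pid>/mounts reads:
    source mountpoint fstype options dump pass
    """
    fstype = None
    with open(f"/proc/{pid}/mounts") as mounts:
        for line in mounts:
            fields = line.split()
            if len(fields) >= 3 and fields[1] == mountpoint:
                # Keep the last match: a later mount on the same path
                # hides the earlier ones.
                fstype = fields[2]
    return fstype


def overlay_kind(fstype):
    """Classify a filesystem type string taken from /proc/<pid>/mounts."""
    if fstype == "overlay":
        return "kernel overlayfs"
    if fstype == "fuse" or (fstype or "").startswith("fuse."):
        return "FUSE"
    return "other"


if __name__ == "__main__":
    # Hypothetical usage: pass the PID of a process inside the container.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    fstype = mount_fstype(pid)
    print(f"{pid}: / is {fstype} ({overlay_kind(fstype)})")
```

Run against the `appinit` process of a job (PID 34587 in the process tree quoted below), this should distinguish an EL7-style fuse-overlayfs setup from kernel overlayfs on EL8/EL9.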
On Mon, Mar 25, 2024 at 10:31:20PM +0100, Oliver Freyermuth wrote:
> Dear HTCondor experts (probably Greg, hello from Bonn! ;-) ),
>
> I finally got around to upgrading a first system to Apptainer 1.3.0, which now uses fuse-overlayfs by default instead of the previous "underlay" approach, which is going to be deprecated in a future release.
>
> Trying to start an interactive job (or to connect to an existing one) now fails as follows (note: we run Apptainer unprivileged):
> ...
> Your condor job is running with pid(s) 34563.
> Can't open master pty Bad file descriptor
> read returned, exiting
> ...
>
>
> I can pin this down to the following problem:
>
>
> 1) Process tree:
>
> condor 34472 1.7 0.0 91048 8624 ? Ss 22:18 0:00 \_ condor_starter -f -local-name slot_type_1 -a slot1_2 exp196.physik.uni-bonn.de
> freyermu 34563 2.5 0.0 963708 19548 ? SNsl 22:18 0:00 \_ Apptainer runtime parent
> freyermu 34587 0.0 0.0 888016 17256 ? SNl 22:18 0:00 \_ appinit
> freyermu 34622 0.0 0.0 3800 1376 ? SN 22:18 0:00 | \_ /bin/sh -c sleep 180 && while test -d ${_CONDOR_SCRATCH_DIR}/.condor_ssh_to_job_1; do /bin/sleep 3; done
> freyermu 34623 0.0 0.0 2376 364 ? SN 22:18 0:00 | \_ sleep 180
> freyermu 34606 1.5 0.0 16200 3092 ? SN 22:18 0:00 \_ /usr/libexec/apptainer/bin/fuse-overlayfs -f -o allow_other,lowerdir=/var/lib/apptainer/mnt/session/overlay-lowerdir:/var/lib/apptainer/mnt/session/rootfs...
>
>
> 2) Running the following (any PID further down the tree, e.g. 34622 or 34623, behaves the same):
> strace -f condor_nsenter -t 34587 -S <my_uid> -G <my_gid>
> reveals:
> open("/proc/34587/ns/uts", O_RDONLY) = 3
> setns(3, 0) = 0
> close(3) = 0
> open("/proc/34587/ns/pid", O_RDONLY) = 3
> setns(3, 0) = 0
> close(3) = 0
> open("/proc/34587/ns/mnt", O_RDONLY) = 3
> setns(3, 0) = 0
> close(3) = 0
> setgroups(0, NULL) = 0
> setgid(513) = 0
> setuid(67803) = 0
> ioctl(0, TIOCGWINSZ, {ws_row=58, ws_col=236, ws_xpixel=1891, ws_ypixel=988}) = 0
> open("/dev/ptmx", O_RDWR) = -1 EACCES (Permission denied)
> ioctl(-1, TIOCSPTLCK, [0]) = -1 EBADF (Bad file descriptor)
> write(2, "Can't open master pty Bad file d"..., 42Can't open master pty Bad file descriptor
> ) = 42
> exit_group(1) = ?
> +++ exited with 1 +++
>
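The strace also shows why the error message is misleading: `open("/dev/ptmx", O_RDWR)` fails with EACCES, but the program then calls `ioctl(-1, TIOCSPTLCK, ...)` on the invalid descriptor, and that call's EBADF is what gets printed as "Can't open master pty Bad file descriptor". Below is a minimal sketch, written in Python rather than taken from condor_nsenter's own source (which is not shown here), of checking the result of `open()` before using the descriptor so that the real errno is reported. `open_master_pty` is a hypothetical helper name.

```python
import errno
import os


def open_master_pty(path="/dev/ptmx"):
    """Open the pseudo-terminal multiplexer and return the master fd.

    Raises RuntimeError naming the errno from open() itself, so that an
    EACCES is not hidden behind a later EBADF from operating on fd -1.
    """
    try:
        fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        raise RuntimeError(
            f"Can't open master pty {path}: {name} ({exc.strerror})"
        ) from exc
    # Only a valid fd reaches this point; a real caller would now unlock
    # the slave side (the TIOCSPTLCK ioctl seen in the strace) on it.
    return fd
```

In the situation above, this would report `EACCES (Permission denied)` instead of `Bad file descriptor`, pointing directly at the permission problem on `/dev/ptmx`.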
> I'm not sure what exactly makes the difference, but
> nsenter -t 34587 -U -m -p -S <my_uid> -G <my_gid>
> "works", and I can access /dev/ptmx inside.
>
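Comparing the two commands, condor_nsenter (per the strace) joins only the uts, pid and mnt namespaces of the target, whereas `nsenter -U -m -p` also joins its user namespace. To see which namespaces two processes do not share, e.g. the condor_starter (34472) and appinit (34587) from the process tree, one can compare the symlinks under `/proc/<pid>/ns/`, whose targets have the form `user:[<inode>]` and identify the namespace. The following is a minimal Python sketch of that comparison; the helper names are hypothetical, and reading another user's `/proc/<pid>/ns` entries generally requires root.

```python
import os
import sys

# Namespace types under /proc/<pid>/ns; "cgroup" and "time" only exist
# on sufficiently recent kernels, so missing entries are skipped.
NS_TYPES = ("user", "mnt", "pid", "uts", "ipc", "net", "cgroup", "time")


def namespace_ids(pid):
    """Return {ns_type: symlink target} for the namespaces of `pid`."""
    if not os.path.isdir(f"/proc/{pid}/ns"):
        raise ProcessLookupError(f"no /proc/{pid}/ns (process gone?)")
    ids = {}
    for ns in NS_TYPES:
        try:
            # PermissionError propagates: reading these links needs
            # ptrace-level access to the target process.
            ids[ns] = os.readlink(f"/proc/{pid}/ns/{ns}")
        except FileNotFoundError:
            continue  # namespace type not supported by this kernel
    return ids


def differing_namespaces(pid_a, pid_b):
    """Return the sorted namespace types that differ between two PIDs."""
    a, b = namespace_ids(pid_a), namespace_ids(pid_b)
    return sorted(ns for ns in a.keys() & b.keys() if a[ns] != b[ns])


if __name__ == "__main__" and len(sys.argv) == 3:
    print(differing_namespaces(int(sys.argv[1]), int(sys.argv[2])))
```

With unprivileged Apptainer one would expect `user` to appear in the output alongside `mnt` and `pid`, i.e. a user namespace that condor_nsenter does not join but `nsenter -U` does.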
> SELinux is not at fault: there are no denials, and disabling it changes nothing.
>
> Any ideas? Do others also see this issue?
>
> Disabling fuse-overlayfs in the Apptainer configuration and forcing it back to the underlay approach (enable overlay = no, enable underlay = yes) seems to fix the problem,
> but the Apptainer developers intend to remove the underlay implementation at some point.
>
> Cheers,
> Oliver
>
> --
> Oliver Freyermuth
> Universität Bonn
> Physikalisches Institut, Raum 1.047
> Nußallee 12
> 53115 Bonn
> --
> Tel.: +49 228 73 2367
> Fax: +49 228 73 7869
> --
> _______________________________________________
> HTCondor-users mailing list
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/