Dear HTCondor experts,

we are observing unexpected behaviour in the following situation (inspired by http://research.cs.wisc.edu/htcondor/manual/v8.6/3_17Singularity_Support.html):

1. All jobs run in Singularity containers (SINGULARITY_JOB = true).

2. Users can choose the desired OS using a custom job attribute "+DesiredOS". The relevant part of the HTCondor configuration is:

-----------------------------------------------------------------------
DEFAULT_CENTOS7_IMAGE    = /cvmfs/example.com/singularity/CentOS7/default
DEFAULT_SL6_IMAGE        = /cvmfs/example.com/singularity/SL6/default
DEFAULT_UBUNTU1604_IMAGE = /cvmfs/example.com/singularity/Ubuntu1604/default

CHOSEN_IMAGE = ifThenElse(TARGET.DesiredOS is "Ubuntu1604", "$(DEFAULT_UBUNTU1604_IMAGE)", ifThenElse(TARGET.DesiredOS is "CentOS7", "$(DEFAULT_CENTOS7_IMAGE)", "$(DEFAULT_SL6_IMAGE)"))

SINGULARITY_IMAGE_EXPR = $(CHOSEN_IMAGE)
-----------------------------------------------------------------------

3. Users can start interactive jobs and should obtain the desired runtime environment using

$ condor_submit -i consel.jdl

where the contents of consel.jdl is:

-----------------------------------------------------------------------
Universe = vanilla
+DesiredOS = "Ubuntu1604"
Queue
-----------------------------------------------------------------------

Unfortunately, this does not work: the users always end up in the default container OS (SL6 in the above example), as if "DesiredOS" were not defined. With non-interactive jobs the above configuration works as expected.
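Side note: since non-interactive jobs do pick the right image, the attribute evidently reaches the job ClassAd as "DesiredOS". This can be verified with condor_q (123.0 being a placeholder job id; the expected output is the value set in consel.jdl):

-----------------------------------------------------------------------
$ condor_q -long 123.0 | grep -i DesiredOS
DesiredOS = "Ubuntu1604"
-----------------------------------------------------------------------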
Checking the process tree on the execute node, the situation looks like this:

-----------------------------------------------------------------------
[...]
condor     1676  0.0  0.0  98568  7680 ?     Ss   Feb25 0:07 /usr/sbin/condor_master -f
root       2640  0.1  0.0  28376  8100 ?     S    Feb25 6:16  \_ condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 6
condor     2658  0.0  0.0  78628  6888 ?     Ss   Feb25 0:07  \_ condor_shared_port -f -p 9618
condor     2921  0.1  0.0  84240 10892 ?     Ss   Feb25 6:48  \_ condor_startd -f
condor    45979  0.3  0.0  88388  7916 ?     Ss   18:15 0:00      \_ condor_starter -f -a slot1_1 submit.example.com
user1     46001  0.0  0.0  19944   796 ?     SNs  18:15 0:00          \_ /usr/libexec/singularity/bin/action-suid /bin/sleep 180
user1     46008  0.0  0.0   4360   356 ?     SN   18:15 0:00          |   \_ /bin/sleep 180
user1     46022  0.0  0.0  19944   800 ?     SNs  18:15 0:00          \_ /usr/libexec/singularity/bin/action-suid /usr/sbin/sshd -i -e -f /pool/condor
user1     46029  0.0  0.0  70936  2636 ?     SN   18:15 0:00              \_ sshd: user1 [priv]
user1     46031  0.0  0.0  70936  1212 ?     SN   18:15 0:00                  \_ sshd: user1@pts/0
user1     46032  0.5  0.0  15124  3360 pts/0 SNs+ 18:15 0:00                      \_ -/bin/bash
[...]
-----------------------------------------------------------------------

Obviously there are two different containers running: one executing "sleep", the other one executing sshd. Checking the file descriptors of the corresponding processes yields the following output:

-----------------------------------------------------------------------
# ls -l /proc/46001/fd
[...]
lr-x------. 1 root root 64  1. Mär 18:15 5 -> /cvmfs/example.com/singularity/Ubuntu1604/default
[...]

# ls -l /proc/46022/fd
[...]
lr-x------. 1 root root 64  1. Mär 18:16 5 -> /cvmfs/example.com/singularity/SL6/default
[...]
-----------------------------------------------------------------------

From this information, two surprising phenomena are apparent:

1. There are *two* containers started.

2. The two containers use *different* images, indicating that the container running sshd ignores the custom job attribute "DesiredOS".

Is there a way to make interactive jobs work with a user-selected Singularity image?

Cheers,
Peter

P. S.: Is there a reason why the following command does not work? It would be very convenient:

$ condor_submit -i '+DesiredOS = "Ubuntu1604"'
condor_submit: invalid attribute name '+DesiredOS' for attrib=value assigment
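A possible stop-gap might be -append, which treats its argument as a line of the submit description, so the '+' form should be accepted there. I have not verified this in combination with -i, though:

$ condor_submit -i -append '+DesiredOS = "Ubuntu1604"'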