
Re: [HTCondor-users] Bug: interactive jobs + custom job attributes + singularity



Dear all,

The P.S. part of my email has been solved, thanks to Christoph.

Does anybody have an idea how to address the remaining issues?

Cheers, Peter

On 01.03.2018 18:35, Peter Wienemann wrote:
> Dear HTCondor experts,
> 
> We are observing unexpected behaviour in the following situation
> (inspired by
> http://research.cs.wisc.edu/htcondor/manual/v8.6/3_17Singularity_Support.html):
> 
> 1. All jobs run in singularity containers (SINGULARITY_JOB = true)
> 
> 2. Users can choose the desired OS via the custom job attribute
> "+DesiredOS". The relevant part of our HTCondor configuration is:
> 
> -----------------------------------------------------------------------
> DEFAULT_CENTOS7_IMAGE = /cvmfs/example.com/singularity/CentOS7/default
> 
> DEFAULT_SL6_IMAGE = /cvmfs/example.com/singularity/SL6/default
> 
> DEFAULT_UBUNTU1604_IMAGE = /cvmfs/example.com/singularity/Ubuntu1604/default
> 
> CHOSEN_IMAGE = ifThenElse(TARGET.DesiredOS is "Ubuntu1604",
> "$(DEFAULT_UBUNTU1604_IMAGE)", ifThenElse(TARGET.DesiredOS is "CentOS7",
> "$(DEFAULT_CENTOS7_IMAGE)", "$(DEFAULT_SL6_IMAGE)"))
> 
> SINGULARITY_IMAGE_EXPR = $(CHOSEN_IMAGE)
> -----------------------------------------------------------------------
> 
> 3. Users can start interactive jobs and should obtain the desired
> runtime environment using
> 
>     condor_submit -i consel.jdl
> 
> where the contents of consel.jdl are
> 
> -----------------------------------------------------------------------
> Universe   = vanilla
> +DesiredOS = "Ubuntu1604"
> Queue
> -----------------------------------------------------------------------
> 
> Unfortunately, this does not work. Users always end up in the default
> container OS (SL6 in the above example), as if "DesiredOS" were not defined.
> 
> With non-interactive jobs the above configuration works as expected.
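A small Python model of the CHOSEN_IMAGE expression may help pin down the fallback behaviour. It is a sketch under stated assumptions, not HTCondor code: None stands in for an undefined DesiredOS attribute, and the ClassAd "is" operator is modelled as a case-sensitive string comparison that yields false (never UNDEFINED) when the attribute is missing.

```python
from typing import Optional

# Image paths copied from the configuration above.
DEFAULT_CENTOS7_IMAGE = "/cvmfs/example.com/singularity/CentOS7/default"
DEFAULT_SL6_IMAGE = "/cvmfs/example.com/singularity/SL6/default"
DEFAULT_UBUNTU1604_IMAGE = "/cvmfs/example.com/singularity/Ubuntu1604/default"


def meta_equals(attr: Optional[str], value: str) -> bool:
    """Assumed model of the ClassAd 'is' operator for strings.

    Unlike '==', 'is' never yields UNDEFINED: a missing attribute
    (None here) simply compares as False. Case-sensitivity is assumed.
    """
    return attr is not None and attr == value


def chosen_image(desired_os: Optional[str]) -> str:
    """Mirror of the nested ifThenElse() in CHOSEN_IMAGE."""
    if meta_equals(desired_os, "Ubuntu1604"):
        return DEFAULT_UBUNTU1604_IMAGE
    if meta_equals(desired_os, "CentOS7"):
        return DEFAULT_CENTOS7_IMAGE
    # Anything else, including an undefined attribute, falls through to SL6.
    return DEFAULT_SL6_IMAGE


if __name__ == "__main__":
    for os_name in ("Ubuntu1604", "CentOS7", None):
        print(os_name, "->", chosen_image(os_name))
```

Under this model, any job ad in which DesiredOS is absent, or not visible to the expression, resolves to the SL6 image, which is exactly what the interactive users end up with.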
> 
> The process tree on the execute node looks like this:
> 
> -----------------------------------------------------------------------
> [...]
> condor    1676  0.0  0.0  98568  7680 ?        Ss   Feb25   0:07 /usr/sbin/condor_master -f
> root      2640  0.1  0.0  28376  8100 ?        S    Feb25   6:16  \_ condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 6
> condor    2658  0.0  0.0  78628  6888 ?        Ss   Feb25   0:07  \_ condor_shared_port -f -p 9618
> condor    2921  0.1  0.0  84240 10892 ?        Ss   Feb25   6:48  \_ condor_startd -f
> condor   45979  0.3  0.0  88388  7916 ?        Ss   18:15   0:00      \_ condor_starter -f -a slot1_1 submit.example.com
> user1    46001  0.0  0.0  19944   796 ?        SNs  18:15   0:00          \_ /usr/libexec/singularity/bin/action-suid /bin/sleep 180
> user1    46008  0.0  0.0   4360   356 ?        SN   18:15   0:00          |   \_ /bin/sleep 180
> user1    46022  0.0  0.0  19944   800 ?        SNs  18:15   0:00          \_ /usr/libexec/singularity/bin/action-suid /usr/sbin/sshd -i -e -f /pool/condor
> user1    46029  0.0  0.0  70936  2636 ?        SN   18:15   0:00              \_ sshd: user1 [priv]
> user1    46031  0.0  0.0  70936  1212 ?        SN   18:15   0:00                  \_ sshd: user1@pts/0
> user1    46032  0.5  0.0  15124  3360 pts/0    SNs+ 18:15   0:00                      \_ -/bin/bash
> [...]
> -----------------------------------------------------------------------
> 
> Evidently, two different containers are running: one running "sleep"
> and the other running sshd. Checking the file descriptors of the
> corresponding processes yields the following output:
> 
> -----------------------------------------------------------------------
> # ls -l /proc/46001/fd
> [...]
> lr-x------. 1 root  root         64  1. Mär 18:15 5 -> /cvmfs/example.com/singularity/Ubuntu1604/default
> [...]
> # ls -l /proc/46022/fd
> [...]
> lr-x------. 1 root  root         64  1. Mär 18:16 5 -> /cvmfs/example.com/singularity/SL6/default
> [...]
> -----------------------------------------------------------------------
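The manual /proc inspection can be scripted. The following Python sketch assumes a Linux execute node, root privileges, and that all images live under the illustrative prefix /cvmfs/example.com/singularity/; the helpers image_targets and fd_links are hypothetical names, not part of HTCondor or Singularity.

```python
import os
import sys
from typing import Dict, List

# Assumed prefix under which the container images live (from the config above).
IMAGE_PREFIX = "/cvmfs/example.com/singularity/"


def image_targets(links: Dict[str, str], prefix: str = IMAGE_PREFIX) -> List[str]:
    """Return, sorted, the fd symlink targets that point into the image tree."""
    return sorted(t for t in links.values() if t.startswith(prefix))


def fd_links(pid: int) -> Dict[str, str]:
    """Map each open fd of a process to its symlink target via /proc/<pid>/fd.

    Inspecting other users' processes requires root.
    """
    fd_dir = f"/proc/{pid}/fd"
    links: Dict[str, str] = {}
    for fd in os.listdir(fd_dir):
        try:
            links[fd] = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            # The descriptor may have been closed between listdir() and readlink().
            continue
    return links


if __name__ == "__main__":
    for pid in map(int, sys.argv[1:]):
        print(pid, image_targets(fd_links(pid)))
```

Run as root with the PIDs of the two action-suid processes, e.g. "python3 fd_images.py 46001 46022" (the script name is hypothetical); each PID is printed with the image paths it holds open.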
> 
> This reveals two surprising phenomena:
> 
> 1. *Two* containers are started.
> 2. The two containers use *different* images, indicating that the
> container running sshd ignores the custom job attribute "DesiredOS".
> 
> Is there a way to make interactive jobs work with user-selectable
> Singularity images?
> 
> Cheers, Peter
> 
> P. S.: Is there a reason why the following command does not work (it
> would be very convenient):
> 
> $ condor_submit -i '+DesiredOS = "Ubuntu1604"'
> condor_submit: invalid attribute name '+DesiredOS' for attrib=value assigment
> 
> 
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/