
[HTCondor-users] A couple of weird things with Condor Singularity Jobs



Hi All,

I'm trying to set up Condor Singularity jobs so I can get rid of my SL6 workers without waiting for the last of my SL6 users to die of old age, and I've noticed a couple of weird things I thought I'd mention.

First, USER_JOB_WRAPPER seems to break SINGULARITY_TARGET_DIR. If I run a simple Singularity job with:

SINGULARITY_TARGET_DIR = /workdir

and the simplest USER_JOB_WRAPPER of:

#!/bin/bash
exec "$@"

The arguments to the user wrapper script don't get remapped to /workdir, so unless the temp directory's file system is bind mounted into the container, the job fails. From ps:

brew      1341  1330  0 17:14 ?        00:00:00         /usr/libexec/singularity/bin/action-suid /usr/local/bin/condorwrap.sh /scratch/condor/dir_1330/condor_exec.exe input-0.txt output/output-0.txt
brew      1370  1341  0 17:14 ?        00:00:00           shim-init                                /usr/local/bin/condorwrap.sh /scratch/condor/dir_1330/condor_exec.exe input-0.txt output/output-0.txt
brew      1375  1370  0 17:14 ?        00:00:00             /bin/bash /scratch/condor/dir_1330/condor_exec.exe input-0.txt output/output-0.txt
brew      1384  1375  0 17:14 ?        00:00:00               sleep 60

This also breaks the transfer of file sandboxes into and out of the container unless the TMPDIR is bind mounted in the same location.

The second issue is that if I submit the job from the same host that it executes on, then SINGULARITY_TARGET_DIR is ignored and it tries to do everything in my home area, which isn't in the container. That may be down to my security or domain settings, or it may just be a special edge case for jobs run on the submit host, where Condor decides not to set up the TMPDIR and do file transfers but just runs in the submit dir. Either way, it breaks with containers.

brew      3932  3924  0 17:24 ?        00:00:00         /usr/libexec/singularity/bin/action-suid /usr/local/bin/condorwrap.sh /net/home/ppd/brew/CondorTest/test.sh input-0.txt output/output-0.txt
brew      3961  3932  0 17:24 ?        00:00:00           shim-init                                /usr/local/bin/condorwrap.sh /net/home/ppd/brew/CondorTest/test.sh input-0.txt output/output-0.txt
brew      3966  3961  2 17:24 ?        00:00:00             /bin/bash /net/home/ppd/brew/CondorTest/test.sh input-0.txt output/output-0.txt
brew      3975  3966  0 17:24 ?        00:00:00               sleep 60

The first I can solve either with SINGULARITY_BIND_EXPR (which should probably be mentioned in the docs) or by surrounding the USER_JOB_WRAPPER with "if ! defined SINGULARITY_JOB" in the configs, but does anyone know how to change the second behaviour?
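
For reference, a rough sketch of what those two alternatives for the first issue might look like in the execute node configuration. The wrapper path and scratch directory are taken from the ps output above; the bind value is an assumption about this particular layout, and the exact form SINGULARITY_BIND_EXPR expects should be checked against the HTCondor version in use. Only one of the two would be needed:

# Alternative 1 (assumed value): bind the host scratch directory into the
# container at the same path, so the un-remapped arguments still resolve.
SINGULARITY_BIND_EXPR = "/scratch/condor"

# Alternative 2: only set the wrapper when SINGULARITY_JOB is not configured
# on this node.
if ! defined SINGULARITY_JOB
  USER_JOB_WRAPPER = /usr/local/bin/condorwrap.sh
endif

Note that the second form is evaluated when the configuration is read, so it turns the wrapper off for every job on a node that defines SINGULARITY_JOB, not just for the jobs that end up in a container.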

Ah, just having to put my thoughts into a logical enough order to write this email led me to find that adding:

+FileSystemDomain      = Undef

to my submit file is enough to fix the issue of the singularity jobs running on the submit hosts.

But these are probably both gotchas that either need documenting or fixing.

Yours,
Chris.

--
Dr Chris Brew
Scientific Computing Manager
Particle Physics Department
UKRI - STFC - Rutherford Appleton Laboratory
Harwell Oxford,
Didcot
OX11 0QX
+44 1235 446326