Dear HTCondor devs,

since I don't have access to the new (very well set-up!) JIRA bug tracking, let me use the mailing list to report an issue we observe with good old 8.8 through to the 9.0.16 series:

When starting a Singularity container job and attaching to it (or starting an interactive job), the process tree looks as follows:

condor      4703  \_ condor_startd
condor   3396708      \_ condor_starter -f -local-name slot_type_1 -a slot1_1 submitnode.physik.uni-bonn.de
someuser 3396952          \_ Singularity runtime parent
someuser 3396965          |   \_ sinit
someuser 3396988          |       \_ /bin/sh -c sleep 180 && while test -d ${_CONDOR_SCRATCH_DIR}/.condor_ssh_to_job_1; do /bin/slee
someuser 3396990          |           \_ sleep 180
someuser 3396997          \_ sshd: someuser [priv]
someuser 3396999          |   \_ sshd: someuser@pts/0
someuser 3397000          |       \_ /usr/bin/condor_docker_enter
someuser 3397020          \_ /usr/bin/nsenter -t 3396988 -S 67803 -G 513 -m -i -p -r -w
someuser 3397021              \_ /bin/sh -l -i

However, the processes which "attached" later via nsenter do not end up in the same cgroup:

# cat /sys/fs/cgroup/memory/htcondor/condor_pool_condor_slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/cgroup.procs
3396952
3396965
3396988
3396990

Consequently, limit enforcement (CPUs, memory) does not take place, neither for interactive jobs nor for processes spawned after using "condor_ssh_to_job".

Ideas for good workarounds (or of course a fix) are welcome ;-) - a rough sketch of a possible manual stopgap is in the P.S. below.

I'll sadly not make it to HTCondor Europe this year, since it collides with the start of our winter term (technical support for lectures and teaching duties), but I wish all of you a good time in Italy - hope to see you in person in one of the next years again!

Cheers from Bonn,
	Oliver

-- 
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--
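
P.S.: In case it is useful for the discussion, here is a minimal sketch of a possible manual stopgap (untested, assuming cgroup v1 as in the paths above; <pid-of-attached-shell> and <slot-cgroup-directory> are placeholders for the shell spawned by nsenter and for the slot cgroup the startd actually created): move the attached shell into the slot's cgroups by hand on the execute node.

  # run as root on the execute node
  PID=<pid-of-attached-shell>    # e.g. the "/bin/sh -l -i" from the tree above
  SLOT=<slot-cgroup-directory>   # e.g. condor_pool_condor_slot1_1@<hostname>
  # write the PID into cgroup.procs of every controller the startd set up
  for d in /sys/fs/cgroup/*/htcondor/"$SLOT"; do
      echo "$PID" > "$d/cgroup.procs"
  done
  # children forked by the shell afterwards inherit the cgroup,
  # so CPU/memory limits then apply to them as well

Of course this only catches processes one moves by hand after the fact, so a fix in the starter that places the nsenter'd processes into the job cgroup directly would be much nicer.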