
Re: [HTCondor-users] question about condor_ssh_to_job behavior



Hi Benoit,

It looks like your wrapper binds the job directory into the container as /srv (-B /var/lib/condor/execute/dir_67539:/srv). Could it be that the ssh session does an nsenter into the job and gets the startd's environment and HOME, but ends up in the namespace inside the container, where only /srv exists? As for the exit: does the wrapper script maybe interact with the session when it closes and catch the exit?
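To make the bind-mount idea concrete, here is a minimal sketch, assuming the session only sees paths through the `-B src:dest` mapping shown in your StarterLog. `map_bind_path` is a hypothetical helper for illustration, not part of HTCondor or Apptainer, and I have not checked that this is what condor_ssh_to_job actually does.

```shell
# Hypothetical helper: translate a host path into the path visible inside
# the container, given a Singularity/Apptainer style "src:dest" bind spec.
map_bind_path() {
  spec=$1
  path=$2
  src=${spec%%:*}   # text before the first colon: host side of the bind
  dest=${spec#*:}   # text after the first colon: container side of the bind
  case "$path" in
    "$src")
      # The bound directory itself
      printf '%s\n' "$dest"
      ;;
    "$src"/*)
      # A path below the bound directory: keep the remainder
      rest=${path#"$src"}
      printf '%s%s\n' "$dest" "$rest"
      ;;
    *)
      # Not covered by this bind, so not visible through it
      return 1
      ;;
  esac
}

map_bind_path /var/lib/condor/execute/dir_67539:/srv /var/lib/condor/execute/dir_67539
```

With the bind from your log this prints /srv, so if the guess is right, a `cd /srv` inside the session should land in the job directory even though the chdir to the host path fails.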

Cheers,
  Thomas

Otherwise, there was a recent bugfix [1] that affected condor_ssh_to_job, but since ssh'ing itself works, it's probably something else?

[1]
https://opensciencegrid.atlassian.net/browse/HTCONDOR-1245


On 04/11/2022 14.06, Benoit Roland wrote:
Dear all,

we are running the HTCondor EP daemons in an apptainer container and submitting jobs running themselves in a container.

I would like to ask you some questions about the behavior of condor_ssh_to_job.

1) At the beginning of the session, I got the following message.

Welcome to slot1@test-condor@c4p-login-dev!
Your condor job is running with pid(s) 67581.
Cannot chdir to /var/lib/condor/execute/dir_67539: No such file or directory
Singularity>

Is the message "Cannot chdir to /var/lib/condor/execute/dir_67539: No such file or directory" expected?

The directory exists, the job is executed properly, and I can see in the StarterLog:

11/03/22 17:14:33 (pid:67539) (D_ALWAYS) Using wrapper /scratch/etc/condor/config.git/master/repo/jobwrapper.sh to exec /usr/bin/singularity exec -W /var/lib/condor/execute/dir_67539 --pwd /srv -B /var/lib/condor/execute/dir_67539:/srv -B /cvmfs:/cvmfs -B /etc/hosts -B /etc/localtime --no-home -C --userns --env SINGULARITY_BIND= --env APPTAINER_BIND= --env APPTAINER_BINDPATH= /cvmfs/unpacked.cern.ch/gitlab-p4n.aip.de:5005/compute4punch/container-stacks/wlcg-wn:latest /srv//condor_exec.exe

11/03/22 17:14:33 (pid:67539) (D_ALWAYS) Create_Process succeeded, pid=67581

2) At the end of the session, I cannot close it cleanly:

Singularity> exit
exit
logout
read returned, exiting

After that, the exit hangs, and a Ctrl-C is needed to close the session.

3) Is there some way to optimise the environment?
I was not able to get tab completion, the Delete key, or navigation of the command history to work. I guess I am missing something in my configuration of condor_ssh_to_job.

The sshd is started with:

/usr/sbin/sshd -i -e -E /tmp/condor_sshd.log -f /var/lib/condor/execute/dir_67539/.condor_ssh_to_job_1/sshd_config

and I can see in /tmp/condor_sshd.log:

Starting session: forced-command (key-option) '/usr/libexec/condor/condor_ssh_to_job_shell_setup /var/lib/condor/execute/dir_67539/.condor_ssh_to_job_1/env.sh' for benoit_roland from 2a00:139c:3:2e5::12 port 24035 id 0

Is there a way to further configure the environment that is set up by the above command line?

Thanks a lot in advance for your help and reply!

Cheers,
Benoit







