Re: [HTCondor-users] question related to condor_ssh_to_job and container
- Date: Tue, 25 Oct 2022 11:59:25 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] question related to condor_ssh_to_job and container
On 10/25/2022 9:39 AM, Benoit Roland wrote:
Dear all,
I am having an issue using condor_ssh_to_job; it fails with
"ssh_exchange_identification: Connection closed by remote host".
I am using HTCondor version 9.0.14 for x86_64_CentOS7.
The situation is the following.
I am running HTCondor in an Apptainer container, and run a test
job in an Apptainer container via this HTCondor setup.
Hi Benoit,
Some questions and initial thoughts --
So you placed the HTCondor Execution Point (EP) daemons
(condor_startd etc) inside of an Apptainer container.... question:
how did you create this container? Did you set up your container
image by using the get.htcondor.org tool, or by using the official
htcondor image, or by some other means? Does your container
include sshd ? Could you (easily) try using HTCondor v9.12.0+ and
see if you have better results?
Next, you said you ran a test job in an Apptainer container via this
setup.... does this mean you submitted a vanilla universe job
against the EP running inside a container, or were there two
containers involved: the container running the EP daemons, and then
a second container that was your job which was running nested inside
the EP container? If the latter, how was the job container launched
-- via HTCondor's Singularity support or via a shell script included
within the job itself?
Also note in the release notes at
https://htcondor.readthedocs.io/en/latest/version-history/stable-release-series-90.html
the following entry for HTCondor v9.0.17: 'If "Singularity" is
really the "Apptainer" runtime, HTCondor now
sets environment variables to be passed to the job appropriately,
which prevents Apptainer from displaying ugly warnings about how this
won't
work in the future.' So support for HTCondor correctly launching
container jobs via Apptainer was not added until v9.0.17. Setting up
ssh_to_job requires the ability to set environment variables for the
job, and in v9.0.14 that did not work correctly with Apptainer,
because Apptainer's handling of those variables is not backwards
compatible with Singularity's.
You mentioned that the ".condor_ssh_to_job_1" directory does not
exist. If I recall correctly, it only exists while an ssh_to_job
session is in progress, so it may be difficult to "catch in the act"
if the session is failing quickly. Perhaps in a test container you
could wrap /usr/sbin/sshd with a script that sleeps long enough for
you to inspect that directory and/or copies out its contents. In any
event, the sshd_config file is generated from the template located at
/usr/lib64/condor/condor_ssh_to_job_sshd_config_template, so
perhaps looking at that file will help you out.
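A rough sketch of such an sshd wrapper, written in Python, might look like the following. It is an illustration only and rests on several assumptions that are not confirmed anywhere above: that the real binary has been renamed to /usr/sbin/sshd.real (a hypothetical path), that python3 is available inside the test container, and that HTCondor passes the generated config file to sshd with the standard "-f" option, so that the directory holding that file is the one worth copying. If the last assumption does not hold in your setup, replace config_dir_from_args with the directory you want to capture.

```python
#!/usr/bin/env python3
"""Debugging wrapper for sshd: copy the config directory, then run the real sshd."""
import os
import shutil
import sys
import time

# All three values are assumptions for illustration; adjust them to your container.
REAL_SSHD = "/usr/sbin/sshd.real"        # hypothetical path of the renamed original binary
SNAPSHOT_ROOT = "/tmp/ssh_to_job_debug"  # hypothetical place to keep the copies
PAUSE_SECONDS = 0                        # raise (e.g. to 60) to inspect the directory live


def config_dir_from_args(argv):
    """Return the directory of the file passed via sshd's -f option, or None."""
    for i, arg in enumerate(argv):
        if arg == "-f" and i + 1 < len(argv):
            return os.path.dirname(os.path.abspath(argv[i + 1]))
    return None


def snapshot(src_dir, dest_root):
    """Copy src_dir into a new, uniquely named directory under dest_root."""
    name = "%s_%d_%d" % (os.path.basename(src_dir), int(time.time()), os.getpid())
    dest = os.path.join(dest_root, name)
    shutil.copytree(src_dir, dest, symlinks=True)  # also creates dest_root if needed
    return dest


def main(argv):
    src = config_dir_from_args(argv)
    if src and os.path.isdir(src):
        try:
            snapshot(src, SNAPSHOT_ROOT)
        except OSError:
            # Never let a failed copy prevent sshd from starting. Nothing is
            # printed, in case sshd's stdio is in use for the connection.
            pass
    time.sleep(PAUSE_SECONDS)
    # Replace this process with the real sshd, passing the arguments through.
    os.execv(REAL_SSHD, [REAL_SSHD] + argv)


# Only act as a wrapper when installed under the name "sshd".
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "sshd":
    main(sys.argv[1:])
```

After a failed condor_ssh_to_job attempt, any copies would appear under the SNAPSHOT_ROOT directory inside the container running the EP. Copying rather than only sleeping keeps the evidence around even when the session is torn down immediately.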
Hope the above is helpful,
regards,
Todd