[HTCondor-users] Submitting to a remote condor queue
- Date: Mon, 08 Feb 2016 14:11:24 +0000
- From: Brian Candler <b.candler@xxxxxxxxx>
- Subject: [HTCondor-users] Submitting to a remote condor queue
Hello,
I would like to run some code inside a docker container, which submits
jobs to a condor schedd running on the underlying host outside the
container.
+--------------------------+
| +-----------+ |
| | container | ---. |
| +-----------+ v |
| schedd |
| |
+--------------------------+
The final goal here is to run some fairly complex code which writes out
DAGs, and have that code bundled together with all its dependencies in a
docker container.
The docker container includes the condor binaries (e.g. condor_submit,
condor_submit_dag) but no running htcondor daemons. If necessary, it can
have a tweaked condor_config, or I can provide command-line options as
required to point to the schedd.
More generally: I'd like to understand how to configure a host A which
contains only the condor binaries (and no running daemons) to submit
jobs to a remote host B where the daemons are running. But I'll limit
myself to the docker-container-on-same-host case for the moment.
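For the general host-A case, a minimal client-side configuration sketch might look like the following (the knob names are real HTCondor parameters; the hostnames and the choice of an empty DAEMON_LIST are assumptions for this particular setup, not a tested recipe):

```
# Sketch: condor_config.local on a binaries-only submit host (host A / the
# container), assuming the collector and schedd both run on host B.
CONDOR_HOST = ardb-dummy.int.example.net   # default target for -pool
SCHEDD_HOST = ardb-dummy.int.example.net   # default schedd, so -name can be omitted
DAEMON_LIST =                              # no local daemons should start
```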
In my test setup, the outer host is ardb-dummy.int.example.net /
192.168.5.192. The container has address 172.17.0.9, but this is NAT'd
to 192.168.5.192 by the time it reaches the outer host (i.e. tcpdump on
the outer host shows traffic from 192.168.5.192 to 192.168.5.192 on the
"lo" interface).
Both are running htcondor 8.5.1 ubuntu packages
(https://research.cs.wisc.edu/htcondor/ubuntu/)
Here's how far I've got:
(1) I have condor_status working: it just needs the "-pool" flag.
root@fe1d7a934cdb:/# condor_status -pool ardb-dummy.int.example.net
Name               OpSys  Arch   State     Activity LoadAv Mem  ActvtyTime

slot1@xxxxxxxxxxxx LINUX  X86_64 Unclaimed Idle     0.880  1497 0+00:58:48

              Total Owner Claimed Unclaimed Matched Preempting Backfill

 X86_64/LINUX     1     0       0         1       0          0        0
        Total     1     0       0         1       0          0        0
brian@fe1d7a934cdb:~$ condor_status -pool ardb-dummy.int.example.net -any
MyType        TargetType  Name

Collector     None        Personal Condor at ardb-dummy.int.example
Scheduler     None        ardb-dummy.int.example.net
DaemonMaster  None        ardb-dummy.int.example.net
Negotiator    None        ardb-dummy.int.example.net
Machine       Job         slot1@xxxxxxxxxxxxxxxxxxxxxxxxxx
(2) I have condor_q working, but for some reason it needs both "-pool"
and "-name" flags.
root@fe1d7a934cdb:/# condor_q -pool ardb-dummy.int.example.net -name
ardb-dummy.int.example.net
-- Schedd: ardb-dummy.int.example.net : <192.168.5.192:50022>
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
(If I give only "-pool", it still appears to be trying to talk to the
local condor daemons, and failing.)
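The per-flag defaults can also be supplied per-process through HTCondor's `_CONDOR_<KNOB>` environment-variable override mechanism, which is convenient in a container entrypoint. A sketch (the hostnames are this setup's; whether this fully removes the need for -name here is an assumption):

```shell
# _CONDOR_* environment variables override the matching condor_config knobs
# for any condor tool launched from this shell.
export _CONDOR_CONDOR_HOST=ardb-dummy.int.example.net   # default for -pool
export _CONDOR_SCHEDD_HOST=ardb-dummy.int.example.net   # default for -name
# After this, a plain `condor_q` should behave like:
#   condor_q -pool ardb-dummy.int.example.net -name ardb-dummy.int.example.net
```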
(3) My problem now is getting condor_submit to work.
On the outer host, I have set FLOCK_FROM = 192.168.5.192. However when I
try to submit from the container, I am getting authentication errors:
brian@fe1d7a934cdb:~$ condor_submit -pool ardb-dummy.int.example.net
-name ardb-dummy.int.example.net sleep.sub
Submitting job(s)
ERROR: Failed to connect to queue manager ardb-dummy.int.example.net
AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using GSI
GSI:5003:Failed to authenticate. Globus is reporting error
(851968:50). There is probably a problem with your credentials. (Did
you run grid-proxy-init?)
AUTHENTICATE:1004:Failed to authenticate using KERBEROS
AUTHENTICATE:1004:Failed to authenticate using FS
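The contents of sleep.sub are not shown in this post; any trivial job exercises the same authentication path. A plausible reconstruction (the file's contents are an assumption, not the original):

```shell
# Hypothetical sleep.sub; the real file's contents do not appear in the post.
cat > sleep.sub <<'EOF'
executable = /bin/sleep
arguments  = 60
output     = sleep.out
error      = sleep.err
log        = sleep.log
queue
EOF
```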
Looking at logs on the outer host, /var/log/condor/SchedLog says:
02/08/16 13:03:53 (pid:4531) DC_AUTHENTICATE: authentication of
<172.17.0.9:47024> did not result in a valid mapped user name, which is
required for this command (1112 QMGMT_WRITE_CMD), so aborting.
02/08/16 13:03:53 (pid:4531) DC_AUTHENTICATE: reason for authentication
failure: AUTHENTICATE:1003:Failed to authenticate with any
method|AUTHENTICATE:1004:Failed to authenticate using
GSI|GSI:5003:Failed to authenticate. Globus is reporting error
(851968:100). There is probably a problem with your credentials. (Did
you run grid-proxy-init?)|AUTHENTICATE:1004:Failed to authenticate using
KERBEROS|AUTHENTICATE:1004:Failed to authenticate using
FS|FS:1004:Unable to lstat(/tmp/FS_XXXeDGUPh)
(Note: my username "brian" exists in both container and outer host, with
the same uid and gid)
I have been trying to follow some documentation:
https://indico.cern.ch/event/272794/session/2/contribution/17/attachments/490442/677971/HTCondor-Security-Overview.pdf
but am getting a bit lost as to which knobs control authentication
between CLI tools and daemons, and which between daemons and daemons;
and which to set on the docker/client side, and which on the host/schedd
side.
So, by following this post:
https://www-auth.cs.wisc.edu/lists/htcondor-users/2013-February/msg00129.shtml
inside the container I have set in /etc/condor/condor_config.local:
SEC_PASSWORD_FILE = /etc/condor/pool_password
SEC_CLIENT_AUTHENTICATION = PREFERRED
SEC_CLIENT_AUTHENTICATION_METHODS = PASSWORD
and in the outer host's condor_config.local:
FLOCK_FROM = 192.168.5.192
SEC_DEFAULT_AUTHENTICATION = OPTIONAL
SEC_DEFAULT_AUTHENTICATION_METHODS = PASSWORD
SEC_WRITE_AUTHENTICATION = REQUIRED
SEC_WRITE_AUTHENTICATION_METHODS = PASSWORD
SEC_PASSWORD_FILE = /etc/condor/pool_password
and on both: echo "xyzzy" >/etc/condor/pool_password
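One assumption worth flagging: HTCondor's documented way to create the pool password file is condor_store_cred -f, which stores the password scrambled rather than as plain text, and the daemons expect the file to be readable only by their user. A plain echo may therefore not yield a usable credential even if the rest of the configuration were right. At minimum the permissions need to be tight:

```shell
# Demo of the permissions requirement only (demo path; the real file would be
# /etc/condor/pool_password, ideally written with `condor_store_cred -f`).
rm -f /tmp/pool_password.demo
umask 077                               # newly created files get mode 0600
echo "xyzzy" > /tmp/pool_password.demo
```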
This doesn't work. I get the following error on the client side:
brian@fe1d7a934cdb:~$ condor_submit -pool ardb-dummy.int.example.net
-name ardb-dummy.int.example.net sleep.sub
Submitting job(s)
ERROR: Failed to connect to queue manager ardb-dummy.int.example.net
AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using PASSWORD
And on the server side in SchedLog:
02/08/16 13:22:19 (pid:25919) DC_AUTHENTICATE: authentication of
<172.17.0.9:60837> did not result in a valid mapped user name, which is
required for this command (1112 QMGMT_WRITE_CMD), so aborting.
02/08/16 13:22:19 (pid:25919) DC_AUTHENTICATE: reason for authentication
failure: AUTHENTICATE:1003:Failed to authenticate with any
method|AUTHENTICATE:1004:Failed to authenticate using PASSWORD
Actually, htcondor documentation says that password authentication is
daemon-to-daemon only:
http://research.cs.wisc.edu/htcondor/manual/latest/3_6Security.html#sec:Password-Authentication
so this isn't going to work.
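Since PASSWORD is daemon-to-daemon, one direction that might fit the container-on-same-host case is FS_REMOTE, which authenticates via a directory both client and server can reach, e.g. a docker volume mounted at the same path on both sides. A sketch (the knob names are real; the path and the whole approach are untested assumptions):

```
# Both sides: mount the same directory at the same path, then set
FS_REMOTE_DIR = /shared/condor-auth
# Container (client) side:
SEC_CLIENT_AUTHENTICATION_METHODS = FS_REMOTE
# Outer host (schedd) side:
SEC_WRITE_AUTHENTICATION_METHODS = FS_REMOTE
```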
Next, looking at this document:
https://twiki.opensciencegrid.org/bin/view/CampusGrids/ConfiguringRemoteSubmissionHost
it says I also ought to set FLOCK_TO in the container, which I've now
done (FLOCK_TO = 192.168.5.192), but that doesn't seem to make any
difference.
Finally I tried changing the authentication to "PASSWORD,FS,CLAIMTOBE"
on both sides, which gives the following:
brian@fe1d7a934cdb:~$ condor_submit -pool ardb-dummy.int.example.net
-name ardb-dummy.int.example.net sleep.sub
Submitting job(s)
ERROR: Failed to connect to queue manager ardb-dummy.int.example.net
SECMAN:2010:Received "DENIED" from server for user brian using method
CLAIMTOBE.
AUTHENTICATE:1004:Failed to authenticate using FS
AUTHENTICATE:1004:Failed to authenticate using PASSWORD
Server side:
root@ardb-dummy:~# grep -v 'Number of Active Workers'
/var/log/condor/SchedLog | tail
...
02/08/16 13:33:02 (pid:27786) PERMISSION DENIED to brian from host
172.17.0.9 for command 1112 (QMGMT_WRITE_CMD), access level WRITE:
reason: WRITE authorization policy contains no matching ALLOW entry for
this request; identifiers used for this host: 172.17.0.9,172.17.0.9,
hostname size = 1, original ip address = 172.17.0.9
02/08/16 13:33:02 (pid:27786) DC_AUTHENTICATE: Command not authorized, done!
Ah, that's new. So I changed the outer host to add this in the
FLOCK_FROM range:
FLOCK_FROM = 192.168.5.192, 172.17.*
SEC_DEFAULT_AUTHENTICATION = OPTIONAL
SEC_DEFAULT_AUTHENTICATION_METHODS = PASSWORD,FS,CLAIMTOBE
SEC_WRITE_AUTHENTICATION = REQUIRED
SEC_WRITE_AUTHENTICATION_METHODS = PASSWORD,FS,CLAIMTOBE
SEC_PASSWORD_FILE = /etc/condor/pool_password
(although I've confirmed again with tcpdump the traffic source address
is 192.168.5.192). No difference.
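Reading the PERMISSION DENIED line again, it is complaining about the WRITE authorization list (ALLOW_WRITE) rather than FLOCK_FROM, and the schedd identified the client as 172.17.0.9 despite what tcpdump showed. So a narrower alternative to ALLOW_WRITE=* might be (untested sketch, on the outer host):

```
# Allow WRITE from both the NAT'd address and the docker bridge range,
# keeping whatever ALLOW_WRITE already contained:
ALLOW_WRITE = $(ALLOW_WRITE), 192.168.5.192, 172.17.*
```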
In desperation, I set ALLOW_WRITE=* on the outer host. Then if I also
use "-remote" instead of "-name" on the command line, I get something
which works:
brian@fe1d7a934cdb:~$ condor_submit -pool ardb-dummy.int.example.net
-remote ardb-dummy.int.example.net sleep.sub
Submitting job(s).
1 job(s) submitted to cluster 370721.
But this is almost certainly horrendously insecure. I believe it's
relying on CLAIMTOBE, because if I remove CLAIMTOBE from the configs,
submission no longer works.
Does someone have any suggestion for how I should be doing this?
Thanks,
Brian Candler.
P.S. Docker containers can include arbitrary usernames/UIDs, so it would
probably be best if all jobs submitted by these containers were mapped to a
single userID in htcondor land. I'm sure I remember reading some way to get
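The single-user mapping the P.S. alludes to might be achievable with HTCondor's certificate map file, which maps whatever name an authentication method produces onto one canonical user. A sketch (the map-file mechanism and its three-column format are real; the choice of FS_REMOTE as the method and the user name "container_submit" are made-up assumptions):

```
# condor_config on the schedd host:
#   CERTIFICATE_MAPFILE = /etc/condor/condor_mapfile
# /etc/condor/condor_mapfile -- collapse every name authenticated by the
# given method into one canonical user:
FS_REMOTE .* container_submit
```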
P.P.S. I realise that the DAG itself and the submit files used by that
DAG also need to be visible on the host where dagman is running. I
expect I'll use a docker volume to expose some bit of host filesystem
for this purpose. But first I want to be clear on the right way to
submit simple jobs remotely.