On Dec 1, 2020, at 5:42 AM, Diego Ciangottini
<diego.ciangottini@xxxxxxxxxx> wrote:
Adding some more context and details on my investigation so far.
In this k8s manifest (*) you can find my latest attempt. From the
network point of view, it basically does the following:
- CCB/Collector exposed via a NodePort on 30618, mapped to port 30618
inside the container
- Schedd appears to the collector as a headless k8s service at
schedd.condor.svc.cluster.local
    - CCB address in the schedd config file, pointing to the public IP
of the collector (I also tried the private one, but the outcome was
the same)
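The two Services described in the list above can be sketched with `kubectl create service`. This is a hypothetical illustration, not the manifest from the gist: the Service names `collector` and `schedd` are assumptions (the namespace `condor` and the name `schedd` are inferred from the DNS name `schedd.condor.svc.cluster.local`), and `kubectl create service` selects pods by an `app=<service name>` label, so the pods would need to carry that label. The `--dry-run=client -o yaml` flags only print the generated manifests.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the two Services described above; the names are
# assumptions. --dry-run=client -o yaml only prints the manifests; remove
# those flags to actually create the objects.
set -euo pipefail

NS=condor   # namespace inferred from schedd.condor.svc.cluster.local

# Collector/CCB: NodePort 30618 on every node, forwarded to port 30618 in the pod.
kubectl create service nodeport collector -n "$NS" \
  --tcp=30618:30618 --node-port=30618 \
  --dry-run=client -o yaml

# Schedd: headless Service (no cluster IP), so schedd.condor.svc.cluster.local
# resolves directly to the schedd pod IP, but only for clients in the cluster.
kubectl create service clusterip schedd -n "$NS" \
  --clusterip=None --tcp=9618:9618 \
  --dry-run=client -o yaml
```

Only the collector is reachable from outside in this layout; the schedd's name and pod IP exist only on the cluster network, which is why outside clients depend on CCB to reach it.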
With this configuration everything works perfectly as long as I am
inside the cluster, but if I try from outside with this environment
(**) I get this error (***).
Can you help me understand whether what I am trying makes any sense?
Do you see any obvious reason for this not to work? Any feedback at
this point is much appreciated.
P.S. If I remove CCB_ADDRESS from the schedd's HTCondor configuration,
I get this instead (****); I don't know if it helps.
Thanks,
Diego
(*)
https://gist.github.com/dciangot/171ef8981ba554fed4ca8db97b4ddbf7
(**)
export _condor_AUTH_SSL_CLIENT_CAFILE=/ca.crt
export _condor_SEC_DEFAULT_AUTHENTICATION_METHODS=SCITOKENS
export _condor_SCITOKENS_FILE=/tmp/token
export _condor_COLLECTOR_HOST=90.147.174.149.xip.io:30618
export _condor_TOOL_DEBUG=D_FULLDEBUG,D_SECURITY
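HTCondor only picks up environment overrides whose names begin with `_condor_` (or `_CONDOR_`), so a variable with a mistyped prefix, for example `_condot_TOOL_DEBUG`, is ignored without any warning. A small bash check along these lines can catch such near misses before running the tools. It is a hypothetical helper, only a heuristic (it flags names starting with `_cond` that are not `_condor_`), and it treats the prefix case-insensitively on the assumption that HTCondor accepts both spellings.

```shell
#!/usr/bin/env bash
# check_condor_env: list exported variables whose names look like HTCondor
# configuration overrides but lack the exact "_condor_" prefix (compared
# case-insensitively, assuming _CONDOR_ is also accepted).
# Returns 1 if anything suspicious was found, 0 otherwise.
check_condor_env() {
  local name status=0
  while read -r name; do
    case "${name,,}" in          # lower-case the name before matching
      _condor_*) ;;              # correctly prefixed override
      _cond*)                    # near miss, e.g. a typo in the prefix
        echo "suspicious: ${name}"
        status=1
        ;;
    esac
  done < <(compgen -e)           # names of all exported variables
  return "$status"
}
```

Running `condor_config_val TOOL_DEBUG` in the same shell is another way to confirm that an override actually took effect.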
(***)
condor_q -address `condor_status -schedd -af ScheddIpAddr` -debug
12/01/20 11:23:22 ZKM: In unwrap.
12/01/20 11:23:22 SharedPortEndpoint: failed to find MyAddress in ad
from /var/lock/condor/shared_port_ad.
12/01/20 11:23:22 CCBClient: Failed to get remote address for shared
port endpoint for reversed connection from schedd at
<10.244.1.20:9618>.
12/01/20 11:23:22 Failed to reverse connect to schedd at
<10.244.1.20:9618> via CCB.
-- Failed to fetch ads from:
<10.244.1.20:9618?CCBID=10.244.2.21:30618%3faddrs%3d10.244.2.21-30618%26alias%3d90.147.174.149.xip.io%26noUDP%26sock%3dcollector#2&PrivNet=schedd.condor.svc.cluster.local&addrs=10.244.1.20-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>
: schedd.condor.svc.cluster.local
CEDAR:6001:Failed to connect to
<10.244.1.20:9618?CCBID=10.244.2.21:30618%3faddrs%3d10.244.2.21-30618%26alias%3d90.147.174.149.xip.io%26noUDP%26sock%3dcollector#2&PrivNet=schedd.condor.svc.cluster.local&addrs=10.244.1.20-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>
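The address in (***) nests the CCB contact inside the `CCBID` parameter in percent-encoded form, which makes it hard to read. A small bash helper can unpack it. This is a hypothetical sketch written for this note, assuming the `<host:port?key=value&...>` layout visible in the logs above; it is not an HTCondor tool.

```shell
#!/usr/bin/env bash
# decode_sinful: split an HTCondor address string of the form
#   <host:port?key=value&key=value&flag...>
# into one "key = value" line per parameter, percent-decoding each value so
# that nested addresses (such as the one inside CCBID) become readable.
decode_sinful() {
  local s="${1#<}"               # drop the leading '<'
  s="${s%>}"                     # drop the trailing '>'
  echo "host:port = ${s%%\?*}"   # everything before the first '?'
  [[ "$s" == *\?* ]] || return 0 # no parameters to print

  local query="${s#*\?}"         # everything after the first '?'
  local -a pairs
  local pair key value
  IFS='&' read -r -a pairs <<< "$query"   # split on '&' without globbing
  for pair in "${pairs[@]}"; do
    key="${pair%%=*}"
    if [[ "$pair" == *=* ]]; then
      value="${pair#*=}"
    else
      value="(flag)"             # bare parameters such as noUDP
    fi
    # Percent-decode: rewrite every %XX as \xXX and let printf %b expand it.
    printf '%s = %b\n' "$key" "${value//%/\\x}"
  done
}
```

Applied to the (***) address, the `CCBID` line decodes to `10.244.2.21:30618?addrs=10.244.2.21-30618&alias=90.147.174.149.xip.io&noUDP&sock=collector#2`, so the CCB contact embedded in the schedd's address is the collector at its pod IP 10.244.2.21:30618 (alias 90.147.174.149.xip.io, shared-port socket `collector`), with `#2` apparently being the schedd's registration ID at that CCB. The helper can be run on the live value with `decode_sinful "$(condor_status -schedd -af ScheddIpAddr)"`.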
(****)
condor_q -address `condor_status -schedd -af ScheddIpAddr` -debug
12/01/20 11:10:06 ZKM: In unwrap.
12/01/20 11:10:26 attempt to connect to <10.244.1.17:9618> failed:
timed out after 20 seconds.
-- Failed to fetch ads from:
<10.244.1.17:9618?addrs=10.244.1.17-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>
: schedd.condor.svc.cluster.local
CEDAR:6001:Failed to connect to
<10.244.1.17:9618?addrs=10.244.1.17-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>
On 12/1/2020 12:52 AM, Diego Ciangottini wrote:
Hi again,
partially related to the activity in my previous email, I'm trying
to update our cluster setup on k8s and I was wondering whether it is
possible to optimize what we are currently using.
In particular, we keep the schedd and collector pods on the host
network, accessible from outside, in order to allow submission from
nodes outside the cluster. This of course comes at the cost of losing
a lot of flexibility in the deployment.
So, is there any way to expose only the collector port as a service
and run the schedd on the private network only, leveraging CCB or
other solutions? Any suggestions or previous experience?
Thanks,
Diego
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
with a subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/