Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Job submission from a node outside a cluster instantiated on k8s

Date: Wed, 02 Dec 2020 00:29:56 +0100
From: Diego Ciangottini <diego.ciangottini@xxxxxxxxxx>
Subject: Re: [HTCondor-users] Job submission from a node outside a cluster instantiated on k8s

Hi again,

just to confirm that exposing the schedd on the public IP as well, on a high nodeport (e.g. 31618), and using TCP_FORWARDING_HOST does the job.
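
A minimal sketch of what this could look like on the schedd side, written in the same `_condor_` environment-override style used elsewhere in this thread. The thread does not say how the port was actually changed, so everything here is an assumption: that the schedd shares the collector's public IP, that shared port is enabled, and that SHARED_PORT_PORT is the knob that moves the listening port to the nodeport.

```shell
#!/bin/sh
# Sketch only: environment-style overrides for the schedd container.
# Assumption: the schedd is reachable on the same public IP as the collector.
PUBLIC_IP=90.147.174.149
# Assumption: the high nodeport mentioned above, with the schedd listening
# on the same port number inside the pod.
SCHEDD_NODEPORT=31618

# Advertise the public address to the collector instead of the pod IP.
export _condor_TCP_FORWARDING_HOST="$PUBLIC_IP"
# Assumption: shared port is in use, so SHARED_PORT_PORT selects the port.
export _condor_USE_SHARED_PORT=True
export _condor_SHARED_PORT_PORT="$SCHEDD_NODEPORT"

echo "schedd should advertise <${_condor_TCP_FORWARDING_HOST}:${_condor_SHARED_PORT_PORT}>"
```

The k8s NodePort service would then need to map 31618 to 31618, the same way the collector maps 30618 to 30618, so that the advertised port matches the one reachable from outside.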

So at this point the only things left are my curiosity about whether this is the minimum number of public endpoints I could have, and the other thread about SciTokens authZ ;)


Anyway, thanks a lot!

Diego

On 12/1/2020 7:13 PM, Diego Ciangottini wrote:

Hi Brian,
thank you. Should I take your answer to mean that the schedd *MUST* also have a public endpoint? In that case, is it possible to change the schedd port (to be compliant with k8s node ports)?
Alternatively, I was thinking of accessing the schedd daemon running on the private network from outside using CCB. Is this a thing? I think this would be the ideal solution in this case.
Diego

On 12/1/2020 1:52 PM, Bockelman, Brian wrote:
Hi Diego,
Have you tried setting TCP_FORWARDING_HOST to get the schedd to advertise an external address to the collector?
I suspect it's advertising its internal address which, of course, is not a valid one coming from the outside.
Brian

Sent from my iPhone
On Dec 1, 2020, at 5:42 AM, Diego Ciangottini <diego.ciangottini@xxxxxxxxxx> wrote:
Adding some more context and details on my investigation so far.
In this k8s manifest (*) you can find my latest attempt. From the network point of view, it basically does the following:
- CCB/Collector exposed on a nodeport at 30618, mapped to 30618 inside the container
- Schedd appears to the collector as a headless k8s service at schedd.condor.svc.cluster.local
- CCB address set in the schedd configuration, pointing to the public IP of the collector (I also tried the private one, with no difference in the outcome)
With this configuration everything works perfectly as long as I am inside the cluster, but if I try from outside with this env (**) I get this error (***).
Can you help me understand whether what I am trying makes any sense? Do you see any obvious reason for this not to work? Any feedback at this point is much appreciated.
P.S. If I remove CCB_ADDRESS from the condor configuration of the schedd I get this instead (****); I don't know if it helps.
Thanks,
Diego

(*)

https://gist.github.com/dciangot/171ef8981ba554fed4ca8db97b4ddbf7

(**)

export _condor_AUTH_SSL_CLIENT_CAFILE=/ca.crt
export _condor_SEC_DEFAULT_AUTHENTICATION_METHODS=SCITOKENS
export _condor_SCITOKENS_FILE=/tmp/token
export _condor_COLLECTOR_HOST=90.147.174.149.xip.io:30618
export _condot_TOOL_DEBUG=D_FULLDEBUG,D_SECURITY

(***)

condor_q -address `condor_status -schedd -af ScheddIpAddr` -debug
12/01/20 11:23:22 ZKM: In unwrap.
12/01/20 11:23:22 SharedPortEndpoint: failed to find MyAddress in ad from /var/lock/condor/shared_port_ad.
12/01/20 11:23:22 CCBClient: Failed to get remote address for shared port endpoint for reversed connection from schedd at <10.244.1.20:9618>.
12/01/20 11:23:22 Failed to reverse connect to schedd at <10.244.1.20:9618> via CCB.
-- Failed to fetch ads from: <10.244.1.20:9618?CCBID=10.244.2.21:30618%3faddrs%3d10.244.2.21-30618%26alias%3d90.147.174.149.xip.io%26noUDP%26sock%3dcollector#2&PrivNet=schedd.condor.svc.cluster.local&addrs=10.244.1.20-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94> : schedd.condor.svc.cluster.local
CEDAR:6001:Failed to connect to <10.244.1.20:9618?CCBID=10.244.2.21:30618%3faddrs%3d10.244.2.21-30618%26alias%3d90.147.174.149.xip.io%26noUDP%26sock%3dcollector#2&PrivNet=schedd.condor.svc.cluster.local&addrs=10.244.1.20-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>

(****)

condor_q -address `condor_status -schedd -af ScheddIpAddr` -debug
12/01/20 11:10:06 ZKM: In unwrap.
12/01/20 11:10:26 attempt to connect to <10.244.1.17:9618> failed: timed out after 20 seconds.
-- Failed to fetch ads from: <10.244.1.17:9618?addrs=10.244.1.17-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94> : schedd.condor.svc.cluster.local
CEDAR:6001:Failed to connect to <10.244.1.17:9618?addrs=10.244.1.17-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>

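
Both failures point at the same thing: the address the schedd advertises is a pod-internal one. As a rough illustration (not an HTCondor tool, just plain shell string handling), the host and port can be pulled out of the address string from the (****) output and checked against the RFC 1918 private ranges:

```shell
#!/bin/sh
# Address string copied from the (****) output above.
sinful='<10.244.1.17:9618?addrs=10.244.1.17-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>'

# Drop the leading '<', then everything from the first '?' onward,
# which leaves just host:port.
hostport=${sinful#<}
hostport=${hostport%%\?*}
host=${hostport%:*}
port=${hostport##*:}

# Classify the host against the RFC 1918 private IPv4 ranges.
case "$host" in
  10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*) scope=private ;;
  *) scope=public ;;
esac

echo "schedd advertises $host port $port ($scope)"
```

A private result here means a client outside the cluster has no route to that address, which matches the 20 second connect timeout in the log.
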
On 12/1/2020 12:52 AM, Diego Ciangottini wrote:
Hi again,
partially related to the activity in my previous email: I'm trying to update our cluster setup on k8s and was wondering whether it is possible to optimize what we are currently using.
In particular, we keep the schedd and collector pods on the host network, accessible from outside, in order to allow submission from nodes outside the cluster. This of course comes at the cost of losing a lot of flexibility in the deployment.
So, is there any way to expose only the collector port as a service and keep the schedd running on the private network only, leveraging CCB or other solutions? Any suggestions or previous experience?
Thanks,
Diego
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
