Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.



To debug, I completely disabled ufw (along with SELinux) and the Windows firewall (and Defender).
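
With the firewalls out of the picture, one way to confirm the network path independently of HTCondor is a plain TCP connect test from the Windows node to the collector port. The following is only a minimal sketch and not something taken from this exchange: the helper name `can_connect` and the 3-second timeout are assumptions for illustration, while 10.6.10.15 and 9618 are the address and port mentioned in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration; it only tests TCP reachability
    and knows nothing about HTCondor itself.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example call, to be run on the Windows node:
#   can_connect("10.6.10.15", 9618)
```

A `True` result only shows that the port is reachable over TCP. It says nothing about whether the collector authorizes the node (ALLOW_WRITE), so a failure here would point at the network, and a success would point back at the HTCondor configuration.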

> On Sep 26, 2018, at 12:08 PM, Grassia, Philippe M. (Philippe) <pgrassia@xxxxxxxxxxx> wrote:
> 
> This should be sufficient. Short of a firewall configuration problem (either on the Windows host or on the CONDOR_HOST), I'm at my wit's end.
> 
> Philippe
>  
> 
> 
> On 9/26/18 7:44 AM, Stefano Colafranceschi wrote:
>> I can see the condor processes running on the Windows client (condor_master, condor_procd, condor_schedd, condor_shared_port, condor_startd). They start when the computer boots, because the condor MSI package installed condor as a service. Is this sufficient? Or do you think I am missing something that might cause the malfunction I am reporting?
>>  
>> StefanoC
>>  
>> From: Grassia, Philippe M. (Philippe)
>> Sent: Wednesday, September 26, 2018 10:27 AM
>> To: htcondor-users@xxxxxxxxxxx
>> Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>  
>> WSL does not have an init/service management system. How do you start and maintain the daemons on the Windows host? nssm? PowerShell scripts?
>> 
>>  
>> On 09/26/2018 07:02 AM, Stefano Colafranceschi wrote:
>> Please find attached the condor config file from my Windows client (which is in 10.x.x.x). Any further suggestions?
>>  
>> Thanks!
>>  
>> StefanoC
>>  
>> From: John M Knoeller
>> Sent: Tuesday, September 25, 2018 5:19 PM
>> To: HTCondor-Users Mail List
>> Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>  
>> This looks ok to me.
>>  
>> Your ALLOW_WRITE line is allowing everything on the 10.* subnet; that should be sufficient to give your Windows machine permission to send ads to the Collector.  (I'm assuming your Windows machine is in that subnet?)
>>  
>>  
>> Could I also see the configuration of your Windows machine?  Perhaps the problem is there.
>>  
>> -tj
>>  
>>  
>> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stefano Colafranceschi - Mathematical Sciences Dept
>> Sent: Tuesday, September 25, 2018 12:14 PM
>> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
>> Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>  
>> Thanks, please find my answers inline and the config file attached.
>> 
>> > On Sep 25, 2018, at 11:57 AM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>> > 
>> > I presume x.x.x.x is the correct IP for your Linux central manager machine?
>> yes 10.6.10.15
>> 
>> >  
>> > The error in the Master log looks like it might be an authorization problem: the collector isn't allowing the Windows node to send updates.
>> Right, but I can't figure out the issue.
>> 
>> >  
>> > Check the ALLOW_WRITE configuration knob in the Collector. Does it permit the IP of the Windows node?
>> >  
>> > At the same timestamp as the error from the master log (plus or minus a few seconds in case of clock mismatch), is there a message in the Collector log about refusing an attempt to send updates?
>> Yes, basically the error you describe as puzzling appears at the same time as the Windows node's attempt to connect.
>> 
>> >  
>> > This error
>> >  
>> > 09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
>> > 09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)
>> >  
>> > is a bit more puzzling to me.  I don't see how a request from a Windows node to the collector could result in a peer address of 127.0.0.1
>> >  
>> > Does the config on the Windows machine have this?
>> The file c:\windows\system32\drivers\etc\hosts does not contain 127.0.0.1; it contains just "10.6.10.15   mastercondor" (I added this for convenience).
>> >  
>> > NETWORK_INTERFACE = 127.0.0.1
>> >  
>> > If so, remove that line.
>> >  
>> > If not try running
>> >  
>> >    condor_config_val -write:upgrade  config.log
>> OK, done; the file is attached.
>> >  
>> > and sending me the config.log file.  I'll see if I can see anything in that config that could cause the peer address to be set incorrectly.
>> 
>> thank you very much for your help and support!
>> 
>> >  
>> > -tj
>> >  
>> > From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stefano Colafranceschi
>> > Sent: Monday, September 24, 2018 12:12 PM
>> > To: htcondor-users@xxxxxxxxxxx
>> > Subject: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>> >  
>> > Dear all,
>> > I am trying to get HTCondor (latest version) on Linux running with a Windows node. On Linux I can submit jobs and they are processed without problems, but I can't figure out what's wrong with adding a Windows machine to the pool.
>> >  
>> > This is the error that I see on the MasterLog (windows client):
>> >  
>> > ERROR: SECMAN:2003:TCP connection to collector x.x.x.x failed.
>> > Failed to start non-blocking update to <x.x.x.x:9618>.
>> >  
>> > And this is the content of the CollectorLog on the Linux server, just after I ran condor_status -master on the Windows machine:
>> >  
>> > 09/24/18 09:46:01 Got QUERY_STARTD_PVT_ADS
>> > 09/24/18 09:46:01 Number of Active Workers 0
>> > 09/24/18 09:46:01 (Sending 4 ads in response to query)
>> > 09/24/18 09:46:01 Query info: matched=4; skipped=0; query_time=0.000839; send_time=0.000619; type=MachinePrivate; requirements={true}; peer=<127.0.0.1:27363>; projection={}
>> > 09/24/18 09:46:01 Number of Active Workers 0
>> > 09/24/18 09:46:01 (Sending 6 ads in response to query)
>> > 09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
>> > 09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)
>> >  
>> >  
>> > P.S. I am sure both Windows and Linux have port 9618 open.
>> >  
>> > Thanks for any suggestions!
>> > _______________________________________________
>> > HTCondor-users mailing list
>> > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> > subject: Unsubscribe
>> > You can also unsubscribe by visiting
>> > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>> > 
>> > The archives can be found at:
>> > https://lists.cs.wisc.edu/archive/htcondor-users/