Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
- Date: Wed, 26 Sep 2018 13:56:06 -0400
- From: Stefano Colafranceschi - Mathematical Sciences Dept <stefano.colafranceschi@xxxxxxx>
- Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
To be complete, I see that loopback IP 127.0.0.1 even when I simply run condor_status on the Linux master. What do you suggest to debug further?
StefanoC
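
One way to narrow this down is to tally which peer addresses the collector actually records. The sketch below is only an illustration: it assumes the `peer=<ip:port>` field format seen in the CollectorLog lines quoted further down in this thread, and the function names `peer_counts` and `loopback_peers` are made up for this example rather than part of HTCondor.

```python
import re
from collections import Counter
from ipaddress import ip_address

# Matches the "peer=<ip:port>" field as it appears in the quoted CollectorLog lines.
PEER_RE = re.compile(r"peer=<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")

# Sample lines copied from the CollectorLog excerpt in this thread.
SAMPLE_LOG = """\
09/24/18 09:46:01 Got QUERY_STARTD_PVT_ADS
09/24/18 09:46:01 Number of Active Workers 0
09/24/18 09:46:01 Query info: matched=4; skipped=0; query_time=0.000839; send_time=0.000619; type=MachinePrivate; requirements={true}; peer=<127.0.0.1:27363>; projection={}
09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
"""


def peer_counts(log_text):
    """Count how many log lines were recorded for each peer IP address."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PEER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def loopback_peers(counts):
    """Return the peer IPs that are loopback addresses (127.0.0.0/8)."""
    return sorted(ip for ip in counts if ip_address(ip).is_loopback)


counts = peer_counts(SAMPLE_LOG)
print(counts)
print(loopback_peers(counts))
```

Running the same functions over the full CollectorLog while the Windows node retries its update would show whether any non-loopback peer (for example an address in 10.x.x.x) ever reaches the collector.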
> On Sep 26, 2018, at 12:21 PM, Stefano Colafranceschi - Mathematical Sciences Dept <stefano.colafranceschi@xxxxxxx> wrote:
>
> To debug, I completely disabled ufw (along with SELinux) and the Windows firewall (and Defender).
>
>> On Sep 26, 2018, at 12:08 PM, Grassia, Philippe M. (Philippe) <pgrassia@xxxxxxxxxxx> wrote:
>>
>> This should be sufficient. Then, short of a firewall configuration issue (either on the Windows host or the CONDOR_HOST), I'm at wit's end.
>>
>> Philippe
>>
>>
>>
>> On 9/26/18 7:44 AM, Stefano Colafranceschi wrote:
>>> I can see the condor tasks running on the Windows client (condor_master, condor_procd, condor_schedd, condor_shared_port, condor_startd). They start when the computer boots, because the condor MSI package installed condor as a service. Is this sufficient? Or do you think I am missing something that might cause the malfunction I am reporting?
>>>
>>> StefanoC
>>>
>>> From: Grassia, Philippe M. (Philippe)
>>> Sent: Wednesday, September 26, 2018 10:27 AM
>>> To: htcondor-users@xxxxxxxxxxx
>>> Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>>
>>> WSL does not have an init/service management system. How do you start and maintain the daemons on the Windows host? nssm? PowerShell scripts?
>>>
>>>
>>> On 09/26/2018 07:02 AM, Stefano Colafranceschi wrote:
>>> Find attached the config file of condor on my windows client (which is in 10.x.x.x), any further suggestions?
>>>
>>> Thanks!
>>>
>>> StefanoC
>>>
>>> From: John M Knoeller
>>> Sent: Tuesday, September 25, 2018 5:19 PM
>>> To: HTCondor-Users Mail List
>>> Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>>
>>> This looks ok to me.
>>>
>>> Your ALLOW_WRITE line is allowing everything on the 10.* subnet; that should be sufficient to give your Windows machine permission to send ads to the Collector. (I'm assuming your Windows machine is in that subnet?)
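
As a rough sanity check of the point above, the sketch below tests whether an address falls under a `10.*` style wildcard. It is only an approximation using Python's `fnmatch`; HTCondor's real authorization logic also handles hostnames, netmasks, and DENY_* lists, none of which are modelled here. The helper `ip_matches_any` and the Windows node address 10.6.10.20 are hypothetical (the thread only gives 10.x.x.x for that machine); 10.6.10.15 is the central manager address mentioned in this thread.

```python
from fnmatch import fnmatch


def ip_matches_any(ip, patterns):
    """Return True if the IP string matches at least one wildcard pattern.

    Approximation only: this is plain shell-style wildcard matching,
    not HTCondor's own ALLOW_WRITE evaluation.
    """
    return any(fnmatch(ip, pattern) for pattern in patterns)


# Pattern taken from the discussion above (everything on 10.*).
allow_write = ["10.*"]

print(ip_matches_any("10.6.10.15", allow_write))   # central manager IP from this thread
print(ip_matches_any("10.6.10.20", allow_write))   # hypothetical Windows node IP
print(ip_matches_any("127.0.0.1", allow_write))    # loopback address seen in the CollectorLog
```

Under this approximation a loopback peer would not match `10.*`, so it is worth checking which address the collector really sees for the Windows node.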
>>>
>>>
>>> Could I also see the configuration of your Windows machine? Perhaps the problem is there.
>>>
>>> -tj
>>>
>>>
>>> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stefano Colafranceschi - Mathematical Sciences Dept
>>> Sent: Tuesday, September 25, 2018 12:14 PM
>>> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
>>> Subject: Re: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>>
>>> Thanks, find inline answer and attached config file
>>>
>>>> On Sep 25, 2018, at 11:57 AM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>>>>
>>>> I presume x.x.x.x is the correct IP for your Linux central manager machine?
>>> yes 10.6.10.15
>>>
>>>>
>>>> The error in the Master log looks like it might be an authorization problem: the collector isn't allowing the Windows node to send updates.
>>> right but I can't figure out the issue.
>>>
>>>>
>>>> Check the ALLOW_WRITE configuration knob in the Collector: does it permit the IP of the Windows node?
>>>>
>>>> At the same timestamp as the error from the master log (plus or minus a few seconds in case of clock mis-match), is there a message in the Collector log about refusing an attempt to send updates?
>>> Yes, basically the error you describe as puzzling appears at the same time as the Windows node's attempt to connect.
>>>
>>>>
>>>> This error
>>>>
>>>> 09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
>>>> 09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)
>>>>
>>>> is a bit more puzzling to me. I don't see how a request from a Windows node to the collector could result in a peer address of 127.0.0.1
>>>>
>>>> Does the config on the Windows machine have this?
>>> The file c:\windows\system32\drivers\etc\hosts does not contain 127.0.0.1; it contains just "10.6.10.15 mastercondor" (I added this for convenience).
>>>>
>>>> NETWORK_INTERFACE = 127.0.0.1
>>>>
>>>> If so, remove that line.
>>>>
>>>> If not try running
>>>>
>>>> condor_config_val -write:upgrade config.log
>>> ok done attached
>>>>
>>>> and sending me the config.log file. I'll see if I can see anything in that config that could cause the peer address to be set incorrectly.
>>>
>>> thank you very much for your help and support!
>>>
>>>>
>>>> -tj
>>>>
>>>> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Stefano Colafranceschi
>>>> Sent: Monday, September 24, 2018 12:12 PM
>>>> To: htcondor-users@xxxxxxxxxxx
>>>> Subject: [HTCondor-users] ERROR: SECMAN:2003:TCP connection to collector failed.
>>>>
>>>> Dear all,
>>>> I am trying to get a Linux HTCondor pool (latest version) running with a Windows node. On Linux I can submit jobs and they are processed without problems, but I can't figure out what's wrong when adding a Windows machine to the pool.
>>>>
>>>> This is the error that I see on the MasterLog (windows client):
>>>>
>>>> ERROR: SECMAN:2003:TCP connection to collector x.x.x.x failed.
>>>> Failed to start non-blocking update to <x.x.x.x:9618>.
>>>>
>>>> And this is the content of the CollectorLog on the Linux server, just after I ran condor_status -master on the Windows machine:
>>>>
>>>> 09/24/18 09:46:01 Got QUERY_STARTD_PVT_ADS
>>>> 09/24/18 09:46:01 Number of Active Workers 0
>>>> 09/24/18 09:46:01 (Sending 4 ads in response to query)
>>>> 09/24/18 09:46:01 Query info: matched=4; skipped=0; query_time=0.000839; send_time=0.000619; type=MachinePrivate; requirements={true}; peer=<127.0.0.1:27363>; projection={}
>>>> 09/24/18 09:46:01 Number of Active Workers 0
>>>> 09/24/18 09:46:01 (Sending 6 ads in response to query)
>>>> 09/24/18 09:46:01 Query info: matched=6; skipped=4; query_time=0.000806; send_time=0.001738; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<127.0.0.1:25381>; projection={}
>>>> 09/24/18 09:46:01 DaemonCore: Can't receive command request from 127.0.0.1 (perhaps a timeout?)
>>>>
>>>>
>>>> P.S. I am sure both Windows and Linux have port 9618 open.
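
The claim that port 9618 is open can be checked from the Windows node with a plain TCP connection attempt. The following is a minimal sketch, not an HTCondor tool: `can_connect` is a hypothetical helper, and a successful connect only shows that the port is reachable at the TCP level. It says nothing about whether the collector would authorize the update.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Connection refusals, timeouts, and unreachable hosts all raise OSError
    (or a subclass) and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From the Windows node, calling `can_connect("10.6.10.15", 9618)` (the central manager address and collector port given in this thread) would indicate whether a firewall or routing issue sits between the two machines.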
>>>>
>>>> Thanks for any suggestions!
>>>> _______________________________________________
>>>> HTCondor-users mailing list
>>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>>> subject: Unsubscribe
>>>> You can also unsubscribe by visiting
>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>>
>>>> The archives can be found at:
>>>> https://lists.cs.wisc.edu/archive/htcondor-users/