Just for the record, the problem is resolved. I added:

Thanks for the help on the other errors!

-Brandon

On 1/8/18 3:09 PM, Brandon Graves wrote:
Alright, I now have:

Central Manager = MASTER, COLLECTOR, NEGOTIATOR
Submit Nodes = MASTER, SCHEDD
Execute Nodes = MASTER, SCHEDD, STARTD

That has fixed the error in the logs, but "condor_q -global" still presents:

Failed to fetch ads from: <xxx.xxx.xxx.xxx:9618?addrs=xxx.xxx.xxx.xxx-9618&noUDP&sock=3573919_bf24_3> : host.my.domain.com
AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using GSI
GSI:5003:Failed to authenticate. Globus is reporting error(851968:50). There is probably a problem with your credentials. (Did you run grid-proxy-init?)
AUTHENTICATE:1004:Failed to authenticate using KERBEROS
AUTHENTICATE:1004:Failed to authenticate using FS

Thanks again for your help. If you have any more ideas, I'd be very appreciative.

--Brandon

On 1/8/18 1:16 PM, John M Knoeller wrote:

This message:

01/08/18 09:59:56 DaemonCore: Can't receive command request from xxx.xxx.xxx.105 (perhaps a timeout?)

is generally not a problem. You will see it in working pools once per negotiation cycle. It happens because the negotiator hangs up after updating accounting ads, but the collector assumes that any socket opened for updates will never be closed, so it looks for a second command after the first, and we get a warning when it doesn't find one.

This message:

01/08/18 10:03:14 PERMISSION DENIED to condor_pool@xxxxxxxxxxxxx from host xxx.xxx.xxx.60 for command 10 (QUERY_STARTD_PVT_ADS), access level NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case for the full reason

is a problem. It indicates that host xxx.xxx.xxx.60 is running a NEGOTIATOR, and that negotiator is unable to negotiate because the COLLECTOR will not send it the information it needs to do so.

If xxx.xxx.xxx.60 is not your central manager, then you probably just need to remove NEGOTIATOR from the DAEMON_LIST in the configuration of host xxx.xxx.xxx.60.

Try running

condor_config_val -verbose DAEMON_LIST

on each of your submit nodes.
The result should not include either COLLECTOR or NEGOTIATOR.

-tj

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Brandon Graves
Sent: Monday, January 8, 2018 12:18 PM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] Error with global Queue

Both commands yield valid results without errors. In the Central Manager's CollectorLog I have:

01/08/18 10:03:14 PERMISSION DENIED to condor_pool@xxxxxxxxxxxxx from host xxx.xxx.xxx.60 for command 10 (QUERY_STARTD_PVT_ADS), access level NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case for the full reason
01/08/18 10:03:14 DC_AUTHENTICATE: Command not authorized, done!
01/08/18 10:03:20 Got QUERY_STARTD_ADS
01/08/18 10:03:20 Number of Active Workers 0
01/08/18 10:03:20 Got QUERY_STARTD_ADS
01/08/18 10:03:20 Number of Active Workers 0
01/08/18 10:03:26 Got QUERY_STARTD_PVT_ADS
01/08/18 10:03:26 Number of Active Workers 0
01/08/18 10:03:26 Number of Active Workers 0
01/08/18 10:03:26 DaemonCore: Can't receive command request from xxx.xxx.xxx.105 (perhaps a timeout?)

xxx.60 is one of my submit nodes, and xxx.105 is the central manager. There is also a similar entry for other nodes. I looked through the logs for a bit more detail and got:

01/08/18 09:59:56 DaemonCore: Can't receive command request from xxx.xxx.xxx.105 (perhaps a timeout?)
01/08/18 09:59:56 PERMISSION DENIED to condor_pool@xxxxxxxxxxxxxxxxxxx from host xxx.xxx.xxx.52 for command 10 (QUERY_STARTD_PVT_ADS), access level NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case for the full reason
01/08/18 09:59:56 DC_AUTHENTICATE: Command not authorized, done!

Thank you for any further insight you can provide!

-Brandon

On 1/8/18 9:46 AM, John M Knoeller wrote:

I think this means that condor_q is unable to fetch schedd ads from the collector. Try running

condor_status -schedd

Do you get the same error? Does a simple condor_status work?
If you look in the CollectorLog on the central manager, do you see any messages about the rejected query?

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Brandon Graves
Sent: Monday, January 8, 2018 11:26 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Error with global Queue

Hello All,

I recently replaced my Central Manager, and a few odd things have come up. The only definite error message I can find, though, happens when "condor_q -global" is run:

-- Failed to fetch ads from: <xxx.xxx.xxx.49:9618?addrs=xxx.xxx.xxx.49-9618+[> : server1.my.domain.com
AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using GSI
GSI:5003:Failed to authenticate. Globus is reporting error(851968:50). There is probably a problem with your credentials. (Did you run grid-proxy-init?)
AUTHENTICATE:1004:Failed to authenticate using KERBEROS
AUTHENTICATE:1004:Failed to authenticate using FS

My basic configuration is a central manager connected to 2 submit nodes. Each submit node seems to be able to see its own queue. One of the submit nodes seems to be having trouble running jobs off and on, but I can't find any errors that make sense. For now I'd like to figure out the global queue error, as I suspect they are related.
My config file, as far as authentication goes, looks like this:

SEC_PASSWORD_FILE = /etc/condor/pool_password
SEC_DAEMON_AUTHENTICATION = REQUIRED
SEC_DAEMON_INTEGRITY = REQUIRED
SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD
SEC_NEGOTIATOR_AUTHENTICATION = REQUIRED
SEC_NEGOTIATOR_INTEGRITY = REQUIRED
SEC_NEGOTIATOR_AUTHENTICATION_METHODS = PASSWORD
SEC_CLIENT_AUTHENTICATION_METHODS = FS, PASSWORD, KERBEROS, GSI

(I didn't do the initial install/configuration of HTCondor on these systems; I'm just the new admin for them and still getting my footing.)

I've looked through some of the logs, but I can't seem to find any specific error messages that point me in a new direction. Any tips, tricks, or ideas would be appreciated.

--Brandon

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
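
For anyone repeating the DAEMON_LIST check suggested in this thread across several nodes, here is a minimal Python sketch of the rule "submit nodes should not list COLLECTOR or NEGOTIATOR". It is not part of HTCondor: the names parse_daemon_list and misplaced_daemons and the role labels are hypothetical, and it assumes you have already copied each node's DAEMON_LIST value (as reported by condor_config_val) into a string, with daemon names separated by commas and/or spaces.

```python
# Daemons that, per the advice in this thread, belong only on the central manager.
CENTRAL_ONLY = {"COLLECTOR", "NEGOTIATOR"}


def parse_daemon_list(value):
    """Split a DAEMON_LIST value on commas and/or whitespace into a set of names."""
    return {name.upper() for name in value.replace(",", " ").split()}


def misplaced_daemons(role, daemon_list):
    """Return central-manager-only daemons configured on a node of the given role.

    `role` is a hypothetical label: "central_manager" is allowed to run
    COLLECTOR and NEGOTIATOR; any other role is not.
    """
    if role == "central_manager":
        return set()
    return parse_daemon_list(daemon_list) & CENTRAL_ONLY


if __name__ == "__main__":
    # Example values; replace with what each of your nodes actually reports.
    nodes = {
        "submit-a": ("submit", "MASTER, SCHEDD, NEGOTIATOR"),
        "submit-b": ("submit", "MASTER, SCHEDD"),
        "cm": ("central_manager", "MASTER, COLLECTOR, NEGOTIATOR"),
    }
    for host, (role, daemon_list) in sorted(nodes.items()):
        extra = misplaced_daemons(role, daemon_list)
        if extra:
            print(f"{host}: remove {', '.join(sorted(extra))} from DAEMON_LIST")
        else:
            print(f"{host}: DAEMON_LIST looks consistent with role {role}")
```

The host names and DAEMON_LIST strings in the example are made up for illustration. A node flagged by this check corresponds to the case described above, where a stray NEGOTIATOR on a submit node produces PERMISSION DENIED entries in the central manager's CollectorLog.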