Hi Justin,
It seems the EP startd is failing to authenticate to the collector when it sends its slot ads. That would explain why condor_status is not showing the EP: the collector has no ads for that machine. I would add D_SECURITY or D_SECURITY:2 to the debugging level for the collector on the central manager node and for the startd on this EP. You could also try running:

_condor_TOOL_DEBUG="D_SECURITY" condor_status -debug -direct <hostname>
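
As a rough sketch of the debug-level change, assuming the standard <SUBSYS>_DEBUG knobs and a local configuration file on each machine (where that file lives varies by installation, so it is not shown here), it might look something like:

```
# Central manager: local config file (location is an assumption)
# Adds security-related messages to the CollectorLog.
COLLECTOR_DEBUG = D_SECURITY

# Execute point (EP): local config file (location is an assumption)
# Adds security-related messages to the StartLog.
STARTD_DEBUG = D_SECURITY

# After editing, running condor_reconfig on each machine should make
# the daemons pick up the new debug level.
```

Use D_SECURITY:2 in place of D_SECURITY if more detail is needed. Note that these assignments replace any debug flags already set for those daemons.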
-Cole Bollig
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Justin Killebrew via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Friday, August 18, 2023 2:42 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Justin Killebrew <jk@xxxxxxx>
Subject: Re: [HTCondor-users] StartLog: Failed to authenticate

I meant to include the execute node condor_who -daemon:

Daemon       Alive  PID    PPID   Exit
------       -----  ---    ----   ----
Master       yes    7570   1      no
SharedPort   no     7604   no     no
Startd       yes    7605   7570   no

JK

> On Aug 18, 2023, at 3:38 PM, Justin Killebrew via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>
> External Email - Use Caution
>
> condor_who -daemons on the central manager (also configured as submit role) shows:
>
> Daemon       Alive  PID    PPID   Exit
> ------       -----  ---    ----   ----
> Collector    yes    1608   1494   no
> Master       yes    1494   1      no
> Negotiator   yes    1609   1494   no
> Schedd       yes    1610   1494   no
> SharedPort   yes    1607   1494   no
>
> This looks correct but on the execute machine, StartLog has several
> ERROR: AUTHENTICATE:1003:Failed to authenticate with any method
> and
> SECMAN: required authentication with collector failed
>
> The central manager CollectorLog shows similar errors:
> DC_AUTHENTICATE: required authentication of 192.168.1.5 failed
>
> The firewall isn’t active … Where else should I look?
>
> condor_status returns nothing on the central manager. Is this because it doesn’t see any execute machines?
>
> Thanks,
> JK
>
>> On Aug 17, 2023, at 12:28 PM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
>>
>> One way to troubleshoot is to run
>>
>> condor_who -daemons
>>
>> On the execute node. This tool scrapes log files to determine which daemons are alive and which are not.
>>
>> If the condor_master is running, then you can use
>>
>> condor_who -quick
>>
>> which sends a query to the condor_master about the state of the other daemons.
>>
>> -tj
>>
>> -----Original Message-----
>> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Justin Killebrew via HTCondor-users
>> Sent: Friday, August 11, 2023 3:03 PM
>> To: Todd L Miller <tlmiller@xxxxxxxxxxx>
>> Cc: Justin Killebrew <jk@xxxxxxx>; Justin Killebrew via HTCondor-users <htcondor-users@xxxxxxxxxxx>
>> Subject: Re: [HTCondor-users] condor_status returns nothing
>>
>> The StartLog showed that /var/lib/condor/execute didn’t exist. I created it and restarted condor and now condor_status works as expected.
>>
>> Thanks!
>>
>> JK
>>
>>> On Aug 11, 2023, at 3:47 PM, Todd L Miller <tlmiller@xxxxxxxxxxx> wrote:
>>>
>>>> Should there be a startd running? How do I troubleshoot this installation?
>>>
>>> Yes. First thing to do is look at the MasterLog and StartLog
>>> files (which will probably be in /var/log/condor, but you can run
>>> `condor_config_val LOG` to find out for sure). From your process tree, it
>>> looks like either the master isn't starting the startd or the startd is
>>> crashing (almost?) immediately on start-up.
>>>
>>> - ToddM
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/