Re: [HTCondor-devel] 8.9.8 master getting into an infinite loop on startup


Date: Tue, 21 Jul 2020 17:40:32 +0000
From: Zach Miller <zmiller@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] 8.9.8 master getting into an infinite loop on startup
Err, sorry if I used confusing terminology.

When I said "exec() process" I was trying to say "forked process that is trying to do the exec()".  Somewhere in there something is misbehaving / blocking / looping unexpectedly.
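The sketch below illustrates the general technique Zach is describing. It is a common POSIX pattern and is not taken from HTCondor's DaemonCore code. The parent learns whether the child's exec() succeeded through a close-on-exec pipe. The child writes its errno into the pipe only if exec() fails. A successful exec() closes the write end, so the parent reads EOF. If the child spins before it ever reaches exec(), a plain read() in the parent would wait forever, so this version bounds the wait with poll(). The helper `spawn_and_check` and its timeout are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not an HTCondor API): fork, execv(argv[0], argv) in
 * the child, and report whether the exec itself succeeded.
 * Returns 0 if exec succeeded, the child's errno if exec failed, -1 if the
 * child gave no answer within timeout_ms (it is then killed and reaped),
 * or -2 on a local pipe/fork/poll/read error. */
static int spawn_and_check(char *const argv[], int timeout_ms, pid_t *out_pid)
{
    assert(argv != NULL && argv[0] != NULL);

    int fds[2];
    if (pipe(fds) == -1)
        return -2;
    /* Mark both ends close-on-exec: a successful exec closes them. */
    if (fcntl(fds[0], F_SETFD, FD_CLOEXEC) == -1 ||
        fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }
    if (pid == 0) {                      /* child */
        close(fds[0]);
        execv(argv[0], argv);
        int err = errno;                 /* only reached if execv failed */
        ssize_t unused = write(fds[1], &err, sizeof err);
        (void)unused;
        _exit(127);
    }

    close(fds[1]);                       /* parent keeps only the read end */
    *out_pid = pid;

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready == -1 && errno == EINTR);

    if (ready == 0) {                    /* neither EOF nor errno: child is stuck */
        close(fds[0]);
        kill(pid, SIGKILL);              /* SIGKILL cannot be caught or ignored */
        waitpid(pid, NULL, 0);
        return -1;
    }
    if (ready == -1) {
        close(fds[0]);
        return -2;
    }

    int err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof err);
    } while (n == -1 && errno == EINTR);
    close(fds[0]);

    if (n == 0)
        return 0;                        /* EOF: write end was closed by exec */
    if (n == (ssize_t)sizeof err)
        return err;                      /* exec failed with this errno */
    return -2;
}
```

A return of 0 means only that the exec succeeded. The caller still owns the child and must reap it with `waitpid()`. On timeout the helper uses SIGKILL rather than SIGTERM because a child stuck in the parent's code may not honor SIGTERM.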


Cheers,
-zach


On 7/21/20, 12:36 PM, "HTCondor-devel on behalf of Zach Miller via HTCondor-devel" <htcondor-devel-bounces@xxxxxxxxxxx on behalf of htcondor-devel@xxxxxxxxxxx> wrote:

    And if you turn off shared port, I would bet it hangs trying to start the next daemon.  (So clearly, just run the master with no shared port, no procd, and no other daemons running.  Problem solved! : )

    Sounds like something is wonky with the exec() process and getting the status from the newly execed process.  I'd follow Brian's "pstack" suggestion to see what's going on.


    Cheers,
    -zach


    On 7/21/20, 12:33 PM, "Mátyás Selmeci via HTCondor-devel" <htcondor-devel@xxxxxxxxxxx> wrote:

        With the procd disabled, it hangs trying to start the shared port daemon.
        -Mat

        On 7/21/20 12:24 PM, Gregory Thain via HTCondor-devel wrote:



        Just for debugging purposes, does it work with USE_PROCD = false?

        -greg

        On 7/21/20 12:05 PM, Mátyás Selmeci via HTCondor-devel wrote:

        Hey folks,

        I've got a problem running 8.9.8 on my Fedora 32 laptop (I'm using an
        RPM Tim gave me from an NMI build): when I start condor, the master
        forks and the child master gets into an infinite loop, eating an entire
        CPU and not responding to SIGTERM.  The last line in the MasterLog is:

        07/21/20 11:46:56 (fd:1) (pid:233863) (D_DAEMONCORE) About to exec "/usr/sbin/condor_procd"

        SELinux is off.  I attached my MasterLog with D_ALL:2 and
        condor_config_val -summary (that feature's great).  The traceback
        at the end of MasterLog is from me sending SIGABRT to both
        condor_master processes.

        Any ideas?

        Thanks,
        -Mat
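A forked child that ignores SIGTERM has one generic explanation worth ruling out. This is an assumption to check, not a diagnosis of this report. A child created by fork() inherits the parent's signal mask and handlers. Until it reaches exec(), it runs the parent's handlers. A blocked signal stays blocked even across exec(), and a signal set to SIG_IGN stays ignored. Only SIGKILL cannot be blocked, caught, or ignored. The C sketch below shows the usual hygiene of resetting that state in the child before exec(). `reset_signals_for_exec` and `child_has_sigterm_blocked` are hypothetical helpers, not HTCondor functions, and the signal list is illustrative rather than complete.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: undo signal state inherited from the parent, meant to
 * be called in the child between fork() and exec(). Dispositions are reset
 * first, so a pending signal is not delivered to an inherited handler once
 * the mask is cleared. */
static void reset_signals_for_exec(void)
{
    static const int sigs[] = { SIGTERM, SIGINT, SIGHUP, SIGQUIT,
                                SIGCHLD, SIGPIPE, SIGUSR1, SIGUSR2 };
    struct sigaction dfl;
    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        sigaction(sigs[i], &dfl, NULL);

    /* The signal mask survives both fork() and exec(), so clear it. */
    sigset_t none;
    sigemptyset(&none);
    int rc = sigprocmask(SIG_SETMASK, &none, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Fork a child that reports through its exit status whether SIGTERM is
 * blocked inside it. Returns 1 if blocked, 0 if not, -1 on error. */
static int child_has_sigterm_blocked(int reset_in_child)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (reset_in_child)
            reset_signals_for_exec();
        sigset_t cur;
        sigprocmask(SIG_SETMASK, NULL, &cur);   /* query the current mask only */
        _exit(sigismember(&cur, SIGTERM) ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With SIGTERM blocked in the parent, `child_has_sigterm_blocked(0)` returns 1 because the child inherited the block. `child_has_sigterm_blocked(1)` returns 0 because the child cleared its mask first.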

        _______________________________________________
        HTCondor-devel mailing list
        HTCondor-devel@xxxxxxxxxxx
        https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel





