
Re: [HTCondor-users] Parallel Universe (DedicatedScheduler): Segfaults, SIGABRTs, and Assertions, oh my!



Thanks much as always, Greg!

I noticed the pull requests on GitHub for 23.0 and 23.x, and I'm guessing the ones for 24.0 and 24.x will land soon.

I'll gladly switch the repo branch away from release as soon as the next build lands, to help test.

-Zach

________________________________________
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Greg Thain via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, May 1, 2025 11:29 AM
To: htcondor-users@xxxxxxxxxxx
Cc: Greg Thain
Subject: Re: [HTCondor-users] Parallel Universe (DedicatedScheduler): Segfaults, SIGABRTs, and Assertions, oh my!


On 4/29/25 4:18 PM, Zach McGrew wrote:
> Thanks for the help and feedback Todd (and Team),
>
> Here are some additional details that may be helpful:
>
> All of the systems (production and my test vms) are running Rocky Linux 9.5 on x86_64. AP and EPs have been upgraded to HTCondor 24.0.7.
>
> The production EPs are dual-CPU Xeon Gold 6130s (64 threads, which HTCondor treats as 64 CPUs). The production AP is an older system, a Xeon E5-2620. Not seeing any ECC errors or hardware check exceptions on the EPs or AP.

Hi Zach:

Thanks for the details -- we've pushed some fixes that will address some
of these problems, but perhaps not all of them. These changes should be
in the next release of HTCondor, or if you like, we could get you a
pre-release to test with.

Sorry for the headaches,

-greg
