Dear all,

in order to supplement Rene's email, I would like to share our current assumptions and investigations.

It looks like the CentOS 8 packages are built using -D _GLIBCXX_ASSERTIONS, is that correct?

If that is the case, the code in [1] seems to be broken.

* Line 101 only reserves contiguous memory for the vector without changing its size.
* If -D _GLIBCXX_ASSERTIONS is used, the vector's size is checked when it is accessed in line 102.
* Since its size is still zero, the assertion fails and the process aborts with SIGABRT.

Best regards,
Manuel

[1] https://github.com/htcondor/htcondor/blob/397ce7a3488d7b4e41168b0d039b19468138eeea/src/condor_utils/token_utils.cpp#L101-L102

Dr. Manuel Giffels, Karlsruhe Institute of Technology (KIT), Steinbuch Centre for Computing (SCC)
Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen
Phone: +49 721 608 28636, Email: Manuel.Giffels@xxxxxxx

> On 27.04.2021 at 11:30, Caspart, René (SCC) <rene.caspart@xxxxxxx> wrote:
>
> Dear all,
>
> When upgrading one of our systems running RHEL 8 to HTCondor 9.0.0
> (previously we were running 8.9.11 without any problems) we encountered
> the condor_master terminating after receiving a SIGABRT. Based on the
> logs [1] this seems to be related to using token authentication and
> condor trying to request a token from the host running the collector.
> The machine where we saw this behavior is a worker node running a STARTD
> and having access to a token with ADVERTISE_STARTD permissions.
>
> We were able to reproduce this behavior on a test machine (here we were
> only able to use CentOS 8 not RHEL 8) and were able to trace it down to
> the following backtrace [2], which points to [3] as the place in
> HTCondor where the abort is triggered.
>
> Since this does not seem to be related to the specific setup of our hosts,
> has anyone encountered a similar issue?
>
> Thanks,
> Rene
>
>
> [1]
> 04/26/21 12:15:00 (pid:1293) (D_SECURITY) Trying token request to remote
> host cloud-htcondor.gridka.de for user (default).
> Caught signal 6: si_code=4294967290, si_pid=1293, si_uid=232883,
> si_addr=0x50D
> Stack dump for process 1293 at timestamp 1619432100 (13 frames)
> /hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(dprintf_dump_stack+0x28)[0x147ceb85baf8]
> /hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(_Z17unix_sig_coredumpiP9siginfo_tPv+0x6d)[0x147ceba8686d]
> /lib64/libpthread.so.0(+0x12dd0)[0x147ce9928dd0]
> /lib64/libc.so.6(gsignal+0x10f)[0x147ce958b70f]
> /lib64/libc.so.6(abort+0x127)[0x147ce9575b25]
> /hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(_ZN8htcondor18generate_client_idB5cxx11Ev+0x87)[0x147ceb998157]
> /hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(+0x3d10b8)[0x147ceba8d0b8]
> /hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(+0x3d18af)[0x147ceba8d8af]
> /hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(_ZN12TimerManager7TimeoutEPiPd+0x3a3)[0x147cebaa1f13]
> /hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(_ZN10DaemonCore6DriverEv+0x788)[0x147ceba72ed8]
> /hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(_Z7dc_mainiPPc+0x1890)[0x147ceba8b4d0]
> /lib64/libc.so.6(__libc_start_main+0xf3)[0x147ce95776a3]
> condor_master(_start+0x2e)[0x558f85cedb4e]
>
> [2]
> #0 0x00007ffff557499f in raise () from /usr/lib64/libc.so.6
> #1 0x00007ffff555ecf5 in abort () from /usr/lib64/libc.so.6
> #2 0x00007ffff7979157 in std::__replacement_assert
> (__condition=0x7ffff7a965a8 "__builtin_expect(__n < this->size(),
> true)", __function=<synthetic pointer>, __line=932,
> __file=0x7ffff7a965d8 "/usr/include/c++/8/bits/stl_vector.h") at
> /usr/include/c++/8/x86_64-redhat-linux/bits/c++config.h:2391
> #3 std::vector<char, std::allocator<char> >::operator[] (__n=0,
> this=<synthetic pointer>) at /usr/include/c++/8/bits/stl_vector.h:932
> #4 htcondor::generate_client_id[abi:cxx11]() () at
> /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_utils/token_utils.cpp:102
> #5 0x00007ffff7a6e0b8 in (anonymous
> namespace)::TokenRequest::tryTokenRequest (req=...) at
> /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_daemon_core.V6/daemon_core_main.cpp:462
> #6 0x00007ffff7a6e8af in (anonymous
> namespace)::TokenRequest::tryTokenRequests () at
> /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_daemon_core.V6/daemon_core_main.cpp:422
> #7 0x00007ffff7a82f13 in TimerManager::Timeout (this=0x55555579f290,
> pNumFired=pNumFired@entry=0x7fffffffdbf4,
> pruntime=pruntime@entry=0x7fffffffdbf8)
> at
> /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_daemon_core.V6/timer_manager.cpp:473
> #8 0x00007ffff7a53ed8 in DaemonCore::Driver (this=0x5555557a03b0) at
> /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_daemon_core.V6/daemon_core.cpp:3513
> #9 0x00007ffff7a6c4d0 in dc_main (argc=1, argv=<optimized out>) at
> /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_daemon_core.V6/daemon_core_main.cpp:4386
> #10 0x00007ffff5560873 in __libc_start_main () from /usr/lib64/libc.so.6
> #11 0x0000555555560b4e in _start () at
> /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_utils/dc_service.h:70
>
> [3]
> https://github.com/htcondor/htcondor/blob/V9_0_0/src/condor_utils/token_utils.cpp#L102
>
> --
> Karlsruher Institut für Technologie (KIT)
> Steinbuch Centre for Computing (SCC)
>
> Dr. René Caspart
>
> Hermann-von-Helmholtz-Platz 1
> 76344 Eggenstein-Leopoldshafen, Germany
> Phone: +49 721 608-25631
> E-mail: Rene.Caspart@xxxxxxx
>
>
> Registered office:
> Kaiserstraße 12, 76131 Karlsruhe
>
>
>
> KIT – The Research University in the Helmholtz Association
>
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
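
To illustrate the reserve-versus-size point made at the top of this message, here is a minimal C++ sketch. It is not the HTCondor code: the function names reserved_only and resized are hypothetical and exist only for this illustration. The sketch shows that reserve() leaves size() at zero, so indexing element 0 afterwards is out of bounds, whereas resize() actually creates the elements. With -D _GLIBCXX_ASSERTIONS, libstdc++ checks the index in operator[] and aborts, which matches the __replacement_assert frame in the quoted backtrace.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from HTCondor): only reserves storage.
// capacity() grows, but size() stays 0, so buf[0] would be out of bounds.
std::vector<char> reserved_only(std::size_t n) {
    std::vector<char> buf;
    buf.reserve(n);
    return buf;
}

// Hypothetical helper (not from HTCondor): resize() changes size() and
// value-initializes the new elements to '\0', so buf[0] .. buf[n-1] are valid.
std::vector<char> resized(std::size_t n) {
    std::vector<char> buf;
    buf.resize(n);
    return buf;
}
```

Whether resize() or constructing the vector with the desired size is the appropriate fix for token_utils.cpp is for the HTCondor developers to decide; the sketch only demonstrates the difference in behavior.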