
[HTCondor-users] dagman Caught signal 11



Hi,
As part of the WLCG migration away from GSI, we upgraded condor from 9.0 to condor-10.9.0-1.el7.x86_64.
Since the upgrade, DAG jobs have been failing with "Caught signal 11" at random stages of the DAG.


03/05/24 16:43:29 Submitting node 1_rRNA.condor from file 1_rRNA.condor using direct job submission
Caught signal 11: si_code=1, si_pid=4294967280, si_uid=0, si_addr=0xFFFFFFF0
Stack dump for process 3749879 at timestamp 1709649809 (14 frames)
/lib64/libcondor_utils_10_9_0.so(_Z18dprintf_dump_stackv+0x25)[0x7f0cb6c3d595]
/lib64/libcondor_utils_10_9_0.so(_Z17unix_sig_coredumpiP9siginfo_tPv+0x68)[0x7f0cb6e3f168]
/lib64/libpthread.so.0(+0xf630)[0x7f0cb5082630]
/lib64/libc.so.6(+0x16f8e5)[0x7f0cb4e148e5]
condor_scheduniv_exec.1729273.0(_ZN19SubmitStepFromQArgs12next_rowdataEv+0x161)[0x436df1]
condor_scheduniv_exec.1729273.0(_Z20direct_condor_submitRK6DagmanP3JobPKcRKSsS5_S5_R8CondorID+0xbfc)[0x434f0c]
condor_scheduniv_exec.1729273.0(_ZN3Dag13SubmitNodeJobERK6DagmanP3JobR8CondorID+0x1a9)[0x41fc19]
condor_scheduniv_exec.1729273.0(_ZN3Dag15SubmitReadyJobsERK6Dagman+0x230)[0x420500]
condor_scheduniv_exec.1729273.0(_Z18condor_event_timerv+0xf2)[0x428462]
/lib64/libcondor_utils_10_9_0.so(_ZN12TimerManager7TimeoutEPiPd+0x473)[0x7f0cb6e609a3]
/lib64/libcondor_utils_10_9_0.so(_ZN10DaemonCore6DriverEv+0x25f)[0x7f0cb6e2737f]
/lib64/libcondor_utils_10_9_0.so(_Z7dc_mainiPPc+0x18f4)[0x7f0cb6e4bb84]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f0cb4cc7555]
condor_scheduniv_exec.1729273.0[0x411874]
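
In case it helps anyone reading the trace, here is a minimal Python sketch that splits frame lines like the ones above into module, mangled symbol, offset and address. It is not part of HTCondor; the names `FRAME_RE` and `parse_frame` are made up for illustration, and it assumes each frame has the `module(symbol+0xOFF)[0xADDR]` shape seen in this dump.

```python
import re

# Hypothetical helper, not an HTCondor tool. Assumes frame lines shaped like
#   module(symbol+0xOFF)[0xADDR]   or   module[0xADDR]
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"              # binary or shared-library path
    r"(?:\((?P<symbol>[^+)]*)"           # optional mangled symbol (may be empty)
    r"\+?(?P<offset>0x[0-9a-fA-F]+)?\))?"  # optional offset into the symbol
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"     # absolute address
)

def parse_frame(line):
    """Return a dict describing one stack frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return {
        "module": m["module"],
        "symbol": m["symbol"] or None,   # empty symbol is reported as None
        "offset": m["offset"],
        "addr": m["addr"],
    }

if __name__ == "__main__":
    frame = parse_frame(
        "condor_scheduniv_exec.1729273.0"
        "(_ZN19SubmitStepFromQArgs12next_rowdataEv+0x161)[0x436df1]"
    )
    # Print the mangled name so it can be piped to c++filt.
    print(frame["symbol"])
```

Piping the printed symbol through `c++filt` gives the readable name; the frame just below the libc frame demangles to `SubmitStepFromQArgs::next_rowdata()`, called from `direct_condor_submit`.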