[HTCondor-devel] Help me understand some core dumps?


Date: Mon, 27 Jul 2015 16:41:04 -0400
From: Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx>
Subject: [HTCondor-devel] Help me understand some core dumps?

Hi team,

We've seen schedd core dumps at a customer site (HTCondor 8.2.7 on
64-bit CentOS 6). They've been running much shorter jobs than we had
originally planned for, so my suspicion is that part of the problem is
that the spool is on a persistent EBS volume instead of the
instance-local ephemeral disk.

Unfortunately, I don't have the logs. I've asked the customer for
them, but they may have rotated away by now. I do have two separate
core dumps from two separate hosts that fail in the same place.

I can provide the core files off-list, but here's what I was able to
find with gdb. Does it look familiar to anyone?

[New Thread 8150]
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols
from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `condor_schedd -f -local-name Q1'.
#0  0x00007fb69272249c in ?? ()
(gdb) bt
#0  0x00007fb69272249c in ?? ()
#1  0x0000000000737472 in PrioRecArray ()
#2  0x0000000000000031 in ?? () at
/slots/02/dir_42284/userdir/src/condor_utils/list.h:516
#3  0x00007fb69244b6d0 in ?? ()
#4  0x0000000002809cb0 in ?? ()
#5  0x0000000000000008 in ?? () at
/slots/02/dir_42284/userdir/src/condor_utils/list.h:288
#6  0x0000000000000000 in ?? ()
(gdb) frame 1
#1  0x0000000000737472 in PrioRecArray ()
(gdb) frame 2
#2  0x0000000000000031 in ?? () at
/slots/02/dir_42284/userdir/src/condor_utils/list.h:516
warning: Source file is more recent than executable.
516        item->next->prev = item->prev;
(gdb) fram 5
#5  0x0000000000000008 in ?? () at
/slots/02/dir_42284/userdir/src/condor_utils/list.h:288
288    List<ObjType>::~List()
(gdb)
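
For readers without the source handy: the statement gdb prints for
list.h:516 is the usual unlink step of a doubly linked list. Below is a
minimal sketch of that pattern, assuming a circular list with a dummy
head node. Node and SketchList are hypothetical names for illustration
and are not HTCondor's actual classes. The unlink step dereferences
both neighbours of the item, so it can only be as healthy as those
pointers are.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type for illustration; not HTCondor's Item class.
template <typename T>
struct Node {
    T value{};
    Node* next = nullptr;
    Node* prev = nullptr;
};

// Hypothetical circular doubly linked list with a dummy head node.
template <typename T>
class SketchList {
public:
    SketchList() {
        // An empty list is a head node that points at itself.
        head_.next = &head_;
        head_.prev = &head_;
    }

    // Like the destructor named in frame 5, this walks the list and
    // unlinks every remaining item.
    ~SketchList() {
        while (head_.next != &head_) {
            remove(head_.next);
        }
    }

    SketchList(const SketchList&) = delete;
    SketchList& operator=(const SketchList&) = delete;

    // Append a copy of v at the tail and return the new node.
    Node<T>* append(const T& v) {
        Node<T>* n = new Node<T>;
        n->value = v;
        n->next = &head_;
        n->prev = head_.prev;
        head_.prev->next = n;
        head_.prev = n;
        ++size_;
        return n;
    }

    // Unlink and free one node. Both neighbours are dereferenced here,
    // so a stale or overwritten next/prev pointer faults on these lines.
    void remove(Node<T>* item) {
        item->prev->next = item->next;
        item->next->prev = item->prev;  // same shape as list.h:516
        delete item;
        --size_;
    }

    std::size_t size() const { return size_; }
    bool empty() const { return head_.next == &head_; }

private:
    Node<T> head_;
    std::size_t size_ = 0;
};
```

In this sketch, calling remove() twice on the same node, or destroying
the list after one of its nodes has been freed elsewhere, would touch
freed memory on the marked line. Whether anything like that happens in
the schedd is not something the backtrace above establishes.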

-- 
Ben Cotton
main: 888.292.5320

Cycle Computing
Better Answers. Faster.

http://www.cyclecomputing.com
twitter: @cyclecomputing