[HTCondor-devel] Help me understand some core dumps?

Mailing List Archives	UW Madison Computer Sciences Department Computer Systems Lab

Date:	Mon, 27 Jul 2015 16:41:04 -0400
From:	Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx>
Subject:	[HTCondor-devel] Help me understand some core dumps?

Hi team,

We've seen schedd core dumps at a customer site (HTCondor 8.2.7 on
64-bit CentOS 6). They've been running much shorter jobs than we had
originally planned for, so I suspect part of the problem is that the
spool is on a persistent EBS volume rather than on the instance-local
ephemeral disk.

Unfortunately, I don't have the logs. I've asked the customer for
them, but they may have rotated away by now. I do, however, have two
core dumps from two different hosts that fail in the same place.

I can provide the core files off-list, but here's what I was able to
find with gdb. Does it look familiar to anyone?

[New Thread 8150]
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `condor_schedd -f -local-name Q1'.
#0  0x00007fb69272249c in ?? ()
(gdb) bt
#0  0x00007fb69272249c in ?? ()
#1  0x0000000000737472 in PrioRecArray ()
#2  0x0000000000000031 in ?? () at /slots/02/dir_42284/userdir/src/condor_utils/list.h:516
#3  0x00007fb69244b6d0 in ?? ()
#4  0x0000000002809cb0 in ?? ()
#5  0x0000000000000008 in ?? () at /slots/02/dir_42284/userdir/src/condor_utils/list.h:288
#6  0x0000000000000000 in ?? ()
(gdb) frame 1
#1  0x0000000000737472 in PrioRecArray ()
(gdb) frame 2
#2  0x0000000000000031 in ?? () at /slots/02/dir_42284/userdir/src/condor_utils/list.h:516
warning: Source file is more recent than executable.
516        item->next->prev = item->prev;
(gdb) fram 5
#5  0x0000000000000008 in ?? () at /slots/02/dir_42284/userdir/src/condor_utils/list.h:288
288    List<ObjType>::~List()
(gdb)
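For anyone who hasn't looked at condor_utils/list.h: the two frames gdb could map to source point at a doubly-linked list, with line 516 being the step that unlinks a node and line 288 the List destructor. Below is a minimal C++ sketch of that pattern, assuming a circular list with a sentinel node whose destructor unlinks every remaining item. SimpleList, Item, Append, RemoveItem, First, Size, and Contents are hypothetical names, not HTCondor's actual List<ObjType> API. What it illustrates: if a node's next pointer refers to freed or overwritten memory, the write through item->next is the first place that bad pointer is dereferenced. Read this with caution, since frames #2 and #5 report addresses (0x31, 0x8) that are not plausible return addresses, so the stack may be damaged and the source-line mapping unreliable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a circular doubly-linked list with a sentinel
// node. This is NOT HTCondor's List<ObjType>; all names are assumptions.
template <typename T>
class SimpleList {
public:
    struct Item {
        T obj;
        Item* next;
        Item* prev;
    };

    SimpleList() {
        // An empty list is a sentinel that points to itself.
        dummy_.next = &dummy_;
        dummy_.prev = &dummy_;
    }

    // The destructor unlinks and frees every remaining node, so a
    // corrupted next/prev pointer would be dereferenced from here.
    ~SimpleList() {
        while (dummy_.next != &dummy_) {
            RemoveItem(dummy_.next);
        }
    }

    SimpleList(const SimpleList&) = delete;
    SimpleList& operator=(const SimpleList&) = delete;

    // Insert a new node just before the sentinel, i.e. at the tail.
    void Append(const T& obj) {
        Item* item = new Item{obj, &dummy_, dummy_.prev};
        dummy_.prev->next = item;
        dummy_.prev = item;
        ++size_;
    }

    // Unlink step analogous to list.h:516. If item->next points at
    // freed or overwritten memory, the first assignment is where it is
    // first dereferenced; use-after-free is undefined behaviour, so a
    // fault there is likely but not guaranteed.
    void RemoveItem(Item* item) {
        assert(item != nullptr && item != &dummy_);
        item->next->prev = item->prev;
        item->prev->next = item->next;
        delete item;
        --size_;
    }

    // Returns the first real node, or nullptr if the list is empty.
    Item* First() { return dummy_.next == &dummy_ ? nullptr : dummy_.next; }

    std::size_t Size() const { return size_; }

    // Copies the stored objects in order, head to tail.
    std::vector<T> Contents() const {
        std::vector<T> out;
        for (const Item* p = dummy_.next; p != &dummy_; p = p->next) {
            out.push_back(p->obj);
        }
        return out;
    }

private:
    Item dummy_{};
    std::size_t size_ = 0;
};
```

With a sentinel, the unlink never special-cases the head or tail, so it touches item->next and item->prev unconditionally; the trade-off is that one corrupted pointer anywhere in the ring is dereferenced without any check on the neighbours.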

-- 
Ben Cotton
main: 888.292.5320

Cycle Computing
Better Answers. Faster.

http://www.cyclecomputing.com
twitter: @cyclecomputing
