Re: [HTCondor-devel] HTCondor python bindings crashing


Date: Fri, 25 Oct 2019 14:45:12 +0200
From: Rokas Maciulaitis <rokas.maciulaitis@xxxxxxx>
Subject: Re: [HTCondor-devel] HTCondor python bindings crashing
Hi Brian,

I just managed to get the full backtrace. I pasted it in the following issue - https://github.com/reanahub/reana-job-controller/issues/190#issuecomment-546337518

Cheers,
Rokas

On Fri, 18 Oct 2019 at 16:45, Rokas Maciulaitis <rokas.maciulaitis@xxxxxxx> wrote:
Hi Brian,

The problem still seems to be present in 8.9.3. Could you advise how to get a full traceback?

I've checked the core dump file, but it doesn't seem informative.

$ lldb -c core.1571404453.flask.16
(lldb) target create --core "core.1571404453.flask.16"
Core file '/Users/rokas/project/reana/src/core.1571404453.flask.16' (x86_64) was loaded.
(lldb) thread info all
thread #1: tid = 88, 0x00007f3c69a015cb, name = 'flask', stop reason = signal SIGABRT
thread #2: tid = 16, 0x00007f3c699fd35b, stop reason = signal 0
thread #3: tid = 32, 0x00007f3c69792037, stop reason = signal 0



I've tried installing htcondor-dbg and setting D_FULLDEBUG, but neither helped:


import htcondor

htcondor.set_subsystem("TOOL")
htcondor.param['TOOL_DEBUG'] = 'D_FULLDEBUG'
htcondor.param['TOOL_LOG'] = 'log.txt'
htcondor.enable_log()
htcondor.enable_debug()
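A hedged aside on getting at least a Python-level traceback for a SIGABRT like the one above: the standard library's `faulthandler` module can dump the Python stack of every thread when the process receives a fatal signal, SIGABRT included. In the sketch below, the crash log path (`/tmp/python-crash.log`) and the helper name `enable_crash_tracebacks` are hypothetical choices. It only shows Python frames, so the C++ frames inside the bindings would still need a debugger backtrace with debug symbols.

```python
import faulthandler

# Hypothetical location for the crash dump; any writable path works.
CRASH_LOG_PATH = "/tmp/python-crash.log"


def enable_crash_tracebacks(path=CRASH_LOG_PATH):
    """Ask faulthandler to dump all threads' Python stacks on fatal signals.

    faulthandler covers SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL, so an
    abort() triggered by an uncaught C++ exception is included.
    """
    # The file must stay open for the life of the process: faulthandler
    # writes to its file descriptor at crash time, so return it to the
    # caller instead of closing it here.
    crash_file = open(path, "a")
    faulthandler.enable(file=crash_file, all_threads=True)
    return crash_file


# Call this as early as possible, e.g. before `import htcondor`:
# crash_file = enable_crash_tracebacks()
```

The same behaviour is available without code changes by starting the interpreter with `python -X faulthandler` or by setting `PYTHONFAULTHANDLER=1` in the environment (output then goes to stderr).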


Cheers,

Rokas


On Fri, 11 Oct 2019 at 16:57, Bockelman, Brian <BBockelman@xxxxxxxxxxxxx> wrote:
Hi Rokas,

There are a few known crash bugs related to garbage collection, a number of which were fixed in 8.9.3. Unfortunately, they appear to trigger more often on Python 3 than on Python 2; they are actively being worked on.

Is it possible to get a traceback? With that, we can advise whether:
a) This is fixed in the next release,
b) This is a known bug we are actively working on, or
c) This is a new bug.

If it's a known or new bug, we can increase its priority (in fact, my calendar informs me that I'm supposed to help Carl Edquist solve some of these issues this afternoon! So, it's a good day for this report...).
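Since the known crashes are tied to garbage collection, one hedged way to check whether a given crash correlates with Python's cyclic collector is to pause automatic collection around the suspect binding calls and then trigger a collection explicitly at a known point. This is a diagnostic experiment, not a fix proposed in this thread, and the helper name `automatic_gc_disabled` is hypothetical.

```python
import contextlib
import gc


@contextlib.contextmanager
def automatic_gc_disabled():
    """Pause CPython's automatic cyclic garbage collector inside a block.

    Reference counting still frees non-cyclic objects immediately; only the
    generational collector that breaks reference cycles is paused.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state even if the block raised.
        if was_enabled:
            gc.enable()


# Hypothetical usage around the binding calls under suspicion:
# with automatic_gc_disabled():
#     ...submit / query jobs...
# gc.collect()  # force a collection at a known point
```

If the abort then consistently happens at the explicit `gc.collect()` call rather than at random moments, that would point toward the garbage-collection issues rather than, say, the network errors in the log.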

Brian

On Oct 11, 2019, at 2:23 AM, Rokas Maciulaitis via HTCondor-devel <htcondor-devel@xxxxxxxxxxx> wrote:

Dear HTCondor support,

We are developing REANA (a reproducible research data analysis platform) at CERN, and we are currently working on integrating it with the CERN HTCondor cluster.

We have noticed random crashes with the error message [1]. We tried increasing the verbosity level [2] to get a more informative error message [3].

We are submitting and monitoring jobs from a REANA application component based on Debian 10 [4] with HTCondor 8.8.4 [5].
We have tried different versions of the HTCondor Python bindings: 8.9.0, 8.9.1 and 8.9.2.

Could you help us to understand what is causing this?

Best regards,
Rokas

[1] terminate called after throwing an instance of 'boost::python::error_already_set'
Aborted (core dumped)

[2] htcondor.set_subsystem("TOOL")
htcondor.param['TOOL_DEBUG'] = 'D_FULLDEBUG'
htcondor.param['TOOL_LOG'] = '/tmp/log'
htcondor.enable_log()
htcondor.enable_debug()

[3] 10/09/19 14:47:49 init_user: post mcreds->client is 'rmaciula@xxxxxxx'
10/09/19 14:47:49 init_user: post mcreds->server is 'host/bigbird14.cern.ch@xxxxxxx'
10/09/19 14:47:49 init_user: post creds_->client is 'rmaciula@xxxxxxx'
10/09/19 14:47:49 init_user: post creds_->server is 'host/bigbird14.cern.ch@xxxxxxx'
10/09/19 14:47:49 KERBEROS: creds_->client is 'rmaciula@xxxxxxx'
10/09/19 14:47:49 KERBEROS: creds_->server is 'host/bigbird14.cern.ch@xxxxxxx'
10/09/19 14:47:49 KERBEROS: input.enctype (18) and session.enctype (18)
10/09/19 14:47:49 condor_read(): Socket closed when trying to read 21 bytes from schedd at <137.138.44.75:9618>
10/09/19 14:47:49 IO: EOF reading packet header
10/09/19 14:47:49 SharedPortClient: sent connection request to schedd at <137.138.44.75:9618> for shared port id schedd_2873_37ae_40
terminate called after throwing an instance of 'boost::python::error_already_set'
Aborted (core dumped)


[5] root@9cc2253e86b9:/code# condor_version
$CondorVersion: 8.8.4 Jul 19 2019 BuildID: Debian-8.8.4-1 PackageID: 8.8.4-1 Debian-8.8.4-1 $
$CondorPlatform: X86_64-Debian_10 $
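The report combines several binding versions (8.9.0 to 8.9.3) with an 8.8.4 installation, so it may help to record both versions alongside each crash. A minimal sketch, assuming the version strings follow the `$CondorVersion: X.Y.Z ...` format shown above for `condor_version` (presumably also what `htcondor.version()` returns in the bindings); the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the X.Y.Z triple right after "$CondorVersion:", e.g. in
# "$CondorVersion: 8.8.4 Jul 19 2019 BuildID: Debian-8.8.4-1 ... $"
_VERSION_RE = re.compile(r"\$CondorVersion:\s+(\d+)\.(\d+)\.(\d+)")


def parse_condor_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch), or None if no version string is found."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def same_release_series(a: Tuple[int, int, int], b: Tuple[int, int, int]) -> bool:
    """True if both versions share major.minor (e.g. 8.8.x and 8.8.y)."""
    return a[:2] == b[:2]


daemon_version = parse_condor_version(
    "$CondorVersion: 8.8.4 Jul 19 2019 BuildID: Debian-8.8.4-1 "
    "PackageID: 8.8.4-1 Debian-8.8.4-1 $"
)
print(daemon_version)  # (8, 8, 4)
```

Logging the parsed pair only documents the combination in use; it does not by itself establish that a version difference is related to the crash.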
_______________________________________________
HTCondor-devel mailing list
HTCondor-devel@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel
