Dear HTCondor support,
We are developing REANA (Reproducible research data analysis platform) at CERN and are currently working on integrating it with the CERN HTCondor cluster.
We have noticed random crashes with the error message [1]. We tried increasing the log verbosity [2] to get a more informative
error message [3].
We are submitting and monitoring jobs from a REANA application component based on Debian 10 [4] with HTCondor 8.8.4 [5].
We have tried different versions of the HTCondor Python bindings: 8.9.0, 8.9.1 and 8.9.2.
Could you help us to understand what is causing this?
Best regards,
Rokas
[1] terminate called after throwing an instance of 'boost::python::error_already_set'
Aborted (core dumped)
[2] htcondor.set_subsystem("TOOL")
htcondor.param['TOOL_DEBUG'] = 'D_FULLDEBUG'
htcondor.param['TOOL_LOG'] = '/tmp/log'
htcondor.enable_log()
htcondor.enable_debug()
[3] 10/09/19 14:47:49 KERBEROS: input.enctype (18) and session.enctype (18)
10/09/19 14:47:49 condor_read(): Socket closed when trying to read 21 bytes from schedd at <137.138.44.75:9618>
10/09/19 14:47:49 IO: EOF reading packet header
10/09/19 14:47:49 SharedPortClient: sent connection request to schedd at <137.138.44.75:9618> for shared port id schedd_2873_37ae_40
terminate called after throwing an instance of 'boost::python::error_already_set'
Aborted (core dumped)
[5] root@9cc2253e86b9:/code# condor_version
$CondorVersion: 8.8.4 Jul 19 2019 BuildID: Debian-8.8.4-1 PackageID: 8.8.4-1 Debian-8.8.4-1 $
$CondorPlatform: X86_64-Debian_10 $