Re: [HTCondor-users] Remove job without condor_rm

Date: Fri, 03 Mar 2023 22:22:28 +0000
From: "Anderson, Stuart B." <sba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Remove job without condor_rm

> On Mar 3, 2023, at 1:44 PM, Cole Bollig via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
> 
> Hi Stuart,
> 
> I am glad that the removal of the single line stopped the infinite Schedd segmentation faults. It looks like the condor cron code doesn't know how to handle a STEP value that lacks a range (x-y) or an asterisk, e.g. [1/10]. Because of this invalid STEP value, matchFields() appears to recurse until a segmentation fault occurs. Sorry you had to stumble across this.
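
[Illustration only, not HTCondor's actual parser: a minimal Python sketch of the kind of up-front check that would reject a STEP value lacking a range or asterisk before any matching is attempted. The function name step_has_valid_base is hypothetical, and the rule it encodes is simply the one described in the quoted text above.]

```python
def step_has_valid_base(element: str) -> bool:
    """Return True if a cron element has no step, or a step on a range or '*'.

    Hypothetical helper for illustration; it does not mirror HTCondor's code.
    """
    base, slash, step = element.partition("/")
    if not slash:
        return True  # no STEP present, nothing to check here
    if not step.isdigit() or int(step) == 0:
        return False  # the step must be a positive integer
    if base == "*":
        return True  # "*/10" is an accepted form
    low, dash, high = base.partition("-")
    # a bare number such as "1" has no range, so "1/10" is rejected
    return bool(dash) and low.isdigit() and high.isdigit() and int(low) <= int(high)
```

[With this check, "*/10" and "0-30/10" pass, while "1/10" is rejected instead of being handed to a recursive matcher.]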

Cole,
	Do you have enough information to open a ticket (and tag it LIGO) or should I do that?

Have you determined if this problem exists in some (or all) 10.x releases as well?

And do you understand why this segfault did not generate a core file, or include a stack trace in the automatic condor daemon segfault notification email (perhaps due to blowing out the Linux process stack with unbounded recursive function calls)?

Are you open to considering the following two RFE tickets?

* Add support for condor_q, condor_hold and condor_rm to work on an offline queue.

* Add a knob for condor_master to start daemons under gdb.

These are motivated by feeling lucky that I was able to manually attach gdb to a schedd instance before it crashed, by wondering whether I would need to replace /usr/sbin/condor_schedd with a script that starts the schedd under gdb to get information on the crash, and by the worry that I might have to drop the entire queue to get the AP running again.
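
[Illustration only: a rough Python sketch of what such a wrapper script might look like. It assumes the real binary has been moved to a hypothetical path, /usr/sbin/condor_schedd.real, and it uses only standard gdb options (-batch, -ex, --args). Whether condor_master tolerates a daemon started this way is untested here.]

```python
import os
import sys

# Hypothetical location of the renamed real binary; an assumption for this sketch.
REAL_SCHEDD = "/usr/sbin/condor_schedd.real"


def build_gdb_command(real_binary, daemon_args):
    """Build an argv list that runs the daemon under gdb in batch mode.

    gdb runs the program and, if it stops on a crash, prints a backtrace.
    """
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt",
            "--args", real_binary, *daemon_args]


def exec_under_gdb():
    """Replace this process with gdb running the real schedd (not called here)."""
    cmd = build_gdb_command(REAL_SCHEDD, sys.argv[1:])
    os.execvp(cmd[0], cmd)
```

[A wrapper installed in place of the daemon would call exec_under_gdb() as its entry point; the backtrace would then go to wherever the wrapper's standard output is directed.]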

Thanks.

--
Stuart Anderson
sba@xxxxxxxxxxx
