
Re: [HTCondor-users] Remove job without condor_rm



Hi Stuart,

I am glad that removing that single line stopped the repeated Schedd segmentation faults. It looks like the HTCondor cron code doesn't know how to handle a STEP value that is not attached to a range (x-y) or an asterisk, e.g. "1/10". Because of this invalid STEP value, matchFields() appears to recurse until a segmentation fault occurs. Sorry you had to stumble across this.
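
For anyone who wants to check cron fields before they reach the queue, here is a minimal sketch of that rule in Python. It is not HTCondor's parser; the function name and the regular expression are hypothetical, and it only encodes the assumption described above, namely that a STEP is accepted only after a range (x-y) or an asterisk.

```python
import re

# One comma-separated element: "*", "N", or "N-M", optionally followed by "/STEP".
# This grammar is an assumption for illustration, not HTCondor's actual parser.
_ELEMENT = re.compile(r"^(\*|\d+(?:-\d+)?)(?:/(\d+))?$")


def invalid_step_elements(field):
    """Return the elements of a cron field that are malformed or whose
    STEP is not attached to a range (x-y) or an asterisk."""
    bad = []
    for element in field.split(","):
        element = element.strip()
        match = _ELEMENT.match(element)
        if match is None:
            # Not recognisable as "*", "N", "N-M" or one of those with "/STEP".
            bad.append(element)
            continue
        base, step = match.group(1), match.group(2)
        # A step on a single number (e.g. "1/10") is the case from this thread.
        if step is not None and base != "*" and "-" not in base:
            bad.append(element)
    return bad
```

With this sketch, invalid_step_elements("1/10") flags the value from this thread, while "*/10" and "0-59/10" pass.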

-Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Anderson, Stuart B. <sba@xxxxxxxxxxx>
Sent: Thursday, March 2, 2023 7:55 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Remove job without condor_rm
 

> On Mar 2, 2023, at 5:38 PM, Anderson, Stuart B. <sba@xxxxxxxxxxx> wrote:
>
> Does anyone know how to remove a job from a schedd queue while condor_schedd is not running?
>
> I have tracked down a crondor job that is segfaulting condor_schedd at startup with a 64k deep stack trace of calls to CronTab::matchFields, so condor_rm is not an option. I have a pretty good guess of the offending jobid, but I am not sure if I should just manually edit job_queue.log to remove any line where column 2 contains the suspect job id before restarting condor, or if there is additional state to manually update?

I decided not to try manually removing an entire job without confirmation from an expert, but removing the following line from job_queue.log has allowed condor_schedd version 9.0.7 to run again on an EL8 system:

103 0115248510.-1 CronMinute "1/10"
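
In case it helps someone locate similar entries, here is a rough read-only sketch in Python that reports lines shaped like the one above. It assumes, based only on that single line, that such entries have four whitespace-separated fields (the code 103, a job key, an attribute name, and a quoted value); the function name is hypothetical, and it does not modify the file.

```python
import re

# A bare "N/STEP" element, i.e. a step with no range (x-y) or asterisk.
_BARE_STEP = re.compile(r"^\d+/\d+$")


def find_bare_step_cron_lines(lines):
    """Report (line number, job key, attribute, value) for Cron* entries
    whose value contains a bare step such as 1/10.

    Assumes the layout seen in this thread: 103 <key> <attr> "<value>".
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split(None, 3)
        if len(parts) != 4:
            continue  # not shaped like the sample entry
        opcode, job_key, attr, raw_value = parts
        if opcode != "103" or not attr.startswith("Cron"):
            continue
        value = raw_value.strip().strip('"')
        if any(_BARE_STEP.match(e.strip()) for e in value.split(",")):
            hits.append((lineno, job_key, attr, value))
    return hits


# Usage (with condor_schedd stopped); the path is site-specific:
#   with open("job_queue.log") as fh:
#       for hit in find_bare_step_cron_lines(fh):
#           print(hit)
```

The sketch only lists candidate lines; whether removing them is safe is still a question for an expert, as noted above.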

Thanks.

--
Stuart Anderson
sba@xxxxxxxxxxx



