This usually indicates that the condor_shadow saw the job complete, but was unable to write the Terminate event into the userlog of the job. Often this happens because the user deleted the directory that the log is supposed to be written into.
periodic_remove will not remove these jobs, because it too would want to write into the log.
You should be able to use condor_rm and then condor_rm -forcex to remove them.
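
A minimal sketch of that two-step removal, assuming the stuck jobs can be selected with the same constraint used in the condor_q query quoted below. The DRY_RUN switch and the remove_stuck_jobs function are hypothetical helpers for this example, not part of HTCondor; by default the script only prints the commands it would run, and the printed form drops the shell quoting around the constraint.

```shell
#!/bin/sh
# Constraint copied from the condor_q query in the original post.
CONSTRAINT='TerminationPending == true'

# run: print the command when DRY_RUN is 1 (the default), otherwise execute it.
# DRY_RUN is a hypothetical safety switch for this sketch only.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

remove_stuck_jobs() {
    # Step 1: mark the matching jobs for removal.
    run condor_rm -constraint "$CONSTRAINT"
    # Step 2: force local removal of jobs that are already being removed.
    run condor_rm -forcex -constraint "$CONSTRAINT"
}

remove_stuck_jobs
```

Check the matching jobs with condor_q first, then rerun with DRY_RUN=0 on the schedd host to actually remove them.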
-tj
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Beyer, Christoph <christoph.beyer@xxxxxxx>
Sent: Friday, October 11, 2024 2:01 AM
To: htcondor-users <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] jobs stay in queue forever with 'termination pending'

Hi,

our scheds get somehow clogged up with jobs that end up in job state 4 but stay in 'termination pending == true':

[root@mysched21 ~]# condor_q -constraint 'TerminationPending == true'

-- Schedd: mysched21.desy.de : <123.456.789:23521?... @ 10/11/24 08:52:52
OWNER   BATCH_NAME   SUBMITTED    DONE   RUN   IDLE   TOTAL  JOB_IDS
myuser  ID: 696613   7/23 15:52      _     _      _       1  696613.0
myuser  ID: 696614   7/23 15:52      _     _      _       1  696614.0
<snip>

Total for query: 2874 jobs; 2874 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

As you can see these are aging quite well :(

What could cause these jobs to be unable to finish, and why does system-periodic-remove not at least remove them?

Maybe it is related to cgroupsV2 and some cleanup there did not work as expected?

best
christoph

--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx