Re: [HTCondor-users] jobs stay in queue forever with 'termination pending'



This usually indicates that the condor_shadow saw the job complete, but was unable to write the Terminate event into the job's user log. This often happens because the user deleted the directory that the log is supposed to be written into.
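One way to check whether this is the cause is to list the user log path of each affected job and see whether its directory still exists. The following is only a rough sketch, not an official tool. It assumes the jobs define the UserLog attribute with an absolute path, that the script runs on the schedd host as a user who can see the submit directories, and that the constraint from the original post selects the stuck jobs. The function names are made up for illustration.

```python
import os
import subprocess

# Constraint taken from the original post; adjust as needed.
CONSTRAINT = "TerminationPending == true"


def parse_autoformat(text):
    """Parse 'ClusterId ProcId UserLog' lines into (job_id, log_path) pairs.

    Jobs without a user log print 'undefined' and are skipped.
    """
    jobs = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the log path itself may contain spaces
        if len(parts) < 3 or parts[2].strip() == "undefined":
            continue
        jobs.append((f"{parts[0]}.{parts[1]}", parts[2].strip()))
    return jobs


def missing_log_dirs(jobs, isdir=os.path.isdir):
    """Return the jobs whose user log directory no longer exists."""
    return [(job_id, log) for job_id, log in jobs
            if not isdir(os.path.dirname(log))]


def query_schedd(constraint=CONSTRAINT):
    """Ask condor_q for the job id and user log path of matching jobs."""
    result = subprocess.run(
        ["condor_q", "-constraint", constraint,
         "-af", "ClusterId", "ProcId", "UserLog"],
        check=True, capture_output=True, text=True,
    )
    return parse_autoformat(result.stdout)
```

On the schedd host, `missing_log_dirs(query_schedd())` would return the stuck jobs whose log directory is gone. If that list is empty, the cause is probably something else, for example a log file that is not writable.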

periodic_remove will not remove these jobs, because it too would want to write into the log.  

You should be able to use condor_rm  and then condor_rm -forcex  to remove them. 
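With a few thousand jobs it may be easier to do this by constraint than by job id. The sketch below just builds the two command lines and, by default, only returns them as text without running anything. It assumes the TerminationPending constraint from the original post still matches the jobs after the first condor_rm, which I have not verified. The helper names are hypothetical.

```python
import shlex
import subprocess


def removal_commands(constraint):
    """Build the two-step removal: a plain condor_rm, then condor_rm -forcex."""
    return [
        ["condor_rm", "-constraint", constraint],
        ["condor_rm", "-forcex", "-constraint", constraint],
    ]


def remove_stuck_jobs(constraint, dry_run=True):
    """Return the command lines; actually run them only when dry_run is False."""
    shown = []
    for argv in removal_commands(constraint):
        shown.append(shlex.join(argv))
        if not dry_run:
            # No check=True: the second step may exit non-zero if the
            # first one already removed every matching job.
            subprocess.run(argv)
    return shown
```

Calling `remove_stuck_jobs("TerminationPending == true")` returns the two command lines for review. Passing `dry_run=False` executes them in order, so check the constraint with condor_q first.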

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Beyer, Christoph <christoph.beyer@xxxxxxx>
Sent: Friday, October 11, 2024 2:01 AM
To: htcondor-users <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] jobs stay in queue forever with 'termination pending'
 

Hi,

our schedds somehow get clogged up with jobs that end up in job state 4 but stay at 'TerminationPending == true':

[root@mysched21 ~]# condor_q -constraint 'TerminationPending == true'

-- Schedd: mysched21.desy.de : <123.456.789:23521?... @ 10/11/24 08:52:52
OWNER    BATCH_NAME     SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
myuser ID: 696613    7/23 15:52      _      _      _      1 696613.0
myuser ID: 696614    7/23 15:52      _      _      _      1 696614.0
<snip>

Total for query: 2874 jobs; 2874 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

As you can see these are aging quite well :(

What could prevent these jobs from finishing, and why does system-periodic-remove not at least clean them up?

Maybe it is related to cgroups v2 and some cleanup there that did not work as expected?

best
christoph

--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/