Re: [HTCondor-users] centrally force removal after some time even if leave_in_queue is true?
- Date: Wed, 7 Nov 2018 23:18:20 +0000
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] centrally force removal after some time even if leave_in_queue is true?
On 11/7/2018 10:38 AM, Andrea Sartirana wrote:
> Hi Todd,
>
> your solution seems to work as the LeaveJobInQueue ClassAd attribute is
> changed [1] and correctly evaluates to false
> when some expiration time has passed [2]. But indeed, as Michael said,
> it does not really fix my problem
> since the jobs are not removed from the queue (in the sense that they
> still appear in condor_q output).
> Is this because something is not well configured on our schedd?
> If not I guess only a cron running "condor_rm -forcex ..." can fix the
> issue...
>
> (anyways, job-transform seems indeed very powerful)
>
> Regards,
> Andrea
>
Hi Andrea,
Ugh, what you observe above is indeed the current behavior. It is a bug;
thank you for reporting it.
Things work properly for jobs that *complete*, but it turns out there's
a bug when LeaveJobInQueue evaluates to True for *removed* jobs. The
removed jobs stay in the queue even when the expression later evaluates
to False.
The good news is we fixed this bug for the upcoming v8.8.0 release.
Details are in this ticket:
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6808
In the meantime, as you surmised, a potential immediate workaround
would be to run 'condor_rm -all -forcex' periodically. This causes all
jobs that are already in X state to be immediately removed from the
queue, ignoring any conditions that would normally keep them in the
queue (like LeaveJobInQueue). It leaves jobs in other states alone.
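
As a rough illustration of the periodic work-around, a small wrapper like the one below could be called from cron or any other scheduler. It is only a sketch: the function name purge_removed_jobs and the injectable runner argument are hypothetical, and it assumes condor_rm is on the PATH of an account that is allowed to remove other users' jobs. The only HTCondor command used is the one quoted in this message.

```python
import subprocess

# The command from the message above: purge every job already in the
# X (removed) state, ignoring LeaveJobInQueue.
FORCEX_COMMAND = ["condor_rm", "-all", "-forcex"]


def purge_removed_jobs(runner=subprocess.run):
    """Run the work-around once and return the command's exit status.

    `runner` is a hypothetical hook (defaulting to subprocess.run) so the
    function can be exercised without a live schedd.
    """
    result = runner(FORCEX_COMMAND, capture_output=True, text=True)
    return result.returncode
```

How often to run it is a site decision; nothing in this thread prescribes an interval. Once the schedd is running a release that contains the fix, the wrapper should no longer be needed.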
regards,
Todd