Mailing List Archives
Re: [HTCondor-users] A lot of jobs in C state
- Date: Tue, 05 Sep 2017 17:04:26 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] A lot of jobs in C state
On 9/4/2017 11:06 AM, Carles Acosta wrote:
Hello again,
It seems that the issue was related with JOB_IS_FINISHED_INTERVAL. It
was set at 10 seconds, but the jobs stayed for much longer as I've
commented in my previous email. Removing JOB_IS_FINISHED_INTERVAL from
the Schedd config, all seems to work correctly again. There are no more
"actOnJobs: didn't do any work, aborting" messages in the Schedd for the
last 7 hours.
I don't know if I misunderstand JOB_IS_FINISHED_INTERVAL macro. We're
running HTCondor 8.6.5 and there were no issues related with
JOB_IS_FINISHED_INTERVAL with previous versions, such as the 8.5.8, as
far as we know.
Thank you very much.
Cheers,
Carles
Hi Carles,
Thanks for the follow-up post above.
You stated you had problems when your config had
JOB_IS_FINISHED_INTERVAL=10
What was JOB_IS_FINISHED_COUNT set to be?
If you change JOB_IS_FINISHED_INTERVAL to be 10, and don't also set
JOB_IS_FINISHED_COUNT, the result is the schedd will only allow one job
to leave the queue every 10 seconds!! I am guessing this is the situation
you encountered. Basically these two config knobs should always be
changed together - see the below cut-n-paste from the manual. Note the
default for JOB_IS_FINISHED_INTERVAL is 0, which is the same as not
defining it.... i.e. the default configuration works. I am curious
where/how you ended up with a setting of JOB_IS_FINISHED_INTERVAL=10 without a
corresponding JOB_IS_FINISHED_COUNT setting. I checked with OSG and the
configuration they ship for the HTCondor-CE does not change either of
these knobs.
Hope this helps
Todd
From the Manual:
JOB_IS_FINISHED_COUNT
An integer value representing the number of jobs that the
condor_schedd will let permanently leave the job queue each time that it
examines the jobs that are ready to do so. The default value is 1.
JOB_IS_FINISHED_INTERVAL
The condor_schedd maintains a list of jobs that are ready to
permanently leave the job queue, for example, when they have completed
or been removed. This integer-valued macro specifies a delay in seconds
between instances of taking jobs permanently out of the queue. The
default value is 0, which tells the condor_schedd to not impose any delay.
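
To make the interaction of the two knobs concrete, here is a rough back-of-the-envelope sketch in Python. It is not HTCondor code: the function name drain_seconds is made up for illustration, and it assumes a simple model in which the schedd releases at most JOB_IS_FINISHED_COUNT jobs once per JOB_IS_FINISHED_INTERVAL seconds, with an interval of 0 meaning no delay at all. The real schedd timing may differ in detail.

```python
import math


def drain_seconds(finished_jobs: int, count: int = 1, interval: int = 0) -> int:
    """Estimate how long finished jobs wait before all have left the queue.

    Simplified model (an assumption, not the schedd's actual code):
    every `interval` seconds, up to `count` jobs leave the queue.
    `count` stands in for JOB_IS_FINISHED_COUNT (manual default 1) and
    `interval` for JOB_IS_FINISHED_INTERVAL (manual default 0).
    """
    if finished_jobs < 0 or count < 1 or interval < 0:
        raise ValueError("finished_jobs >= 0, count >= 1, interval >= 0 required")
    if interval == 0:
        # Default configuration: no delay is imposed.
        return 0
    # Number of release passes needed, each separated by `interval` seconds.
    batches = math.ceil(finished_jobs / count)
    return batches * interval


if __name__ == "__main__":
    # Interval changed to 10 while count is left at its default of 1.
    print(drain_seconds(1000, count=1, interval=10))
    # Both knobs changed together (the value 100 is only an example).
    print(drain_seconds(1000, count=100, interval=10))
    # Neither knob set: the defaults impose no delay.
    print(drain_seconds(1000))
```

Under this model, 1000 finished jobs with an interval of 10 and the default count of 1 take on the order of 10000 seconds (close to three hours) to leave the queue, which would match jobs lingering in the C state; raising the count alongside the interval shrinks that proportionally.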