Re: [HTCondor-users] A lot of jobs in C state
- Date: Tue, 05 Sep 2017 17:04:26 -0500
 
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
 
- Subject: Re: [HTCondor-users] A lot of jobs in C state
 
On 9/4/2017 11:06 AM, Carles Acosta wrote:
Hello again,
It seems that the issue was related with JOB_IS_FINISHED_INTERVAL. It
was set at 10 seconds, but the jobs stayed for much longer as I've 
commented in my previous email. Removing JOB_IS_FINISHED_INTERVAL from 
the Schedd config, all seems to work correctly again. There are no more 
"actOnJobs: didn't do any work, aborting" messages in the Schedd for the 
last 7 hours.
I don't know if I misunderstand JOB_IS_FINISHED_INTERVAL macro. We're 
running HTCondor 8.6.5 and there were no issues related with 
JOB_IS_FINISHED_INTERVAL with previous versions, such as the 8.5.8, as 
far as we know.
Thank you very much.
Cheers,
Carles
Hi Carles,
Thanks for the follow-up post above.
You stated you had problems when your config had
   JOB_IS_FINISHED_INTERVAL=10
What was JOB_IS_FINISHED_COUNT set to be?
If you change JOB_IS_FINISHED_INTERVAL to be 10, and don't also set
JOB_IS_FINISHED_COUNT (which defaults to 1), the result is the schedd
will only allow one job to leave the queue every 10 seconds!!  I am
guessing this is the situation you encountered. Basically these two
config knobs should always be changed together - see the below
cut-n-paste from the manual.  Note the default for
JOB_IS_FINISHED_INTERVAL is 0, which is the same as not defining it,
i.e. the default configuration works.  I am curious where/how you ended
up with a setting of JOB_IS_FINISHED_INTERVAL=10 without a corresponding
JOB_IS_FINISHED_COUNT setting. I checked with OSG and the configuration
they ship for the HTCondor-CE does not change either of these knobs.
Hope this helps
Todd
From the Manual:
JOB_IS_FINISHED_COUNT
    An integer value representing the number of jobs that the 
condor_schedd will let permanently leave the job queue each time that it 
examines the jobs that are ready to do so. The default value is 1.
JOB_IS_FINISHED_INTERVAL
    The condor_schedd maintains a list of jobs that are ready to 
permanently leave the job queue, for example, when they have completed 
or been removed. This integer-valued macro specifies a delay in seconds 
between instances of taking jobs permanently out of the queue. The 
default value is 0, which tells the condor_schedd to not impose any delay.
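
To make the interaction between the two knobs concrete, here is a rough
back-of-the-envelope sketch in Python. It is not HTCondor code and does
not reflect how the condor_schedd is actually implemented; it only
models the behavior as the manual text above describes it. The function
name drain_time_seconds and the assumption that the first batch of jobs
leaves immediately, with one full interval between later batches, are
illustrative assumptions.

```python
import math


def drain_time_seconds(finished_jobs, interval, count=1):
    """Estimate how long a backlog of finished jobs takes to leave the queue.

    Hypothetical model, not HTCondor source: every `interval` seconds
    (JOB_IS_FINISHED_INTERVAL) at most `count` jobs (JOB_IS_FINISHED_COUNT,
    default 1) are removed. The first batch is assumed to leave at t = 0.
    """
    if finished_jobs < 0 or interval < 0 or count < 1:
        raise ValueError("finished_jobs and interval must be >= 0, count >= 1")
    if interval == 0:
        # Default setting: no delay is imposed between removals.
        return 0
    # Number of removal passes needed to empty the backlog.
    passes = math.ceil(finished_jobs / count)
    # One interval of waiting between consecutive passes.
    return max(passes - 1, 0) * interval


if __name__ == "__main__":
    backlog = 1000  # example backlog size, chosen only for illustration
    print(drain_time_seconds(backlog, interval=10))             # 9990
    print(drain_time_seconds(backlog, interval=10, count=100))  # 90
    print(drain_time_seconds(backlog, interval=0))              # 0
```

Under this model, a backlog of 1000 finished jobs with
JOB_IS_FINISHED_INTERVAL=10 and the default JOB_IS_FINISHED_COUNT of 1
needs close to three hours to drain, which would be consistent with
jobs sitting in the C state for a long time. Raising the count to 100
brings the same backlog down to about a minute and a half.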