
Re: [HTCondor-users] Huge pile of jobs in "C" state



On Sat, Jan 17, 2015 at 12:41:52PM -0600, David Champion wrote:
> * On 09 Jan 2015, Steffen Grunewald wrote: 
> > On my pool - which is working flawlessly otherwise - I can see
> > a huge (>10000) number of jobs in C state.
> > From what I can observe, those jobs had a rather short runtime -
> > there are only 1000+ slots available, and the number is growing
> > by hundreds every few minutes.
> > 
> > Apparently, some part of the job aftermath takes an unexpectedly
> > long time - but which? The number of shadows is rather small, and
> > the fileserver is behaving nicely (as iostat and ethstatus show).
> > TCP updates are enabled.
> 
> Do these jobs leave queue after some time has elapsed -- they are just
> slow -- or do they remain indefinitely?

They leave (or at least used to leave) the queue, but at a slower rate
than new jobs were entering C state.

> I understood from a side conversation that you're using NFS, is that
> right?  I could be completely off target here, and I'm pretty new to
> condor, but a few questions come to mind wrt how the condor_shadow is
> delivering results:

I didn't see any additional shadow processes - at least not as many as 
the number of jobs in C would suggest. So I presume it's not a shadow issue.
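
The comparison above (jobs in C versus running shadows) can be sketched
roughly as follows. This is only an illustration, not something taken from
the pool in question: the helper names count_states, count_shadows and
snapshot are made up here, and it assumes that `condor_q -af JobStatus`
prints one numeric status per job and that `ps -e -o comm=` is available
on the submit host. JobStatus 4 is the completed ("C") state.

```python
import subprocess
from collections import Counter

# HTCondor JobStatus codes mapped to the letters condor_q shows; 4 is "C".
JOB_STATUS = {1: "I", 2: "R", 3: "X", 4: "C", 5: "H", 6: ">", 7: "S"}


def count_states(autoformat_output):
    """Tally jobs per state from text holding one JobStatus code per line."""
    counts = Counter()
    for line in autoformat_output.splitlines():
        line = line.strip()
        if line.isdigit():  # skip blank or unexpected lines
            counts[JOB_STATUS.get(int(line), "?")] += 1
    return counts


def count_shadows(ps_output):
    """Count condor_shadow entries in text holding one process name per line."""
    return sum(1 for name in ps_output.splitlines()
               if name.strip() == "condor_shadow")


def snapshot():
    """Hypothetical helper: query the local schedd and process table once."""
    q = subprocess.run(["condor_q", "-af", "JobStatus"],
                       capture_output=True, text=True, check=True).stdout
    ps = subprocess.run(["ps", "-e", "-o", "comm="],
                        capture_output=True, text=True, check=True).stdout
    return count_states(q)["C"], count_shadows(ps)
```

Calling snapshot() every few minutes and noting the two numbers would show
whether the C backlog grows while the shadow count stays small.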

Thanks, Steffen

-- 
Steffen Grunewald * Cluster Admin * steffen.grunewald(*)aei.mpg.de
MPI f. Gravitationsphysik (AEI) * Am Mühlenberg 1, D-14476 Potsdam
http://www.aei.mpg.de/ * ------- * +49-331-567-{fon:7274,fax:7298}