Re: [HTCondor-users] Huge pile of jobs in "C" state
- Date: Mon, 19 Jan 2015 09:16:20 +0100
- From: Steffen Grunewald <Steffen.Grunewald@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] Huge pile of jobs in "C" state
On Sat, Jan 17, 2015 at 12:41:52PM -0600, David Champion wrote:
> * On 09 Jan 2015, Steffen Grunewald wrote:
> > On my pool - which is working flawlessly otherwise - I can see
> > a huge (>10000) number of jobs in C state.
> > From what I can observe, those jobs had a rather short runtime -
> > there are only 1000+ slots available, and the number is growing
> > by hundreds every few minutes.
> >
> > Apparently, some part of the job aftermath takes an unexpectedly
> > long time - but which? The number of shadows is rather small, and
> > the fileserver is behaving nicely (as iostat and ethstatus show).
> > TCP updates are enabled.
>
> Do these jobs leave queue after some time has elapsed -- they are just
> slow -- or do they remain indefinitely?
They do (or did) leave the queue, but at a slower rate than new jobs
were entering C state.
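
To put a number on "slower rate", one could sample the count of completed jobs twice and compare. The following is only a rough sketch, not something I ran on the pool: it assumes `condor_q -af JobStatus` prints one status code per line and relies on JobStatus 4 meaning Completed. The helper names are made up for illustration, and depending on the HTCondor version condor_q may list only your own jobs by default.

```python
import subprocess
from collections import Counter

COMPLETED = "4"  # HTCondor JobStatus code for Completed (shown as "C" by condor_q)


def count_statuses(autoformat_output: str) -> Counter:
    """Count JobStatus codes given one code per line (hypothetical helper)."""
    return Counter(
        line.strip() for line in autoformat_output.splitlines() if line.strip()
    )


def net_growth_per_minute(count_before: int, count_after: int, seconds: float) -> float:
    """Net change in the number of C-state jobs per minute between two samples."""
    return (count_after - count_before) * 60.0 / seconds


def sample_completed() -> int:
    """Ask the local schedd for all job statuses and count the completed ones.

    Assumes condor_q is on PATH; not exercised in the usage example below.
    """
    out = subprocess.run(
        ["condor_q", "-af", "JobStatus"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_statuses(out)[COMPLETED]
```

A positive result from `net_growth_per_minute` means jobs enter C state faster than they leave the queue; for example, going from 10000 to 10300 completed jobs in 180 seconds is a net growth of 100 jobs per minute.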
> I understood from a side conversation that you're using NFS, is that
> right? I could be completely off target here, and I'm pretty new to
> condor, but a few questions come to mind wrt how the condor_shadow is
> delivering results:
I didn't see any additional shadow processes - at least not as many as
the number of jobs in C would suggest. So I presume it's not a shadow issue.
Thanks, Steffen
--
Steffen Grunewald * Cluster Admin * steffen.grunewald(*)aei.mpg.de
MPI f. Gravitationsphysik (AEI) * Am Mühlenberg 1, D-14476 Potsdam
http://www.aei.mpg.de/ * ------- * +49-331-567-{fon:7274,fax:7298}