Re: [Condor-users] Very slow response from condor_q and condor_status
- Date: Tue, 6 Dec 2005 10:37:14 -0800
- From: "Little, Colin E" <ColinLittle@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Very slow response from condor_q and condor_status
Thanks for the suggestion; we're going to give it a shot and see what
happens. Our plan is to move all three directories (log, spool, execute)
for each host from the NFS-mounted /home/condor/hosts directory to local
directories. Is there any reason why this could be a bad idea? Having
everything in /home/condor is nice and organized, but it seems like we
should be okay moving stuff around.
Thanks again!
-Colin
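
Before and after a move like this, it can help to confirm which filesystem each directory really lives on. Below is a minimal Python sketch, assuming a Linux host where /proc/mounts is readable. The function names and the set of filesystem types treated as "network" are illustrative assumptions, not anything Condor provides.

```python
import os
import sys

# Assumption: only these filesystem types count as "network" for this check.
NETWORK_FS_TYPES = {"nfs", "nfs4"}


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields are: device, mount point, fs type, options, ...
            # (Mount points containing spaces are escaped in /proc/mounts;
            # this sketch does not decode them.)
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, entries):
    """Return the fs type of the longest mount point containing path."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in entries:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win,
            # since later mounts shadow earlier ones.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_network_fs(path, entries):
    """True if path sits on one of the NETWORK_FS_TYPES filesystems."""
    return fs_type_for(path, entries) in NETWORK_FS_TYPES


if __name__ == "__main__" and len(sys.argv) > 1:
    with open("/proc/mounts") as f:
        mounts = parse_mounts(f.read())
    for directory in sys.argv[1:]:
        kind = "NETWORK" if is_network_fs(directory, mounts) else "local"
        print(f"{directory}: {fs_type_for(directory, mounts)} ({kind})")
```

Saved under any name you like, it would be run with the log, spool and execute directories of a host as arguments. Any directory still reported as NETWORK has not actually moved off NFS.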
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: Saturday, December 03, 2005 12:38 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Very slow response from condor_q and condor_status
On Sat, Dec 03, 2005 at 08:32:07PM -0000, Chris Miles wrote:
> I have the same issue and have put it down to the mounted home directory,
> as when I occasionally get the slow response from condor_status, if I try
> to copy a file from /home/condor to the local drive it's terribly
> unresponsive and slow also. Never bothered me enough to find a solution
> though.
>
Problems with NFS file locking, maybe?
Try putting your log files on a local disk - I could believe that
the problem is Condor writing to a logfile that's on NFS, and either
the lock takes a long time to acquire, or the write itself takes a long
time to finish. Both of those would cause a daemon to just freeze up.
-Erik
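
One way to test the lock-versus-write theory is to time both steps separately in a directory on NFS and again in one on local disk. The Python sketch below does that with a POSIX advisory lock. It assumes fcntl-style locking is a fair stand-in for what the daemons do, which the thread does not confirm, and the probe file name is hypothetical.

```python
import fcntl
import os
import time


def time_locked_append(directory, payload=b"probe\n"):
    """Append payload under an exclusive lock; return (lock_s, write_s)."""
    # Hypothetical probe file; it is created if missing and left in place.
    path = os.path.join(directory, "nfs_probe.log")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        t0 = time.monotonic()
        fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks until the lock is granted
        t1 = time.monotonic()
        os.write(fd, payload)
        os.fsync(fd)                     # force the data out to the server/disk
        t2 = time.monotonic()
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
    return t1 - t0, t2 - t1


def worst_case(directory, rounds=20):
    """Run the probe several times; return the slowest lock and write times."""
    samples = [time_locked_append(directory) for _ in range(rounds)]
    return max(s[0] for s in samples), max(s[1] for s in samples)
```

Calling worst_case() on a directory under /home/condor and on a local directory, ideally while condor_q is hanging, should show whether the lock or the write is the slow part. Times of seconds rather than milliseconds on the NFS side would support the theory.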
> Chris
> ----- Original Message -----
> From: Little, Colin E
> To: condor-users@xxxxxxxxxxx
> Sent: Friday, December 02, 2005 6:20 PM
> Subject: [Condor-users] Very slow response from condor_q and condor_status
>
> I'm setting up a Condor pool which is currently just barely up and
> running. We have 1 Central Master server, 1 submit-only machine and 2
> execute/submit machines. All are running Red Hat Enterprise Linux.
> Occasionally we'll find that condor_q and condor_status will hang for
> long periods of time (1-2 minutes or more) before responding. I haven't
> been able to reliably reproduce it, so I'm hoping that others may have
> seen something similar.
>
> Things we've thought of:
>
> It seems that it hangs at times when a job is sitting in the "2 Servers
> match, match, but reject the job for unknown reasons" stage, which I
> believe is waiting for the Negotiator. We've lowered the
> NEGOTIATOR_INTERVAL to 30, which seemed to help a bit, but might have
> just been wishful thinking.
>
> We raised the number of VMs per execute/submit machine from 1 to 2,
> which seems to have improved it, but that could easily be coincidence.
>
> Condor's main directory is in /home/condor, which is an NFS-mounted
> partition. We suspect that NFS might be hiccupping, preventing the
> collector from retrieving the status info.
>
> At this point we don't know if the problem is with the machines, the
> network, Condor or any of a hundred other things, but any insight into
> this problem would be very helpful.
>
> Thanks a lot.
>
> -Colin Little
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users