Re: [Condor-users] PASSWD_CACHE_REFRESH in 6.9.4?
- Date: Tue, 23 Oct 2007 13:36:04 -0700
- From: Stuart Anderson <anderson@xxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] PASSWD_CACHE_REFRESH in 6.9.4?
Dan,
Thanks for the explanation. What is the current default and
recommended value for SCHED_UNIV_RENICE_INCREMENT in the 6.9 series?
What about adding a LOCAL_UNIV_RENICE_INCREMENT option? For example,
I would like DAGMan in the scheduler universe to have a higher priority
than short-running user jobs in the local universe.
Thanks.
On Tue, Oct 23, 2007 at 09:04:29AM -0500, Dan Bradley wrote:
>
>
> Ian Chesal wrote:
> >> This seems counterintuitive to me. Why would _not_ nice'ing the shadow
> >> processes on a busy submit machine be a good thing?
> >>
> >
> > Ditto. Is this a Windows-scheduler-only thing? I'm almost certain Alan
> > De Smet's talk every year at Condor Week covers using higher nice
> > levels on the shadows to help out a starved-for-CPU schedd process.
> >
>
> If you want to increase the priority of the schedd, that is possibly a
> good idea. However, using SHADOW_RENICE_INCREMENT=10 to decrease the
> priority of the shadows below all other normal processes on the system
> degrades throughput in every case we have observed or tested in the 6.9
> branch. Part of the problem is that the schedd and the shadow need to
> communicate. During this communication, it is actually possible for the
> schedd to be slowed down because it is stuck waiting for a response from
> a low-priority shadow. More commonly, we see connection failures in the
> shadow logs because the shadow is so CPU-starved that it cannot form a
> connection to the schedd, even with very generous timeouts.
>
> Another thing that has changed is that the 6.9.4 schedd is much less
> CPU-hungry than 6.8. Having tens of thousands of jobs in the queue and a
> few thousand jobs running should not severely tax the 6.9.4 schedd on
> reasonable server-class hardware unless the jobs are so fast that the
> completion rate is greater than ~10-15 jobs per second.
>
> I'll admit that our tests of this have all been under Linux and have
> been focussed on the vanilla universe. We're certainly hoping for
> feedback on all the other possible use cases.
>
> Cheers,
> --Dan
>
--
Stuart Anderson anderson@xxxxxxxxxxxxxxxx
http://www.ligo.caltech.edu/~anderson
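
For readers following the thread, the settings under discussion could be sketched as a Condor configuration fragment like the one below. This is only an illustration, not a recommendation from either poster: the numeric values are placeholders chosen for the example, the thread does not state the defaults, and LOCAL_UNIV_RENICE_INCREMENT is the option proposed above, shown commented out because nothing in the thread indicates it exists in the 6.9 series.

```
# Leave the shadows un-niced; per Dan's message, SHADOW_RENICE_INCREMENT=10
# degraded throughput in the 6.9 tests (Linux, vanilla universe).
SHADOW_RENICE_INCREMENT = 0

# Nice increment for scheduler universe jobs such as DAGMan.
# Placeholder value; the default and recommended value are the open
# question in this thread.
SCHED_UNIV_RENICE_INCREMENT = 0

# Hypothetical option proposed in this thread, not an existing setting.
# It would let local universe jobs run at a lower priority than DAGMan.
# LOCAL_UNIV_RENICE_INCREMENT = 10
```

A larger increment means a lower scheduling priority, so giving the local universe a bigger increment than the scheduler universe would express the distinction asked for above.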