Re: [Condor-users] LoadAvg calculation/bug?
- Date: Fri, 11 Jun 2010 09:17:40 -0500
- From: David Kotz <dkotz@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] LoadAvg calculation/bug?
I'm not seeing the same results on my systems. I found a four-core node
with three slots claimed and busy and some sort of non-Condor load on
it:
carrion $ condor_status -l nauro-10 | grep -i loadavg | grep -vi start | grep -vi busy
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 0.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
This machine is running 32-bit Ubuntu with the RHEL5 Condor 7.4.2
tarball.
- dave
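
The arithmetic behind that output can be checked with a short Python sketch. It assumes, going by the attribute names, that TotalLoadAvg is the whole-machine load and TotalCondorLoadAvg is the part attributed to Condor jobs, so their difference approximates the non-Condor load. The parse_attrs helper and the SAMPLE string are illustrative, not part of Condor; SAMPLE simply repeats the four-slot output quoted above.

```python
# Sketch: estimate non-Condor load from `condor_status -l` attribute lines.
# parse_attrs is a hypothetical helper, not a Condor API.

def parse_attrs(text):
    """Collect every numeric 'Name = value' line into {name: [values]}."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not an attribute line
        try:
            attrs.setdefault(name.strip(), []).append(float(value))
        except ValueError:
            continue  # non-numeric attribute, ignore
    return attrs

# The four slots of nauro-10, copied from the output above.
SAMPLE = """\
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 0.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
"""

attrs = parse_attrs(SAMPLE)

# Totals are repeated in every slot ad, so take the first copy.
total = attrs["TotalLoadAvg"][0]
condor = attrs["TotalCondorLoadAvg"][0]

# Assumption: whatever is not attributed to Condor is "other" load.
non_condor = round(total - condor, 2)

slot_load_sum = sum(attrs["LoadAvg"])           # per-slot LoadAvg summed
slot_condor_sum = sum(attrs["CondorLoadAvg"])   # per-slot CondorLoadAvg summed

print("non-Condor load:", non_condor)
print("sum of slot LoadAvg:", slot_load_sum)
print("sum of slot CondorLoadAvg:", slot_condor_sum)
```

On this node the per-slot LoadAvg values add up to TotalLoadAvg, and the difference between the two totals is roughly one core's worth of load that Condor did not start.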
On Thu, 2010-06-10 at 16:05 -0700, kristian kvilekval wrote:
> Thanks for the pointer, but TotalLoadAvg still seems to be based on
> Condor jobs only. What we are looking to do is avoid starting jobs on
> nodes that have any recent load or jobs on them. As not all people will
> use Condor to start their jobs, we need to use the actual kernel load
> average. Note below that Condor is reporting no load for b0003, but
> really there is a load average of about 2.
>
>
> $ condor_status -l b0003| fgrep TotalLoad
> TotalLoadAvg = 0.0
> TotalLoadAvg = 0.0
> TotalLoadAvg = 0.0
> TotalLoadAvg = 0.0
> TotalLoadAvg = 0.0
> (bqenv)bqphytomorph@claw$ ssh b0003
> bqphytomorph@b0003$ w
> 23:02:40 up 78 days, 43 min, 2 users, load average: 2.17, 1.84, 1.76
> USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
> diana pts/1 nail00 Wed22 1:18m 1:52m 1:52m /cluster/home/matlab_2009a_x86_64/bin/glnxa64/
> bqphytom pts/3 nail99 23:02 0.00s 0.00s 0.00s w
>
>
>
> On Thu, 2010-06-10 at 17:23 -0500, David Kotz wrote:
> > Rather than LoadAvg, I think you should use target.TotalLoadAvg.
> > LoadAvg refers to the load average of a single slot on a multicore
> > machine, and without specifying target.TotalLoadAvg, the expression
> > might (I'm guessing) actually look at the load average of a slot on the
> > submit machine. Condor has some disambiguation built in, but I like to
> > specify, just in case.
> >
> > - dave
> >
> >
> >
> > On Thu, 2010-06-10 at 14:46 -0700, kgk wrote:
> > > Condor: 7.5.2
> > > Debian Linux distribution AMD64
> > > Nodes: 64
> > >
> > > We have a shared cluster where users may log in and start jobs
> > > manually. We would prefer that nodes/slots with a high local load
> > > average be avoided for Condor jobs.
> > > We have added Rank = (100 - LoadAvg) to our standard submit scripts.
> > > However, using condor_status I see many nodes (already being used by
> > > others) show a LoadAvg of 0.0, meaning they are scheduled with equal
> > > rank.
> > >
> > > In some condor documents it seems that LoadAvg is determined by the
> > > submitted condor jobs and in others it seems to be the true machine
> > > load reported by the OS.
> > >
> > > 1. Is LoadAvg supposed to be the kernel-reported load average?
> > > 2. If so, then I believe there is a bug.
> > > 3. If not, then how should I select for machines with no or very low
> > > load?
> > >
> > > Thanks,
> > > Kris
> > > _______________________________________________
> > > Condor-users mailing list
> > > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > > subject: Unsubscribe
> > > You can also unsubscribe by visiting
> > > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > >
> > > The archives can be found at:
> > > https://lists.cs.wisc.edu/archive/condor-users/