Re: [HTCondor-users] CPU accounting: NonCondorLoadAvg
- Date: Mon, 03 Jun 2013 17:17:55 +0100
- From: Brian Candler <b.candler@xxxxxxxxx>
- Subject: Re: [HTCondor-users] CPU accounting: NonCondorLoadAvg
And here's another observation: on some slots, but not all, LoadAvg is
exactly 1 larger than CondorLoadAvg.
Showing columns in the following order:
* TotalLoadAvg
* TotalCondorLoadAvg
* LoadAvg
* CondorLoadAvg
I see the following:
$ condor_status -format %17.17s Name -format " %-9.9s" State \
    -format " %-8.8s" Activity -format " %4d" Cpus \
    -format " %6.3f" TotalLoadAvg -format " %6.3f" TotalCondorLoadAvg \
    -format " %6.3f" LoadAvg -format " %6.3f\n" CondorLoadAvg | grep dar3
slot1@xxxxxxxxxxx Owner     Idle       18 13.700  9.770  1.000  0.000
slot1_11@xxxxxxxx Claimed   Busy        1 13.700  9.770  0.650  0.650
slot1_12@xxxxxxxx Claimed   Busy        1 13.700  9.770  0.650  0.650
slot1_13@xxxxxxxx Claimed   Busy        1 13.700  9.770  0.670  0.670
slot1_14@xxxxxxxx Claimed   Busy        1 13.700  9.770  1.630  0.700
slot1_15@xxxxxxxx Claimed   Busy        1 13.700  9.770  1.720  0.720
slot1_16@xxxxxxxx Claimed   Busy        1 13.700  9.770  1.770  0.770
slot1_1@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.710  0.710
slot1_2@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.710  0.710
slot1_3@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.760  0.760
slot1_4@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.710  0.710
slot1_5@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.690  0.690
slot1_6@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.710  0.710
slot1_7@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.660  0.660
slot1_8@xxxxxxxxx Claimed   Busy        1 13.700  9.770  0.670  0.670
$ ssh dar3 uptime
17:04:52 up 19 days, 22:48, 0 users, load average: 13.79, 13.82, 14.05
$ condor_status -format %17.17s Name -format " %-9.9s" State \
    -format " %-8.8s" Activity -format " %4d" Cpus \
    -format " %6.3f" TotalLoadAvg -format " %6.3f" TotalCondorLoadAvg \
    -format " %6.3f" LoadAvg -format " %6.3f\n" CondorLoadAvg | grep dar4
slot1@xxxxxxxxxxx Owner     Idle       27  5.300  1.000  1.000  0.000
slot1_1@xxxxxxxxx Claimed   Busy        1  5.300  1.000  0.200  0.200
slot1_2@xxxxxxxxx Claimed   Busy        1  5.300  1.000  0.500  0.200
slot1_4@xxxxxxxxx Claimed   Busy        1  5.300  1.000  1.200  0.200
slot1_7@xxxxxxxxx Claimed   Busy        1  5.300  1.000  1.200  0.200
slot1_8@xxxxxxxxx Claimed   Busy        1  5.300  1.000  1.200  0.200
$ ssh dar4 uptime
17:04:38 up 19 days, 22:40, 0 users, load average: 5.33, 5.22, 5.22
It looks like TotalLoadAvg is the sum of the per-slot LoadAvg values
(5.300 = 1.000 + 0.200 + 0.500 + 1.200 + 1.200 + 1.200). Note that this
includes the 1.000 reported by the idle owner slot!
Similarly, TotalCondorLoadAvg is the sum of the per-slot CondorLoadAvg
values (1.000 = 0.200 + 0.200 + 0.200 + 0.200 + 0.200).
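For what it's worth, the arithmetic checks out; a quick Python sanity
check using the dar4 figures above:

```python
# Per-slot (LoadAvg, CondorLoadAvg) pairs from the dar4 output above.
slots = [
    (1.000, 0.000),  # slot1 (the idle owner slot)
    (0.200, 0.200),  # slot1_1
    (0.500, 0.200),  # slot1_2
    (1.200, 0.200),  # slot1_4
    (1.200, 0.200),  # slot1_7
    (1.200, 0.200),  # slot1_8
]

total_load = sum(load for load, _ in slots)
total_condor_load = sum(condor for _, condor in slots)

print(round(total_load, 3))         # 5.3 -- matches TotalLoadAvg
print(round(total_condor_load, 3))  # 1.0 -- matches TotalCondorLoadAvg
```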
I found some code in src/condor_startd.V6/ResMgr.cpp which appears to
spread the "owner load" across the slots, at up to 1.0 per slot, which I
think explains that. Here "owner load" means
m_attr->load() - m_attr->condor_load().
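To illustrate what I think the startd is doing, here's a rough Python
sketch. This is my own reconstruction, not the actual HTCondor code, and
the order in which slots absorb the remainder is a guess:

```python
# Rough sketch of how the startd appears to distribute machine load across
# slots, based on my reading of ResMgr.cpp. NOT the actual HTCondor code;
# the iteration order here is an assumption.
def distribute_owner_load(total_load, condor_loads):
    """condor_loads: per-slot CondorLoadAvg values.
    Returns the per-slot LoadAvg values."""
    # "Owner load" is the machine load not attributed to Condor jobs,
    # i.e. m_attr->load() - m_attr->condor_load().
    owner_load = max(total_load - sum(condor_loads), 0.0)
    load_avgs = []
    for condor in condor_loads:
        # Each slot absorbs at most 1.0 of the remaining owner load
        # on top of its own Condor-generated load.
        share = min(owner_load, 1.0)
        owner_load -= share
        load_avgs.append(condor + share)
    return load_avgs

# With dar4's numbers this reproduces the observed per-slot LoadAvg values
# as a multiset (the real startd puts the fractional share on a different
# slot than this ordering does):
print(sorted(round(x, 3) for x in
             distribute_owner_load(5.3, [0.0, 0.2, 0.2, 0.2, 0.2, 0.2])))
# [0.2, 0.5, 1.0, 1.2, 1.2, 1.2]
```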
Unfortunately, for applications that spend much of their time waiting on
I/O, the sum of the processes' CPU utilisation is not directly comparable
to the /proc/loadavg values: on Linux, tasks in uninterruptible I/O wait
count towards the load average without using any CPU. So the difference
isn't going to give the "owner" load as far as I can see.
Regards,
Brian.