Hi Matt,
I had never looked into this before, but I found (in the RHEL6 manual of all things!) that there is a "perf_event" cgroup controller.
This would allow us to, amongst other things, record CPI (cycles per instruction) for all HTCondor jobs and report it in the resulting job classads.
For example, from a running job on our cluster:
[root@node110 ~]# mkdir /cgroup/perf_event/foo
[root@node110 ~]# echo 13020 > /cgroup/perf_event/foo/tasks
[root@node110 ~]# sudo perf stat -a -e task-clock,cpu-cycles,branches,branch-misses,instructions,cs,faults,migrations,stalled-cycles-frontend,stalled-cycles-backend -G /foo,/foo,/foo,/foo,/foo,/foo,/foo,/foo,/foo,/foo sleep 5
 Performance counter stats for 'sleep 5':

       4956.990670  task-clock               /foo  #    0.991 CPUs utilized             [99.98%]
    11,654,810,547  cpu-cycles               /foo  #    2.351 GHz                       [83.29%]
       959,551,261  branches                 /foo  #  193.575 M/sec                     [83.34%]
        24,915,394  branch-misses            /foo  #    2.60% of all branches           [66.62%]
    11,423,755,623  instructions             /foo  #    0.98  insns per cycle
                                                   #    0.68  stalled cycles per insn   [83.30%]
               120  cs                       /foo  #    0.024 K/sec                     [99.99%]
                 0  faults                   /foo  #    0.000 K/sec                     [99.99%]
                 0  migrations               /foo  #    0.000 K/sec                     [99.99%]
     7,720,637,500  stalled-cycles-frontend  /foo  #   66.24% frontend cycles idle      [83.33%]
       925,166,949  stalled-cycles-backend   /foo  #    7.94% backend cycles idle       [83.35%]

       5.001046504 seconds time elapsed
(there's a machine-readable version of the output if you add "-x ,")
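For instance, something like this could pull CPI straight out of that CSV output (a rough, untested sketch: perf stat writes to stderr, and the exact CSV field layout varies between perf versions, so it just keys off the event name and takes the count from the first field):

# compute CPI for the /foo cgroup created above
perf stat -x , -a -e cpu-cycles,instructions -G /foo,/foo sleep 5 2>&1 | \
  awk -F, '/cpu-cycles/   { cycles = $1 }
           /instructions/ { insns  = $1 }
           END { if (insns > 0) printf "CPI = %.3f\n", cycles / insns }'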
Performance counters would be FANTASTIC to have as users typically have no clue about this data.
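To tie it back to jobs, I could imagine a starter-side wrapper doing roughly the following per job (again just an untested sketch; the cgroup name, the output path, and ./the_job all stand in for whatever the starter actually uses):

CG=job_$$                                    # hypothetical per-job cgroup name
mkdir /cgroup/perf_event/$CG
echo $$ > /cgroup/perf_event/$CG/tasks       # this shell and its children now count against the cgroup
perf stat -x , -a -e cpu-cycles,instructions -G /$CG,/$CG \
    ./the_job 2> /tmp/$CG.perf               # counters run only while the job does; perf's CSV lands in the file
                                             # (the job's own stderr would need separating in a real wrapper)
echo $$ > /cgroup/perf_event/tasks           # move back to the root cgroup so rmdir succeeds
rmdir /cgroup/perf_event/$CG

The counts in /tmp/$CG.perf could then be folded into the job's classad when it exits.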
Brian
On Apr 12, 2013, at 6:23 AM, Matthew Farrellee <matt@xxxxxxxxxx> wrote:
> http://research.google.com/pubs/pub40737.html
>
> Interesting approach using cycles-per-instruction as a health metric
_______________________________________________
HTCondor-devel mailing list
HTCondor-devel@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel