We are running the pthread implementation of ocean on a 256x256 grid
with 50 steps and 4 threads, under the MESI_CMP_filter_directory
protocol. Each simulation ran for about 5 minutes. Below are some of
our results:
L1 Size   L2 Size   L1 Misses   L2 Misses
16 KB     1 MB         901324      902354
4 KB      1 MB         985237      992108
4 KB      256 KB      1036694     1059901
4 KB      512 KB       989132      996175
4 KB      8 MB        1008657     1015437
We also ran a test program that did nothing but sleep for 1 millisecond
(the simulation still took about 5 minutes of wall time), and it
reported 989132 L1 misses and 996175 L2 misses.
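For reference, the test program is essentially just a single sleep
call, something like this (a minimal sketch, not the exact source):

    #include <unistd.h>

    int main(void)
    {
        usleep(1000);   /* sleep for 1 millisecond, then exit */
        return 0;
    }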
Why did increasing the L2 cache size to 8 MB increase the number of misses?
Why do we have such a large overhead when we are not doing anything?
If these misses all come from the operating system, is it possible to
ignore them and profile just the benchmark program?
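One idea we had (untested) is to bracket just the region of interest
with Simics magic instructions and have the simulator clear and dump
its statistics at those points. A rough sketch, assuming Simics'
magic-instruction.h header is available on the target; the MAGIC()
argument values here are arbitrary placeholders:

    #include "magic-instruction.h"  /* Simics target header providing MAGIC(n) */

    /* Hypothetical stand-in for ocean's timed parallel phase. */
    static void solve(void)
    {
        /* ... relaxation steps would go here ... */
    }

    int main(void)
    {
        MAGIC(1);   /* simulator-side script clears stats on this value */
        solve();
        MAGIC(2);   /* simulator-side script dumps stats on this value */
        return 0;
    }

The simulator side would then watch for those magic values and
reset/report the cache statistics, so OS activity outside the
bracketed region is excluded. Does something like that work here?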
Finally, why are the numbers of L1 and L2 misses so similar even though
the cache sizes are vastly different? In the 16 KB / 1 MB run, for
example, the L2 shows 902354 misses against 901324 L1 misses, which
would mean that almost no L1 miss ever hits in the L2.
If anyone has any ideas, it would be greatly appreciated.
Mark