Hello,
We're trying to assess the impact on overall performance when certain
coherence messages are delayed by X cycles. For example, we want to find
the percentage degradation caused by delaying all GETS messages that go
across a link (say, by an extra 4 cycles). We made the following changes
to the MOESI_CMP_directory code:
(i) In Throttle.C::wakeup(): we check whether the coherence message is
of a certain target type (here, GETS) and, if so, enqueue it with
(m_link_latency + extra_delay), with extra_delay = 4, as the delta.
(ii) In MessageBuffer.C::enqueue(): we set m_randomization to false so
that this delta is not ignored inside enqueue.
(iii) Delaying the messages might cause out-of-order arrival at the
buffers, so we had to set m_strict_fifo to false in MessageBuffer.C,
SimpleNetwork.C and Switch.C.
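
To illustrate why step (iii) becomes necessary, here is a minimal toy
sketch (it is not GEMS code) of how a per-type extra delay can reorder
arrivals on a single link. The function name arrival_order, the message
tuple format, and the send cycles are hypothetical and chosen only for
illustration; the link latency of 4 and the extra delay of 4 follow the
numbers in this message.

```python
def arrival_order(messages, link_latency=4, extra_delay=None):
    """Return message ids sorted by arrival time on a single link.

    messages: list of (msg_id, msg_type, send_cycle) tuples
              (hypothetical format, not a GEMS data structure).
    extra_delay: dict mapping a message type to extra cycles of delay.
    """
    extra_delay = extra_delay or {}
    arrivals = []
    for msg_id, msg_type, send_cycle in messages:
        # Delta is the link latency plus any extra delay for this type.
        delta = link_latency + extra_delay.get(msg_type, 0)
        arrivals.append((send_cycle + delta, msg_id))
    # Stable sort by arrival time only; ties keep the send order.
    arrivals.sort(key=lambda pair: pair[0])
    return [msg_id for _, msg_id in arrivals]


# Three messages sent on consecutive-ish cycles (made-up values).
msgs = [(0, "GETS", 10), (1, "GETX", 11), (2, "GETS", 12)]
baseline = arrival_order(msgs)
delayed = arrival_order(msgs, extra_delay={"GETS": 4})
```

With only GETS delayed, the GETX sent at cycle 11 arrives before the
GETS sent at cycle 10, which is the kind of reordering a strict-FIFO
check would reject.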
We evaluated the results by running entire benchmarks (assume a baseline
network_link_latency of 4). We are suspicious of our methodology because
*many* benchmarks complete in fewer ruby cycles even with additional
delays. For example, volrend completes in 21% and 14% fewer ruby_cycles
(than the baseline) when 4 extra cycles of delay are added to INV and
GETS messages respectively!!
It would be a great help if you could let us know whether we are missing
something in our methodology, or whether these results are to be believed.
Thanks a lot!
Vivek
Gregory T Byrd wrote:
It's not reasonable to expect the same number of instructions to be
executed. Different orderings of memory operations can result in
different execution paths. For a simple example, think of multiple
threads spinning on a lock -- the delay of memory operations can affect
who gets the lock first, and when they get the lock. So the number of
instructions executed in the spinning loops would be different.
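
A toy model of the spinning example may make this concrete. It is only
a sketch under stated assumptions: every spin iteration is assumed to
access the lock word in memory (as in a test-and-set loop), each other
instruction takes one cycle, and the name spin_instructions and all
numeric values are hypothetical.

```python
import math


def spin_instructions(hold_cycles, mem_latency, instrs_per_iter=3):
    """Instructions a waiting thread executes while spinning on a lock.

    Toy model (assumption): each spin iteration costs mem_latency cycles
    for the lock access plus one cycle per instruction in the loop body.
    The waiter keeps spinning until the holder releases the lock after
    hold_cycles cycles.
    """
    cycles_per_iter = mem_latency + instrs_per_iter
    iterations = math.ceil(hold_cycles / cycles_per_iter)
    return iterations * instrs_per_iter


# Same amount of lock-hold time, two different memory latencies.
fast = spin_instructions(hold_cycles=1000, mem_latency=4)
slow = spin_instructions(hold_cycles=1000, mem_latency=8)
```

In this model the waiter executes fewer spin-loop instructions when
memory operations are slower, even though the program does the same
work, so the total instruction count is not a stable yardstick.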
Also, you would expect the number of cycles to be different. After all,
that's why you want to change the coherence protocol, I assume -- to make
things execute faster?
I think the only reasonable measure is to execute the same amount of
"work" (e.g., the entire benchmark), and show how long it took for that
work to be executed with the different schemes.
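
As a small sketch of that comparison, the relative change in cycles for
the same work can be computed as below. The function name percent_change
and the cycle counts are hypothetical placeholders, not measured results.

```python
def percent_change(baseline_cycles, variant_cycles):
    """Percentage change in cycles relative to the baseline run.

    Negative means the variant finished the same work in fewer cycles.
    """
    return 100.0 * (variant_cycles - baseline_cycles) / baseline_cycles


# Hypothetical cycle counts for one complete benchmark run each.
slowdown = percent_change(2_000_000, 2_100_000)
```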
...Greg
On Tue, 1 May 2007, Vivek Venkatesan wrote:
Hello All,
I'm having trouble making an accurate comparison between two coherence
models both based on MOESI_CMP_directory with Ruby. I am using a 16-core
CMP and my initial approach was to run the same benchmark with the two
models for a specific number of instructions (c 50000000) and compare
the total ruby cycles elapsed. This didn't work because the two runs
ended up executing different total numbers of instructions.
Another method I tried was to have a magic instruction at the beginning
and end of the benchmark so that I can track the total ruby cycles taken
to complete the entire benchmark. Then again, the total instructions
executed differed (which did not make sense to me at all, especially
since both start from the same checkpoint).
So how would I go about making a fair comparison of the two models? Is
there any way you can make simics execute a certain number of total
instructions or make ruby execute a certain number of total cycles?
Thanks much.
-Vivek
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.