[Gems-users] Coherence Message Latencies


Date: Wed, 09 May 2007 20:05:02 -0600
From: Vivek Venkatesan <vvenkate@xxxxxxxxxxx>
Subject: [Gems-users] Coherence Message Latencies
Hello,

We're trying to assess the impact on overall performance if certain coherence messages are delayed by X cycles. For example, we aim to find the percentage degradation caused by delaying all GETS messages that cross a link (say, by an extra 4 cycles). Following are the changes we made to the MOESI_CMP_directory code:

(i) In Throttle.C::wakeup(), we check whether the coherence message is of the target type (here, GETS); if it is, we enqueue it with (m_link_latency + extra_delay), where extra_delay = 4, as the delta.
(ii) In MessageBuffer.C::enqueue(), we set m_randomization to false so that this delta is not ignored inside enqueue().
(iii) Delaying the messages can cause out-of-order arrival at the buffers, so we set m_strict_fifo to false in MessageBuffer.C, SimpleNetwork.C, and Switch.C.
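
The mechanism in steps (i) and (iii) can be modelled in a few lines. The sketch below is a toy Python model, not GEMS code: `Message`, `link_delta`, `DelayedBuffer`, and `drain` are hypothetical names, and the buffer simply releases messages by arrival cycle. It illustrates why a per-type extra delay requires non-strict FIFO: with a 4-cycle link latency and 4 extra cycles on GETS only, an INV sent one cycle after a GETS arrives first.

```python
import heapq
import itertools
from dataclasses import dataclass

# Hypothetical message-type labels standing in for Ruby's coherence types.
GETS, INV = "GETS", "INV"


@dataclass
class Message:
    msg_type: str
    msg_id: int


def link_delta(msg, link_latency, extra_delay, target_type):
    """Step (i): add extra_delay only when the message has the target type."""
    if msg.msg_type == target_type:
        return link_latency + extra_delay
    return link_latency


class DelayedBuffer:
    """Toy buffer that releases messages by arrival cycle, not insertion order
    (the non-strict-FIFO behaviour that step (iii) has to allow)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: equal arrival cycles stay FIFO

    def enqueue(self, msg, now, delta):
        heapq.heappush(self._heap, (now + delta, next(self._seq), msg))

    def ready(self, now):
        return bool(self._heap) and self._heap[0][0] <= now

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def drain(buf):
    """Step the clock one cycle at a time; return (cycle, msg_id) in delivery order."""
    delivered, cycle = [], 0
    while len(buf):
        while buf.ready(cycle):
            delivered.append((cycle, buf.dequeue().msg_id))
        cycle += 1
    return delivered


# GETS sent at cycle 0, INV sent at cycle 1, same link: 4-cycle link latency,
# 4 extra cycles applied to GETS only.
buf = DelayedBuffer()
gets, inv = Message(GETS, 1), Message(INV, 2)
buf.enqueue(gets, 0, link_delta(gets, 4, 4, GETS))
buf.enqueue(inv, 1, link_delta(inv, 4, 4, GETS))
print(drain(buf))  # [(5, 2), (8, 1)]: the INV overtakes the delayed GETS
```

A strict-FIFO check on this buffer would fire at cycle 5, because the message leaving first was not the first one inserted.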

We evaluated the results by running entire benchmarks (with a baseline network_link_latency of 4). We are suspicious of our methodology because *many* benchmarks complete in fewer Ruby cycles even with the additional delays. For example, volrend completes in 21% and 14% fewer ruby_cycles than the baseline when 4 extra cycles of delay are added to INV and GETS messages, respectively!
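
To read figures like these consistently, it helps to fix the sign convention. A minimal sketch, assuming runtime is measured as total ruby_cycles for the whole benchmark; `percent_change` is a hypothetical helper, and the cycle counts are made up purely to show that "21% fewer cycles" means the variant's runtime is 0.79 of the baseline's.

```python
def percent_change(baseline_cycles, variant_cycles):
    """Percent change in runtime relative to the baseline.

    Negative values mean the variant finished in fewer cycles.
    """
    if baseline_cycles <= 0:
        raise ValueError("baseline_cycles must be positive")
    return 100.0 * (variant_cycles - baseline_cycles) / baseline_cycles


# Made-up cycle counts, chosen only to show what "21% fewer cycles" means.
print(percent_change(1_000_000, 790_000))  # -21.0
```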

Now, it would be a great help if you could let us know whether we are missing something in our methodology, or whether these results are indeed to be believed.

Thanks a lot!
Vivek



Gregory T Byrd wrote:

It's not reasonable to expect the same number of instructions to be executed. Different orderings of memory operations can result in different execution paths. For a simple example, think of multiple threads spinning on a lock -- the delay of memory operations can affect who gets the lock first, and when they get the lock. So the number of instructions executed in the spinning loops would be different.
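
The lock example can be reproduced in miniature. The Python sketch below is only an analogy: OS thread scheduling stands in for memory-operation timing, and `run_workers` is a hypothetical name. Every run performs the same total work, yet the number of failed acquire attempts per thread varies from run to run, so an instruction count that includes the spin loops is not a stable measure.

```python
import threading


def run_workers(num_threads=4, iters_per_thread=1000):
    """Each worker spins on a try-lock, then increments a shared counter.

    Returns (final_counter, spins_per_thread). The counter always ends at
    num_threads * iters_per_thread; the spin counts depend on scheduling.
    """
    lock = threading.Lock()
    counter = 0
    spins = [0] * num_threads

    def worker(tid):
        nonlocal counter
        for _ in range(iters_per_thread):
            # Test-and-set style spin: each failed attempt is "extra" work.
            while not lock.acquire(blocking=False):
                spins[tid] += 1
            counter += 1
            lock.release()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, spins


total, spins = run_workers()
print(total, spins)  # total is fixed; the spin counts change between runs
```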

Also, you would expect the number of cycles to be different. After all, that's why you want to change the coherence protocol, I assume -- to make things execute faster?

I think the only reasonable measure is to execute the same amount of "work" (e.g., the entire benchmark), and show how long it took for that work to be executed with the different schemes.
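
One way to apply this is to time a region of interest bounded by begin/end markers (for example, the magic instructions mentioned further down the thread) and compare elapsed cycles rather than instruction counts. A minimal sketch, assuming the simulator can log the cycle at which each marker is reached; the `(cycle, marker)` log format, the marker names, and `roi_cycles` are all hypothetical.

```python
def roi_cycles(events, begin_marker="begin", end_marker="end"):
    """Cycles from the first begin marker to the first end marker at or after it.

    `events` is a hypothetical list of (cycle, marker) pairs; the marker names
    are placeholders for whatever the begin/end magic instructions record.
    """
    begin = next((c for c, m in events if m == begin_marker), None)
    if begin is None:
        raise ValueError("no begin marker in log")
    end = next((c for c, m in events if m == end_marker and c >= begin), None)
    if end is None:
        raise ValueError("no end marker after the begin marker")
    return end - begin


# Made-up logs for illustration only: both runs do the same work between
# the markers, even if they execute different numbers of instructions.
baseline_log = [(1_000, "begin"), (801_000, "end")]
variant_log = [(1_000, "begin"), (633_000, "end")]
base, var = roi_cycles(baseline_log), roi_cycles(variant_log)
print(base, var, round(base / var, 3))  # 800000 632000 1.266
```

The ratio of baseline to variant ROI cycles is then the speedup for the same amount of work.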

...Greg



On Tue, 1 May 2007, Vivek Venkatesan wrote:

Hello All,

I'm having trouble making an accurate comparison between two coherence
models, both based on MOESI_CMP_directory with Ruby. I am using a 16-core
CMP, and my initial approach was to run the same benchmark with the two
models for a fixed number of instructions (about 50,000,000) and compare
the total Ruby cycles elapsed. This didn't work because the two runs
resulted in different total numbers of instructions executed.

Another method I tried was to have a magic instruction at the beginning
and end of the benchmark so that I can track the total ruby cycles taken
to complete the entire benchmark. Then again, the total instructions
executed differed (which did not make sense to me at all, especially
since both start from the same checkpoint).

So how would I go about making a fair comparison of the two models? Is
there any way you can make Simics execute a certain number of total
instructions, or make Ruby execute a certain number of total cycles?

Thanks much.

-Vivek
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.



