[Gems-users] gems suitability


Date: Sun, 16 Nov 2008 10:31:29 -0600 (CST)
From: Naveen Parihar <np1@xxxxxxxxxxxxxxx>
Subject: [Gems-users] gems suitability
Dear Users of GEMS,

I have a background in signal processing and computer science and am new to hardware simulation. I would like to hear from GEMS users whether GEMS would be a good fit for my simulations. A short problem statement follows.
Specifically, I have a very large speech recognition system (more than 50
thousand words), with a serial version and a parallel version (using
pthreads) that runs on a CMP. When the serial code is not optimized, the
parallel code shows a runtime improvement. However, when the serial code is
optimized (for cache performance), the parallel version of the optimized
system shows a performance drop (on a dual-core AMD Opteron). Preliminary
investigations point to cache coherency and memory bandwidth. The memory
footprint of the application is around 1 GB, and a test case takes around
2-3 minutes to run.
Given this problem, the question I am trying to answer is: what cache
architecture, interconnect, and cache-coherency protocol would make the
parallel system run faster than the serial system? After some preliminary
reading on GEMS, I find that I can use Ruby + Simics + Opal to simulate a
dual-core UltraSPARC III. Hence, I can first benchmark my system's baseline
performance (runtime, cache misses, etc.) on this hardware configuration.
Next, the performance of the parallel system can be benchmarked. I would
expect the performance to drop in a fashion similar, if not identical, to
that observed on the dual-core AMD Opteron. Finally, changes can be made to
the cache design, etc., and each variant benchmarked in turn.
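
To compare the serial and parallel runs across simulated configurations, a small script along the following lines might help. This is only a sketch: the Run record, the configuration labels, and the runtimes in the usage section are hypothetical placeholders, not measurements and not part of GEMS. It computes speedup (serial runtime divided by parallel runtime) and parallel efficiency (speedup divided by thread count), so a speedup below 1.0 flags a configuration where the parallel version is slower.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    """Wall-clock runtimes for one hardware configuration (hypothetical record)."""
    config: str        # free-form label for the simulated configuration
    serial_s: float    # runtime of the serial build, in seconds
    parallel_s: float  # runtime of the pthreads build, in seconds
    threads: int = 2   # dual-core target, as in the problem statement


def speedup(run: Run) -> float:
    """Serial runtime over parallel runtime; below 1.0 means a slowdown."""
    return run.serial_s / run.parallel_s


def efficiency(run: Run) -> float:
    """Speedup normalized by the number of threads (1.0 is ideal scaling)."""
    return speedup(run) / run.threads


def report(runs: List[Run]) -> List[str]:
    """Return one summary line per configuration."""
    return [
        f"{r.config}: speedup={speedup(r):.2f} efficiency={efficiency(r):.2f}"
        for r in runs
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only; replace with measured runtimes.
    runs = [
        Run("config-A", serial_s=100.0, parallel_s=200.0),
        Run("config-B", serial_s=100.0, parallel_s=50.0),
    ]
    for line in report(runs):
        print(line)
```

The same record could be extended with cache-miss counts once the simulator statistics are available.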
Does this plan sound reasonable?

Cheers,
Naveen Parihar
Ph.D. Student (www.ece.msstate.edu/~np1)