Those results do seem unusual. The 1MB vs. 2MB data seems to make sense, 
but the 512KB L2 result is quite strange. What are the working-set 
sizes for your applications? Very large or very small working sets can 
be less sensitive to cache size. It could be that the working sets 
overwhelm the L2 in all three configurations below... try using a very 
large (~2x the working-set size) L2 cache.
As for the instruction count, consider this:
 In multithreaded applications, there is some interthread interaction 
through data sharing, cooperative caching, and synchronization. Changing 
the caches changes these interactions... suppose a processor is 
spinning, waiting for a lock to be released. The length of the spin (and 
the number of instructions executed as a result of the spin) is 
influenced by the _other_ processors' cache performance (especially that 
of the thread holding the lock!).
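To make that concrete, here is a minimal sketch in plain C11 (not GEMS 
code, just an illustration) of a test-and-set spinlock:

    /* The number of times the loop in acquire() runs -- and therefore the
       number of instructions this thread retires -- depends on how quickly
       some *other* thread calls release(), which in turn depends on that
       thread's cache behavior. */
    #include <stdatomic.h>

    static atomic_flag lock = ATOMIC_FLAG_INIT;

    void acquire(void) {
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ; /* each failed attempt adds instructions to this thread's count */
    }

    void release(void) {
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }

A slower lock holder (say, one taking more L2 misses) means more spin 
iterations in the waiter, so the dynamic instruction count changes even 
though the useful work is identical.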
I was initially confused by this behavior as well...it is subtle.
Regards,
Dan
thethem wrote:
 
Thanks for responding so quickly, Dan!  Let me clarify...
> Actually, the count of Ruby_cycles is the performance metric in Ruby.
> It is true that it is proportional to the simulation time, but
> simulation time (host execution time) is proportional to simulated
> time (target execution time).
 If this is true, then some of my results are confusing. For example, all 
else being equal, a larger L2 cache results in a longer execution time 
even with fewer cache misses. This seems backwards to me. I'll list the 
parameters and results below; the only item changed from experiment to 
experiment was the L2 size:
CMP 1x8 (one chip, 8 procs per chip):

     L2 size    Ruby_cycles    instruction_executed    cycles_per_instruction
     2MB        2388071860     22174042603             0.861574
     1MB        2638412275     24980575450             0.844948
     512KB      2470006953     23467006962             0.842036
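(For what it's worth, the three numbers in each row seem internally 
consistent if cycles_per_instruction is the aggregate CPI over all 8 
processors -- that's my assumption, not something the output states 
explicitly. For the 2MB case: 8 * 2388071860 / 22174042603 ≈ 0.861574, 
which matches the reported value, and the other two rows work out the 
same way.)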
I'm confused about the difference in instruction count. Since they are 
executing the same number of instructions according to Simics, shouldn't 
the instruction counts that Ruby sees be approximately the same from 
experiment to experiment?
 As for Ruby cycles, I would expect them to decrease for larger cache 
sizes (up to a point). The miss rate that Ruby is reporting makes sense, 
but I can't seem to figure out what's happening with the simulation 
time. The results above are from the Splash-2 benchmark Ocean; I have 
similar results from dbench2.
Thanks for your time, Dan.
~Clay
Dan Gibson wrote:
  
Hello, Clay.
Let me answer your questions individually, below.
At 08:37 AM 11/30/2005 -0500, you wrote:
    
Hello everyone,
I've got several questions about the output from the Ruby module so I'll
just list them below.
-Is the L1 instruction cache assumed to be perfect?
      
 If by "perfect" you mean zero-cycle latency, then yes. This is the default 
setting for Ruby. However, this can be turned off by editing the 
ruby/config/rubyconfig.defaults and setting 
REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH to true, and then selecting L1 
parameters in the same file to suit your needs.
The L1 is *not* infinite in size.
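For example (from memory -- please double-check the exact name/value 
syntax against your copy of rubyconfig.defaults), the relevant edit is 
roughly this one line:

    REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH: true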
    
-Is there a performance metric in the Ruby output?  One can't use the
number of cycles or CPI when comparing different cache implementations,
because the number of instructions is different and the Ruby cycle time
appears to be a function of the simulation time.
      
 Actually, the count of Ruby_cycles is the performance metric in Ruby. It is 
true that it is proportional to the simulation time, but simulation time 
(host execution time) is proportional to simulated time (target execution 
time).
    
-Why is the instruction count different if the simulation starts from
the same checkpoint and terminates at the same flag?  It is, of course,
the same if you run the same model repeatedly.  However, if the cache
size is changed from 1MB L2 to, say, 8MB L2, then the instruction count
changes.
      
 Are you reporting a value from Simics or from Ruby? Ruby_cycles is the 
measure of simulated time required, so it will definitely change with 
changing cache parameters. If you're talking about Ruby_cycles, that is the 
explanation.
 Also, are you simulating multiprocessor or single-processor systems?
    
Thanks in advance,
Clay
      
 
Regards,
Dan
    
 
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
  
 