Re: [Gems-users] Ruby and Simics stall flag


Date: Tue, 27 Mar 2007 09:50:13 -0600
From: Dan Gibson <degibson@xxxxxxxx>
Subject: Re: [Gems-users] Ruby and Simics stall flag
You should vary PERFECT_MEMORY_SYSTEM_LATENCY, really. Even at low values you should observe a noticeable slowdown compared with running without Ruby. Recall that with a perfect memory system there is basically no execution within Ruby (you can verify this yourself, if you like).

Regards,
Dan

Thomas De Schampheleire wrote:
I did not touch PERFECT_MEMORY_SYSTEM_LATENCY, and I'm not sure what the default is.
What should I set it to in order to get valid results?

Thomas

On 3/27/07, Dan Gibson <degibson@xxxxxxxx> wrote:
  
 What was PERFECT_MEMORY_SYSTEM_LATENCY? Simics gets slow when it is
stalled...

 Thomas De Schampheleire wrote:
 Hi,

I have done a simple (stalled) simulation of 750000 cycles,
cpu-switch-time 1, fixed random seed, 4 processors, both with and
without a perfect memory system.

The results are not consistent with what you said: with a perfect
memory system, the simulation takes only 5s, while with a normal
memory system (PERFECT_MEMORY_SYSTEM false) it takes 66s. This would
suggest that most of the slowdown does come from ruby.
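
As a rough check on the two wall-clock figures reported in the stats below, the following sketch computes the slowdown factor and the share of the slower run that is extra time. The function name is made up for illustration; only the two Virtual_time_in_seconds values come from this thread. Note that the ratio alone cannot say whether the extra time is spent inside Ruby or inside Simics while it is being stalled.

```python
def slowdown(perfect_seconds, real_seconds):
    """Return (slowdown factor, fraction of the slower run that is extra time)."""
    factor = real_seconds / perfect_seconds
    extra_fraction = (real_seconds - perfect_seconds) / real_seconds
    return factor, extra_fraction


# Virtual_time_in_seconds from the two SHORT Profiler Stats dumps in this mail.
factor, extra = slowdown(5.62, 66.97)
print(f"slowdown: {factor:.1f}x, extra time: {extra:.1%}")
```

With these inputs the run without a perfect memory system is roughly 12 times slower, and a little over 90% of its wall-clock time is time that the perfect-memory run did not need.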

As a reminder, the versions we are using are Simics 2.2.19 and Gems 1.2.
Could it be that your observations are with the newer Simics versions
(3.x) only?

The short stats follow below...

Thanks, Thomas


SHORT Profiler Stats (perfect memory)
--------------
Virtual_time_in_seconds: 5.62
Virtual_time_in_minutes: 0.0936667
Virtual_time_in_hours: 0.00156111
Virtual_time_in_days: 0.00156111

Ruby_current_time: 375000
Ruby_start_time: 1
Ruby_cycles: 374999

Total_misses: 0
total_misses: 0 [ 0 0 0 0 ]
user_misses: 0 [ 0 0 0 0 ]
supervisor_misses: 0 [ 0 0 0 0 ]

instruction_executed: 3000004 [ 750001 750001 750001 750001 ]
cycles_per_instruction: 0.499998 [ 0.499998 0.499998 0.499998 0.499998 ]
misses_per_thousand_instructions: 0 [ 0 0 0 0 ]

transactions_started: 0 [ 0 0 0 0 ]
transactions_ended: 0 [ 0 0 0 0 ]
instructions_per_transaction: 0 [ 0 0 0 0 ]
cycles_per_transaction: 0 [ 0 0 0 0 ]
misses_per_transaction: 0 [ 0 0 0 0 ]

L1D_cache cache stats:
 L1D_cache_total_misses: 0
 L1D_cache_total_demand_misses: 0
 L1D_cache_total_prefetches: 0
 L1D_cache_total_sw_prefetches: 0
 L1D_cache_total_hw_prefetches: 0
 L1D_cache_misses_per_transaction: 0
 L1D_cache_misses_per_instruction: 0
 L1D_cache_instructions_per_misses: NaN

 L1D_cache_request_size: [binsize: log2 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]

L1I_cache cache stats:
 L1I_cache_total_misses: 0
 L1I_cache_total_demand_misses: 0
 L1I_cache_total_prefetches: 0
 L1I_cache_total_sw_prefetches: 0
 L1I_cache_total_hw_prefetches: 0
 L1I_cache_misses_per_transaction: 0
 L1I_cache_misses_per_instruction: 0
 L1I_cache_instructions_per_misses: NaN

 L1I_cache_request_size: [binsize: log2 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]

L2_cache cache stats:
 L2_cache_total_misses: 0
 L2_cache_total_demand_misses: 0
 L2_cache_total_prefetches: 0
 L2_cache_total_sw_prefetches: 0
 L2_cache_total_hw_prefetches: 0
 L2_cache_misses_per_transaction: 0
 L2_cache_misses_per_instruction: 0
 L2_cache_instructions_per_misses: NaN

 L2_cache_request_size: [binsize: log2 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]



SHORT Profiler Stats (non-perfect memory)
--------------
Virtual_time_in_seconds: 66.97
Virtual_time_in_minutes: 1.11617
Virtual_time_in_hours: 0.0186028
Virtual_time_in_days: 0.0186028

Ruby_current_time: 375000
Ruby_start_time: 1
Ruby_cycles: 374999

Total_misses: 484
total_misses: 484 [ 355 43 43 43 ]
user_misses: 0 [ 0 0 0 0 ]
supervisor_misses: 484 [ 355 43 43 43 ]

instruction_executed: 2599462 [ 456314 714418 714451 714279 ]
cycles_per_instruction: 0.577041 [ 0.8218 0.524901 0.524877 0.525004 ]
misses_per_thousand_instructions: 0.186192 [ 0.777973 0.0601889 0.0601861 0.0602006 ]

transactions_started: 0 [ 0 0 0 0 ]
transactions_ended: 0 [ 0 0 0 0 ]
instructions_per_transaction: 0 [ 0 0 0 0 ]
cycles_per_transaction: 0 [ 0 0 0 0 ]
misses_per_transaction: 0 [ 0 0 0 0 ]

L1D_cache cache stats:
 L1D_cache_total_misses: 250
 L1D_cache_total_demand_misses: 250
 L1D_cache_total_prefetches: 0
 L1D_cache_total_sw_prefetches: 0
 L1D_cache_total_hw_prefetches: 0
 L1D_cache_misses_per_transaction: 250
 L1D_cache_misses_per_instruction: 9.61739e-05
 L1D_cache_instructions_per_misses: 10397.8

 L1D_cache_request_type_LD: 67.2%
 L1D_cache_request_type_ST: 32%
 L1D_cache_request_type_ATOMIC: 0.8%

 L1D_cache_access_mode_type_SupervisorMode: 250 100%
 L1D_cache_request_size: [binsize: log2 max: 8 count: 250 average: 5.62 | standard deviation: 2.85035 | 0 19 55 33 143 ]

L1I_cache cache stats:
 L1I_cache_total_misses: 234
 L1I_cache_total_demand_misses: 234
 L1I_cache_total_prefetches: 0
 L1I_cache_total_sw_prefetches: 0
 L1I_cache_total_hw_prefetches: 0
 L1I_cache_misses_per_transaction: 234
 L1I_cache_misses_per_instruction: 9.00187e-05
 L1I_cache_instructions_per_misses: 11108.8

 L1I_cache_request_type_IFETCH: 100%

 L1I_cache_access_mode_type_SupervisorMode: 234 100%
 L1I_cache_request_size: [binsize: log2 max: 4 count: 234 average: 4 | standard deviation: 0 | 0 0 0 234 ]

L2_cache cache stats:
 L2_cache_total_misses: 484
 L2_cache_total_demand_misses: 484
 L2_cache_total_prefetches: 0
 L2_cache_total_sw_prefetches: 0
 L2_cache_total_hw_prefetches: 0
 L2_cache_misses_per_transaction: 484
 L2_cache_misses_per_instruction: 0.000186193
 L2_cache_instructions_per_misses: 5370.78

 L2_cache_request_type_LD: 34.7107%
 L2_cache_request_type_ST: 16.5289%
 L2_cache_request_type_ATOMIC: 0.413223%
 L2_cache_request_type_IFETCH: 48.3471%

 L2_cache_access_mode_type_SupervisorMode: 484 100%
 L2_cache_request_size: [binsize: log2 max: 8 count: 484 average: 4.83678 | standard deviation: 2.20154 | 0 19 55 267 143 ]
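
For readers comparing the two dumps, the derived figures appear to follow directly from the raw counters: cycles_per_instruction matches Ruby_cycles divided by the instruction count, and misses_per_thousand_instructions matches misses divided by instructions times 1000. The sketch below reproduces the non-perfect-memory numbers under that assumption. The relationship is inferred from the printed values, not taken from the Ruby source, and the function names are hypothetical.

```python
def cycles_per_instruction(ruby_cycles, instructions_per_cpu):
    """Return (overall CPI, per-processor CPI list).

    Assumes every processor runs for the same ruby_cycles, so the overall
    figure is total processor-cycles divided by total instructions.
    """
    per_cpu = [ruby_cycles / n for n in instructions_per_cpu]
    overall = ruby_cycles * len(instructions_per_cpu) / sum(instructions_per_cpu)
    return overall, per_cpu


def misses_per_thousand_instructions(misses_per_cpu, instructions_per_cpu):
    """Return (overall MPKI, per-processor MPKI list)."""
    per_cpu = [1000.0 * m / n for m, n in zip(misses_per_cpu, instructions_per_cpu)]
    overall = 1000.0 * sum(misses_per_cpu) / sum(instructions_per_cpu)
    return overall, per_cpu


# Counters copied from the non-perfect-memory dump in this mail.
ruby_cycles = 374999
instructions = [456314, 714418, 714451, 714279]
misses = [355, 43, 43, 43]

cpi, cpi_per_cpu = cycles_per_instruction(ruby_cycles, instructions)
mpki, mpki_per_cpu = misses_per_thousand_instructions(misses, instructions)
print(f"CPI  {cpi:.6f}", [round(x, 6) for x in cpi_per_cpu])
print(f"MPKI {mpki:.6f}", [round(x, 6) for x in mpki_per_cpu])
```

Processor 0 takes most of the misses (355 of 484) and executes far fewer instructions than the others in the same number of cycles, which is what pushes its CPI to about 0.82 while the other three stay near 0.52.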


On 3/23/07, Dan Gibson <degibson@xxxxxxxx> wrote:


 Actually, the lack of instruction timing is just a corollary of the lack
of the stalling model. The "stalling" model is, as far as we can tell
without a lot of direct input from Virtutech, a significantly different
operating mode within Simics. Obviously, Simics is still "stallable"
without it, but probably lacks some features... after all, we *are* hooking
into a codebase that we do not actually control.

 Thomas De Schampheleire wrote:
 So the difference of about an hour and a half would be solely due
to the instruction cache?

The conclusion stays the same, right? The results without -stall are
not correct or not useful.

Would compiling Ruby with optimization flags have much influence?
Roughly what improvement factor can I expect?

 Ruby should compile with architecture and generic optimizations in place
already. We've done performance profiling, and a lot of the execution time
(90%+ in many cases) actually occurs within the Simics executable, not
within Ruby. We conclude that Simics behaves differently (slower certainly,
perhaps more conservatively?) when it interfaces with a stalling memory
timer. You can test this observation yourself with the PERFECT_MEMORY_SYSTEM
flags, which effectively turn Ruby into a near-zero-execution-time model.
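
To put a number on the "which improvement factor" question, the profiling observation above can be fed into Amdahl's law. The sketch below is only an estimate under the assumption that about 10% of run time is spent in Ruby (the complement of the "90%+" in Simics mentioned above). The function names and the 2x Ruby speedup are illustrative, not measurements.

```python
def overall_speedup(ruby_fraction, ruby_speedup):
    """Amdahl's law: whole-run speedup when only the Ruby part gets faster."""
    if not 0.0 <= ruby_fraction < 1.0:
        raise ValueError("ruby_fraction must be in [0, 1)")
    if ruby_speedup <= 0.0:
        raise ValueError("ruby_speedup must be positive")
    return 1.0 / ((1.0 - ruby_fraction) + ruby_fraction / ruby_speedup)


def speedup_limit(ruby_fraction):
    """Upper bound on whole-run speedup if Ruby took no time at all."""
    if not 0.0 <= ruby_fraction < 1.0:
        raise ValueError("ruby_fraction must be in [0, 1)")
    return 1.0 / (1.0 - ruby_fraction)


# Assumed: 10% of wall-clock time in Ruby, 90% inside the Simics executable.
print(f"Ruby 2x faster: {overall_speedup(0.10, 2.0):.3f}x overall")
print(f"Ruby infinitely fast: {speedup_limit(0.10):.3f}x overall")
```

Under that assumption, even a Ruby that cost nothing would speed the whole simulation up by only about 1.11x, so compiler flags for Ruby are unlikely to change the picture much.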

 Regards,
 Dan


 Thanks, Thomas


 _______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding
"site:https://lists.cs.wisc.edu/archive/gems-users/" to
your search.



