Date: Tue, 20 Mar 2007 23:46:48 -0500
From: "Lei Yang" <lya755@xxxxxxxxxxxxxxxxxxxx>
Could it be that I only used one Opal module for 4 processors? I basically just copied the procedure from the GEMS quickstart, which gives an example of loading a 1p Solaris simulation. Below is my script:
read-configuration ../../checkpoints/bagle-4p-mcf-art.conf
instruction-fetch-mode instruction-fetch-trace
cpu-switch-time 1
load-module ruby
load-module opal
ruby0.setparam g_NUM_PROCESSORS 4
ruby0.setparam g_PROCS_PER_CHIP 1
opal0.sim-start "results.opal"
opal0.sim-step 100000000
ruby0.dump-stats filename = results.ruby
I mean, should I also be initializing opal1, opal2, and opal3? Please comment.
Thanks in advance!
On 3/20/07, Lei Yang <lya755@xxxxxxxxxxxxxxxxxxxx> wrote:
Thanks, Liqun. I see. Indeed, the Ruby cycle count is half the Opal cycle count. However, I found similar stats blocks in results.opal for [0], [1], [2], and [3]. Are they meant for each processor (since I simulated four CPUs in my system)?


But as can be seen below, how come [1] and [2] report a larger total number of instructions than what I specified with "C 100000000"?
I don't know. The timing-first simulator (TFsim/Opal) will re-execute some instructions if its results differ from those of the functional simulator. But 150% more instructions is far too many. Lei, please wait for Mike's or Luke's reply.


Is there documentation on how to read the Opal and Ruby stats dumps?
Thanks a lot!
[1] *** Runtime statistics:
[1]   Total number of instructions                        255117523
[1]   Total number of cycles                               64421882
[1]   number of continue calls                            255117523
[1]   Instruction per cycle:                             3.96011
[1]   Total Elapsed Time:                                36060 sec 0 usec
[1]   Total Retirement Time:                             3740 sec 262608 usec
[1]   Approximate cycle per sec:                         1786.52
[1]   Approximate instructions per sec:                  7074.79
[1]   This processor's Simics overhead (retire/elapsed):  10.37%
[1]   Average number of instructions per continue          1.00
[2] *** Runtime statistics:
[2]   Total number of instructions                        256142056
[2]   Total number of cycles                               64421882
[2]   number of continue calls                            256142056
[2]   Instruction per cycle:                             3.97601
[2]   Total Elapsed Time:                                36060 sec 0 usec
[2]   Total Retirement Time:                             3713 sec 458821 usec
[2]   Approximate cycle per sec:                         1786.52
[2]   Approximate instructions per sec:                  7103.2
[2]   This processor's Simics overhead (retire/elapsed):  10.30%
[2]   Average number of instructions per continue          1.00
[3] *** Runtime statistics:
[3]   Total number of instructions                         40012500
[3]   Total number of cycles                               64421882
[3]   number of continue calls                             40012500
[3]   Instruction per cycle:                             0.621101
[3]   Total Elapsed Time:                                36060 sec 0 usec
[3]   Total Retirement Time:                             600 sec 825168 usec
[3]   Approximate cycle per sec:                         1786.52
[3]   Approximate instructions per sec:                  1109.61
[3]   This processor's Simics overhead (retire/elapsed):   1.67%
[3]   Average number of instructions per continue          1.0
[0] *** Runtime statistics:
[0]   Total number of instructions                        100000003

This is the number of instructions graduated, as specified by "C 100000000".

[0]   Total number of cycles                               64421882

This is the Opal cycle count. If OPAL_RUBY_MULTIPLIER is 1, this number should equal the Ruby cycle count. But by default OPAL_RUBY_MULTIPLIER is 2, so the Opal cycle count should be twice the Ruby cycle count.

Hope this helps.

[0]   number of continue calls                            100000003
[0]   Instruction per cycle:                             1.55227
[0]   Total Elapsed Time:                                36060 sec 0 usec
[0]   Total Retirement Time:                             1522 sec 670804 usec
[0]   Approximate cycle per sec:                         1786.52
[0]   Approximate instructions per sec:                  2773.15
[0]   This processor's Simics overhead (retire/elapsed):   4.22%
[0]   Average number of instructions per continue          1.00
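The per-processor numbers above can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of GEMS: the function names are hypothetical, the instruction and cycle counts are copied from the dump in this thread, and the multiplier of 2 is the default quoted in the reply (other builds may differ).

```python
# Values copied from the results.opal excerpt in this thread.
OPAL_CYCLES = 64421882
INSTRUCTIONS = {0: 100000003, 1: 255117523, 2: 256142056, 3: 40012500}

# Default quoted in the reply; an assumption for any other configuration.
OPAL_RUBY_MULTIPLIER = 2


def ipc(instructions, cycles):
    """Instructions per cycle, as reported on the 'Instruction per cycle' line."""
    return instructions / cycles


def expected_ruby_cycles(opal_cycles, multiplier=OPAL_RUBY_MULTIPLIER):
    """Ruby cycle count implied by the Opal cycle count and the multiplier."""
    return opal_cycles / multiplier


def percent_more_than(value, baseline):
    """How many percent `value` exceeds `baseline`."""
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    for proc in sorted(INSTRUCTIONS):
        extra = percent_more_than(INSTRUCTIONS[proc], INSTRUCTIONS[0])
        print(f"[{proc}] IPC = {ipc(INSTRUCTIONS[proc], OPAL_CYCLES):.5f}, "
              f"{extra:+.1f}% instructions relative to [0]")
    print("expected Ruby cycles:", expected_ruby_cycles(OPAL_CYCLES))
```

Running it reproduces the IPC column of the dump and shows how far [1] and [2] exceed the step count that processor [0] stopped at.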
Just my 2 cents.

1. The simulation is very slow, and it seems impossible to run the entire benchmark: 10 million cycles took me more than one hour. Although I can specify a warm-up length, it is best to cover the entire life span of the benchmark. Has anyone tried a sampling approach? I guess it's OK to wait for the entire benchmark to complete when producing final performance numbers, but it is certainly a pain whenever the code is modified and we want to see how the change affects performance. GEMS users, how do you handle this problem?

Most studies use Opal only for the sensitivity analysis, say, running for 100M instructions. You might consider using the techniques in SMARTS. I vaguely remember that the CMU folks released this in SimFlex.
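To make the sampling idea concrete, here is a rough Python sketch of periodic (systematic) sampling with a confidence interval on the estimated CPI. It only illustrates the general statistical idea; it is not the SMARTS or SimFlex implementation, it does not talk to Opal or Ruby, and every name and number in it is hypothetical.

```python
import math
import statistics


def systematic_sample(per_unit_cpi, period, offset=0):
    """Keep every `period`-th measurement unit, starting at index `offset`."""
    if period <= 0:
        raise ValueError("period must be positive")
    return per_unit_cpi[offset::period]


def estimate_cpi(samples, z=1.96):
    """Return (mean CPI, half-width of an approximate 95% confidence interval).

    Assumes the sample mean is roughly normally distributed.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


if __name__ == "__main__":
    # Made-up CPI values for consecutive measurement units, for illustration only.
    per_unit_cpi = [0.9, 1.4, 1.1, 0.8, 1.6, 1.2, 1.0, 1.3, 0.7, 1.5, 1.1, 0.9]
    samples = systematic_sample(per_unit_cpi, period=3)
    mean, half_width = estimate_cpi(samples)
    print(f"estimated CPI = {mean:.3f} +/- {half_width:.3f} from {len(samples)} units")
```

In a real setup only the sampled units would be simulated in detail, which is where the time saving comes from; the interval width indicates whether more samples are needed.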

2. Exactly what performance number should I look at to compare two systems when both Opal and Ruby are used? I saw in the FAQ that one should use Ruby_cycles to measure the runtime of the simulated system. But when I let Opal run the same number of cycles, shouldn't Ruby_cycles be the same for both? If not, why?

Opal shows how many instructions graduated, not cycles.


I appreciate your comments!

