Could it be that I only used one opal module for 4 processors? I basically just copied the procedure from the GEMS quickstart, which gives an example of loading a 1p Solaris simulation. Below is my script:
read-configuration ../../checkpoints/bagle-4p-mcf-art.conf
instruction-fetch-mode instruction-fetch-trace
istc-disable
dstc-disable
cpu-switch-time 1
load-module ruby
load-module opal
ruby0.setparam g_NUM_PROCESSORS 4
ruby0.setparam g_PROCS_PER_CHIP 1
ruby0.init
opal0.init
opal0.sim-start "results.opal"
opal0.sim-step 100000000
ruby0.dump-stats filename = results.ruby
opal0.listparam
opal0.stats
I mean, should I also be initializing opal1, opal2, and opal3? Please comment.
Thanks in advance!
Lei
----- Original Message -----
Sent: Tuesday, March 20, 2007 5:13 PM
Subject: Re: [Gems-users] Performance evaluation of CMP with Ruby and Opal
On 3/20/07, Lei Yang <lya755@xxxxxxxxxxxxxxxxxxxx> wrote:
Thanks Liqun. I see. Indeed, the Ruby cycle count is half of the Opal cycle count. However, I found similar stats for [0][1][2][3] in results.opal. Are they meant for each processor (since in my system I simulated four CPUs)?
yup.
But as can be seen below, how come [1] and [2] have a higher total number of instructions than what I specified with "C 100000000"?
I don't know. The timing-first simulator (TFsim/Opal) will re-execute some instructions if its results differ from those of the functional simulator. But 150% more instructions is far too many. Lei, please wait for Mike's or Luke's reply.
Liqun
Is there any documentation on how to read the Opal and Ruby stats dumps?
Thanks a lot!
Lei
[1] *** Runtime statistics:
[1] Total number of instructions 255117523
[1] Total number of cycles 64421882
[1] number of continue calls 255117523
[1] Instruction per cycle: 3.96011
[1] Total Elapsed Time: 36060 sec 0 usec
[1] Total Retirement Time: 3740 sec 262608 usec
[1] Approximate cycle per sec: 1786.52
[1] Approximate instructions per sec: 7074.79
[1] This processor's Simics overhead (retire/elapsed): 10.37%
[1] Average number of instructions per continue 1.00

[2] *** Runtime statistics:
[2] Total number of instructions 256142056
[2] Total number of cycles 64421882
[2] number of continue calls 256142056
[2] Instruction per cycle: 3.97601
[2] Total Elapsed Time: 36060 sec 0 usec
[2] Total Retirement Time: 3713 sec 458821 usec
[2] Approximate cycle per sec: 1786.52
[2] Approximate instructions per sec: 7103.2
[2] This processor's Simics overhead (retire/elapsed): 10.30%
[2] Average number of instructions per continue 1.00

[3] *** Runtime statistics:
[3] Total number of instructions 40012500
[3] Total number of cycles 64421882
[3] number of continue calls 40012500
[3] Instruction per cycle: 0.621101
[3] Total Elapsed Time: 36060 sec 0 usec
[3] Total Retirement Time: 600 sec 825168 usec
[3] Approximate cycle per sec: 1786.52
[3] Approximate instructions per sec: 1109.61
[3] This processor's Simics overhead (retire/elapsed): 1.67%
[3] Average number of instructions per continue 1.0
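As a sanity check on how these per-processor lines relate to each other, here is a minimal Python sketch that parses the "Total number of instructions" and "Total number of cycles" lines and recomputes instructions per cycle. The SAMPLE text is copied from the stats quoted in this thread; the function names parse_stats and ipc are hypothetical helpers, not part of GEMS, and the line format is assumed to match what is shown here.

```python
import re

# Per-processor totals copied from the results.opal excerpts in this thread.
SAMPLE = """\
[0] Total number of instructions 100000003
[0] Total number of cycles 64421882
[1] Total number of instructions 255117523
[1] Total number of cycles 64421882
[2] Total number of instructions 256142056
[2] Total number of cycles 64421882
[3] Total number of instructions 40012500
[3] Total number of cycles 64421882
"""

# Assumed line shape: "[<cpu>] Total number of <instructions|cycles> <integer>"
PATTERN = re.compile(r"^\[(\d+)\] Total number of (instructions|cycles)\s+(\d+)$")


def parse_stats(text):
    """Return {cpu_index: {"instructions": int, "cycles": int}}."""
    stats = {}
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            cpu = int(match.group(1))
            stats.setdefault(cpu, {})[match.group(2)] = int(match.group(3))
    return stats


def ipc(entry):
    """Instructions per cycle for one processor's totals."""
    return entry["instructions"] / entry["cycles"]


stats = parse_stats(SAMPLE)
for cpu in sorted(stats):
    print(f"[{cpu}] IPC = {ipc(stats[cpu]):.5f}")
```

The recomputed ratios agree with the "Instruction per cycle" values in the dump, which suggests that field is simply total instructions divided by total cycles for each processor.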
----- Original Message -----
Sent: Tuesday, March 20, 2007 3:11 PM
Subject: Re: [Gems-users] Performance evaluation of CMP with Ruby and Opal
[0] *** Runtime statistics: [0] Total number of instructions 100000003
This is the number of instructions graduated, specified by "C 100000000"
[0] Total number of cycles 64421882
This is the number of Opal cycles. If OPAL_RUBY_MULTIPLIER is 1, this number should equal the number of Ruby cycles. But by default OPAL_RUBY_MULTIPLIER is 2, so the Opal cycle count should be twice the Ruby cycle count.
hope this helps. Liqun
[0] number of continue calls 100000003
[0] Instruction per cycle: 1.55227
[0] Total Elapsed Time: 36060 sec 0 usec
[0] Total Retirement Time: 1522 sec 670804 usec
[0] Approximate cycle per sec: 1786.52
[0] Approximate instructions per sec: 2773.15
[0] This processor's Simics overhead (retire/elapsed): 4.22%
[0] Average number of instructions per continue 1.00
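The relation between Opal and Ruby cycles described in the reply above can be written out as a small Python sketch. The function name ruby_cycles_from_opal is hypothetical; the default multiplier of 2 is taken from the reply, and the input is the "Total number of cycles" value from the dump.

```python
# Default value per the reply above; change it if your configuration differs.
OPAL_RUBY_MULTIPLIER = 2


def ruby_cycles_from_opal(opal_cycles, multiplier=OPAL_RUBY_MULTIPLIER):
    """Convert an Opal cycle count to the corresponding Ruby cycle count."""
    return opal_cycles / multiplier


# "Total number of cycles" reported by Opal for this run.
print(ruby_cycles_from_opal(64421882))
```

For this run that works out to 32210941 Ruby cycles, which should be comparable to the cycle count in the Ruby stats file if the multiplier really is 2.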
----- Original Message -----
Sent: Tuesday, March 20, 2007 2:52 PM
Subject: Re: [Gems-users] Performance evaluation of CMP with Ruby and Opal
Just my 2 cents.
1. The simulation is very, very slow and it seems impossible to run the entire benchmark. 10 million cycles cost me more than one hour. Although I can specify a warm-up length, it is best to cover the entire life span of the benchmark. Has anyone tried to use a sampling approach? I guess it's OK to wait for the entire benchmark to complete when producing final performance numbers, but it certainly is a pain whenever there is a modification to the code and we want to see how it affects performance. GEMS users, how do you handle this problem?
Most studies use Opal only in the sensitivity analysis, say a run of 100M instructions. You might consider using the techniques in SMARTS. I vaguely remember that the CMU folks have released this in SimFlex.
2. Exactly what performance number should I look at to compare two systems when both Opal and Ruby are used? I saw in the FAQ that one should use Ruby_cycles to measure the runtime of the simulated system. But when I let Opal run the same number of cycles, shouldn't Ruby_cycles be the same for both? If not, why?
Opal shows how many instructions graduated, not cycles.
Liqun
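To put the slowness in point 1 into numbers, here is a rough Python sketch that estimates wall-clock time from a simulated cycle count. It assumes the throughput stays constant at the "Approximate cycle per sec" value of 1786.52 reported in the results.opal excerpts in this thread, and that the cycle counts are Opal cycles; both are assumptions, and estimated_hours is a hypothetical helper.

```python
# "Approximate cycle per sec" from the results.opal excerpts in this thread.
# Assumed to be constant over the run, which real workloads may not satisfy.
CYCLES_PER_SEC = 1786.52


def estimated_hours(opal_cycles, cycles_per_sec=CYCLES_PER_SEC):
    """Estimate wall-clock hours needed to simulate the given number of cycles."""
    return opal_cycles / cycles_per_sec / 3600.0


print(f"10M cycles: about {estimated_hours(10_000_000):.2f} hours")
print(f"64421882 cycles: about {estimated_hours(64_421_882):.2f} hours")
```

At that rate, 10 million cycles takes roughly an hour and a half, which is in line with the "more than one hour" observation, and the full run of 64421882 cycles matches the reported elapsed time of 36060 seconds.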
I appreciate your comments!
Thanks,
Lei
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.