Re: [Gems-users] Performance evaluation of CMP with Ruby and Opal

Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Date:	Tue, 20 Mar 2007 23:46:48 -0500
From:	"Lei Yang" <lya755@xxxxxxxxxxxxxxxxxxxx>
Subject:	Re: [Gems-users] Performance evaluation of CMP with Ruby and Opal

Could it be that I only used one opal module for 4 processors? I basically just copied the procedure from the GEMS quickstart, which gives an example of loading a 1p Solaris simulation. Below is my script:

read-configuration ../../checkpoints/bagle-4p-mcf-art.conf
instruction-fetch-mode instruction-fetch-trace
istc-disable
dstc-disable
cpu-switch-time 1
load-module ruby
load-module opal
ruby0.setparam g_NUM_PROCESSORS 4
ruby0.setparam g_PROCS_PER_CHIP 1
ruby0.init
opal0.init
opal0.sim-start "results.opal"
opal0.sim-step 100000000
ruby0.dump-stats filename = results.ruby
opal0.listparam
opal0.stats

I mean, should I also be initializing opal1, opal2, and opal3? Please comment.

Thanks in advance!

Lei

----- Original Message -----

From: Liqun Cheng

To: Lei Yang

Sent: Tuesday, March 20, 2007 5:13 PM

Subject: Re: [Gems-users] Performance evaluation of CMP with Ruby and Opal

On 3/20/07, Lei Yang <lya755@xxxxxxxxxxxxxxxxxxxx> wrote:

Thanks Liqun. I see. Indeed, the Ruby cycle count is half of the Opal cycle count. However, I found similar stats in results.opal for [0][1][2][3]. Are they meant for each processor (since I simulated four CPUs in my system)?

yup.

But as can be seen below, why do [1] and [2] report a higher total number of instructions than what I specified with "C 100000000"?

I don't know. The timing-first simulator (TFsim/Opal) will re-execute some instructions if the results differ from the functional simulator. But 150% more instructions is way too many. Lei, please wait for Mike or Luke's reply.

Liqun

Is there documentation on how to read the Opal and Ruby dump stats?

Thanks a lot!

Lei

[1] *** Runtime statistics:
[1]   Total number of instructions                        255117523
[1]   Total number of cycles                               64421882
[1]   number of continue calls                            255117523
[1]   Instruction per cycle:                             3.96011
[1]   Total Elapsed Time:                                36060 sec 0 usec
[1]   Total Retirement Time:                             3740 sec 262608 usec
[1]   Approximate cycle per sec:                         1786.52
[1]   Approximate instructions per sec:                  7074.79
[1]   This processor's Simics overhead (retire/elapsed): 10.37%
[1]   Average number of instructions per continue          1.00

[2] *** Runtime statistics:
[2]   Total number of instructions                        256142056
[2]   Total number of cycles                               64421882
[2]   number of continue calls                            256142056
[2]   Instruction per cycle:                             3.97601
[2]   Total Elapsed Time:                                36060 sec 0 usec
[2]   Total Retirement Time:                             3713 sec 458821 usec
[2]   Approximate cycle per sec:                         1786.52
[2]   Approximate instructions per sec:                  7103.2
[2]   This processor's Simics overhead (retire/elapsed): 10.30%
[2]   Average number of instructions per continue          1.00

[3] *** Runtime statistics:
[3]   Total number of instructions                         40012500
[3]   Total number of cycles                               64421882
[3]   number of continue calls                             40012500
[3]   Instruction per cycle:                             0.621101
[3]   Total Elapsed Time:                                36060 sec 0 usec
[3]   Total Retirement Time:                             600 sec 825168 usec
[3]   Approximate cycle per sec:                         1786.52
[3]   Approximate instructions per sec:                  1109.61
[3]   This processor's Simics overhead (retire/elapsed):   1.67%
[3]   Average number of instructions per continue          1.0
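
As a rough aid for reading these per-processor blocks, here is a minimal Python sketch that groups the purely numeric lines by the [n] prefix and recomputes "Instruction per cycle" as instructions divided by cycles. The function names and the regular expression are illustrative assumptions, not part of GEMS, and the sample text is an excerpt of the stats quoted above.

```python
import re

# Matches lines such as "[1]   Total number of instructions      255117523".
# Lines whose value carries a unit ("36060 sec 0 usec", "10.37%") are skipped.
STAT_LINE = re.compile(r"^\[(\d+)\]\s+(.*?):?\s+(-?\d+(?:\.\d+)?)\s*$")


def parse_opal_stats(text):
    """Return {processor_index: {label: value}} for the numeric stat lines."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            cpu = int(match.group(1))
            label = match.group(2).strip()
            stats.setdefault(cpu, {})[label] = float(match.group(3))
    return stats


def ipc(cpu_stats):
    """Recompute instructions per cycle from the two raw counters."""
    return (cpu_stats["Total number of instructions"]
            / cpu_stats["Total number of cycles"])


# Excerpt of the results.opal output quoted in this thread.
SAMPLE = """\
[1] *** Runtime statistics:
[1]   Total number of instructions                        255117523
[1]   Total number of cycles                               64421882
[1]   Instruction per cycle:                             3.96011
[1]   Total Elapsed Time:                                36060 sec 0 usec
[3] *** Runtime statistics:
[3]   Total number of instructions                         40012500
[3]   Total number of cycles                               64421882
[3]   Instruction per cycle:                             0.621101
"""

stats = parse_opal_stats(SAMPLE)
for cpu in sorted(stats):
    print(cpu, round(ipc(stats[cpu]), 5), stats[cpu]["Instruction per cycle"])
```

The recomputed ratio agrees with the reported "Instruction per cycle" value for each processor, which suggests that field is simply the instruction count divided by the shared Opal cycle count.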

----- Original Message -----

From: Liqun Cheng

To: Lei Yang

Sent: Tuesday, March 20, 2007 3:11 PM

Subject: Re: [Gems-users] Performance evaluation of CMP with Ruby and Opal

[0] *** Runtime statistics:
[0]   Total number of instructions                        100000003

This is the number of instructions graduated, as specified by "C 100000000".

[0]   Total number of cycles                               64421882

This is the number of Opal cycles. If OPAL_RUBY_MULTIPLIER is 1, this number should equal the Ruby cycles. By default, however, OPAL_RUBY_MULTIPLIER is 2, so the Opal cycles should be twice the Ruby cycles.
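
A small Python sketch of that relationship, assuming only what is stated above (Opal cycles = OPAL_RUBY_MULTIPLIER x Ruby cycles). The function name is hypothetical, and the input is the Opal cycle count from this run.

```python
def ruby_cycles_from_opal(opal_cycles, opal_ruby_multiplier=2):
    """Expected Ruby cycle count for a given Opal cycle count.

    Assumes Opal cycles = multiplier * Ruby cycles, as described in the
    thread; the default of 2 is the default multiplier mentioned there.
    """
    return opal_cycles / opal_ruby_multiplier


opal_cycles = 64421882  # "Total number of cycles" from results.opal
expected_ruby_cycles = ruby_cycles_from_opal(opal_cycles)
print(expected_ruby_cycles)
```

Comparing this value with the cycle count in results.ruby is a quick sanity check that both modules advanced together.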

hope this helps.
Liqun

[0]   number of continue calls                            100000003
[0]   Instruction per cycle:                             1.55227
[0]   Total Elapsed Time:                                36060 sec 0 usec
[0]   Total Retirement Time:                             1522 sec 670804 usec
[0]   Approximate cycle per sec:                         1786.52
[0]   Approximate instructions per sec:                  2773.15
[0]   This processor's Simics overhead (retire/elapsed):   4.22%
[0]   Average number of instructions per continue          1.00

----- Original Message -----

From: Liqun Cheng

To: Lei Yang ; Gems Users

Sent: Tuesday, March 20, 2007 2:52 PM

Subject: Re: [Gems-users] Performance evaluation of CMP with Ruby and Opal

Just my 2 cents.

1. The simulation is very slow, and it seems impossible to run the entire benchmark. 10 million cycles took me more than one hour. Although I can specify a warm-up length, it is best to cover the entire life span of the benchmark. Has anyone tried a sampling approach? I guess it's OK to wait for the entire benchmark to complete when producing final performance numbers, but it is certainly a pain whenever the code is modified and we want to see how that affects performance. GEMS users, how do you handle this problem?

Most studies use Opal only in the sensitivity analysis, say running for 100M instructions. You might consider using the techniques in SMARTS. I vaguely remember that the CMU folks have released this in SimFlex.
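
To give a feel for the sampling idea, here is a minimal Python sketch of systematic sampling: measure every k-th unit of execution and report the mean CPI with an approximate confidence interval. It is not the SMARTS or SimFlex implementation; the function names are hypothetical, the per-unit CPI values are synthetic placeholders rather than simulator output, and the normal-approximation interval is an assumption.

```python
import math
import statistics


def systematic_sample(values, interval, offset=0):
    """Keep every `interval`-th measurement unit, starting at `offset`."""
    return values[offset::interval]


def estimate_mean_cpi(unit_cpi, interval, z=1.96):
    """Return (mean CPI, half-width of an approximate 95% interval)."""
    sample = systematic_sample(unit_cpi, interval)
    mean = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, half_width


# Synthetic per-unit CPI values (illustrative only, not measured data).
unit_cpi = [0.6 + 0.1 * ((i * 7) % 5) for i in range(1000)]

sample = systematic_sample(unit_cpi, interval=7)
mean_cpi, half_width = estimate_mean_cpi(unit_cpi, interval=7)
print(len(sample), mean_cpi, half_width)
```

One design caveat: if the sampling interval lines up with a periodic pattern in the workload, the estimate is biased. With this synthetic data, an interval of 10 would pick only units with CPI 0.6.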

2. Exactly what performance number should I look at to compare two systems when both Opal and Ruby are used? I saw in the FAQ that one should use Ruby_cycles to measure the runtime of the simulated system. But when I let Opal run the same number of cycles, shouldn't Ruby_cycles be the same for both? If not, why?

Opal shows how many instructions graduated, not cycles.

Liqun

I appreciate your comments!

Thanks,

Lei

_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site: https://lists.cs.wisc.edu/archive/gems-users/" to your search.
