Re: [Gems-users] Issues with collecting memory access trace for logtm microbenchmark (tm-deque)


Date: Thu, 11 Jan 2007 08:11:39 -0600
From: Dan Gibson <degibson@xxxxxxxx>
Subject: Re: [Gems-users] Issues with collecting memory access trace for logtm microbenchmark (tm-deque)
Shougata,
Let me take a stab at the areas I am confident in answering correctly (e.g., NOT LogTM):

**Too many requests:
Simics uses an "are you sure?" policy when issuing memory requests to Ruby. That is, each request is passed to Ruby *twice* -- once to determine the stall time, and once after the stall time has elapsed, when Simics verifies that Ruby wants the operation to complete. These dual requests are handled in SimicsProcessor.C -- for convenience (both in terms of C++ and for the filtering effect) you may want to move your trace generation higher into Ruby's hierarchy (say, SimicsProcessor.C or Sequencer.C).
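If you do keep the tracing in ruby_operate(), one possible workaround is to drop the second pass by remembering the previous request per cpu. The sketch below (helper name is mine) assumes the two passes for a given access arrive back-to-back with identical address and type -- you would want to verify that before trusting it:

    /* Sketch: suppress the duplicate "are you sure?" pass, assuming the two
       passes for one access arrive back-to-back with the same address/type. */
    #include <stdint.h>

    #define MAX_CPUS 64

    struct LastReq { uint64_t addr; int is_write; int valid; };
    static struct LastReq last_req[MAX_CPUS];

    /* Returns 1 if this request looks like the second pass of the previous one. */
    int is_duplicate_request(int cpu, uint64_t phys_addr, int is_write)
    {
      if (last_req[cpu].valid &&
          last_req[cpu].addr == phys_addr &&
          last_req[cpu].is_write == is_write) {
        last_req[cpu].valid = 0;   /* consume the pair */
        return 1;                  /* skip: already traced on the first pass */
      }
      last_req[cpu].addr = phys_addr;
      last_req[cpu].is_write = is_write;
      last_req[cpu].valid = 1;
      return 0;
    }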

**ASI:
Contrary to the name, the ASI specifies not the process but the target address space. 128 is the vanilla address space for main memory... ASIs are detailed quite extensively in Sun's microprocessor manuals (google for "sun ultrasparc manual" and see the section on ASIs). For example, ASI 0x80 (decimal 128) is ASI_PRIMARY, the normal address space for user-level accesses to main memory; ASI 0x70 is ASI_BLOCK_AS_IF_USER_PRIMARY; ASI 0x58 is for accesses to the data MMU/TLB registers; 0x59 is the data TSB; etc. There is not one ASI per process.
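If you want to filter on those fields, something like the sketch below could sit next to your existing priv check. The field names follow the v9_memory_transaction usage in your mail; treat the 0x80 cutoff as an assumption to verify against your Simics headers (you may also want to allow ASI_SECONDARY, 0x81):

    /* Sketch: decide whether a transaction should be traced, given the priv
       and asi fields read from the v9_memory_transaction. */
    static int should_trace(int priv, unsigned int asi)
    {
      if (priv)
        return 0;          /* kernel-mode access */
      if (asi != 0x80)
        return 0;          /* not ASI_PRIMARY, the normal address space */
      return 1;
    }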

Regards,
Dan

Shougata Ghosh wrote:

Hi
I am simulating a 16-processor UltraSPARC-III machine running Solaris 10. I loaded Ruby (no Opal) with Simics. The protocol I used was MESI_SMP_LogTm_directory, and I was running the tm-deque microbenchmark that comes with GEMS. My goal is to collect the memory traces (data accesses only, no instruction accesses) of tm-deque and analyse the trace file offline.
Let me first give a brief overview of how I collect the traces.

I print the clock cycle (Simics cycle), the cpu making the request, the physical address of the memory location, the type of access (r or w), and whether that cpu is currently executing a xaction (LogTM). The format looks like this:

cycle    cpu    phys_addr    type(r/w)    in_xaction

I print this from inside ruby_operate() in ruby.c, since that function is called for every memory access Simics makes. In addition, in a separate trace file, I print when a xaction begins, commits or aborts; this I print from magic_instruction_callback() in commands.C. Its format is the following:

cycle    cpu    xaction_type(B/C/A)    xaction_id(for nested xaction)
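Schematically, the two kinds of records are emitted like this (illustrative only -- the file handles and variable names are placeholders, not my exact code):

    #include <cstdio>

    /* one line per data access, printed from ruby_operate() */
    void print_mem_record(FILE* f, long long cycle, int cpu,
                          unsigned long long phys_addr, char rw, int in_xaction)
    {
      fprintf(f, "%lld\t%d\t%llu\t%c\t%d\n",
              cycle, cpu, phys_addr, rw, in_xaction);
    }

    /* one line per begin/commit/abort, printed from magic_instruction_callback() */
    void print_xact_record(FILE* f, long long cycle, int cpu,
                           char event /* 'B', 'C' or 'A' */, int xaction_id)
    {
      fprintf(f, "%lld\t%d\t%c\t%d\n", cycle, cpu, event, xaction_id);
    }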

Once the simulation is complete, I combine the two trace files and sort the result by the clock cycle field.
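(The offline merge is nothing fancy -- roughly the following, which assumes the cycle count is the first whitespace-separated field on every line of both files:)

    // Sketch: read any number of trace files, sort all lines by the leading
    // cycle field, and print the combined trace to stdout.
    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    typedef std::pair<unsigned long long, std::string> Record;

    static bool by_cycle(const Record& a, const Record& b)
    {
      return a.first < b.first;
    }

    int main(int argc, char** argv)
    {
      std::vector<Record> records;
      for (int i = 1; i < argc; ++i) {
        std::ifstream in(argv[i]);
        std::string line;
        while (std::getline(in, line)) {
          std::istringstream iss(line);
          unsigned long long cycle = 0;
          if (iss >> cycle)
            records.push_back(std::make_pair(cycle, line));
        }
      }
      std::stable_sort(records.begin(), records.end(), by_cycle);
      for (size_t i = 0; i < records.size(); ++i)
        std::cout << records[i].second << "\n";
      return 0;
    }

(e.g. ./merge_trace mem_trace.txt xact_trace.txt > combined.txt)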

*****The biggest issue is that there are far too many requests. I want to exclude from the trace every process other than tm-deque. Right now, I exclude kernel requests by inspecting the priv field in ((v9_memory_transaction *) mem_op)->priv: if priv is 1, I don't record that transaction. I believe this effectively keeps kernel requests out of my trace. But there are other maintenance/service processes started by the kernel that run in user space and access memory, and I want to exclude those too. I have tried to detect the pid, or some sort of process id, from inside Ruby, but haven't had any success so far. Things I have looked into are:

- The ASI (address space identifier) field in ((v9_memory_transaction *) mem_op)->asi. This didn't work: the ASI was a fixed 128 throughout. One possible reason is that the ASI only changes between user space and kernel space; since I'm recording only user-space accesses, I never see it change.

- The contents of global register %g7. From inspecting the OpenSolaris code, I noticed that getpid() obtains the address of the current thread structure from %g7, follows a pointer from that structure to the process it belongs to, and then reads the process id from the process structure. Since I don't care about the exact pid, I simply inspected the value of %g7 -- and I didn't see it change. One possibility, of course, is that %g7 holds a virtual address that is the same for every process; if each process runs only one thread, that seemed very likely. So next I looked at the corresponding physical address. Unfortunately, that stayed constant as well. I still plan to read the contents of the memory location pointed to by that physical address (thread_phys_addr); maybe that will have a different value. I have yet to look into that.
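(When I get to it, I expect the register read and the physical-memory read to look roughly like the sketch below -- the calls are from the standard Simics API, but I haven't verified the exact signatures against the version I'm running, and error handling is omitted:)

    /* Sketch: read %g7 for the current cpu, translate it, and fetch the
       first 8 bytes of the structure it points to. */
    conf_object_t *cpu = SIM_current_processor();
    int g7_num = SIM_get_register_number(cpu, "g7");
    uinteger_t g7_virt = SIM_read_register(cpu, g7_num);

    physical_address_t thread_phys_addr =
        SIM_logical_to_physical(cpu, Sim_DI_Data, g7_virt);

    uinteger_t thread_word = SIM_read_phys_memory(cpu, thread_phys_addr, 8);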

As an aside, how does LogTM differentiate transactional requests from non-transactional ones if they both come from the same processor?

*****My second issue is with the clock cycle I print for timestamping. I am using SIM_cycle_count() to timestamp the memory accesses. When I combine the two traces, I notice that after a xaction has begun, the subsequent memory accesses printed from ruby_operate() don't have in_xaction set to 1. Here's an example:
9067854    13    189086172    r    0
9067856    13    185775464    w    0
9068573    13    B    0            <- xaction begins
9069382    13    185775464    w    0
9069387    13    185775468    r    0
.
.
.
9069558    13    185775468    w    0
9069566    13    185775468    w    0
9069611    13    185775272    r    1    <- first time in_xaction turns 1

There's always a lag of about 1000 cycles between the xaction Begin and in_xaction turning to 1 in the memory access trace. I did make sure I set the cpu-switch-cycle to 1 in Simics before I started my simulations! I get the value of in_xaction in the following way:

    #define XACT_MGR \
        g_system_ptr->getChip(SIMICS_current_processor_number() / \
                              RubyConfig::numberOfProcsPerChip())-> \
            getTransactionManager(SIMICS_current_processor_number() % \
                                  RubyConfig::numberOfProcsPerChip())

    in_xaction = XACT_MGR->inTransaction();

As I mentioned earlier, I get the clock cycle from SIM_cycle_count(*cpu). Any idea what could be causing this? Do you think I should try using ruby_cycles instead?
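(If Ruby cycles are the better time base, I assume the change is just to timestamp both trace files from Ruby's event queue instead of from Simics -- something like the fragment below, assuming the global g_eventQueue_ptr is visible from both ruby_operate() and magic_instruction_callback():)

    /* Fragment: Ruby's own cycle count, so both record streams share one
       time base.  Time is Ruby's cycle typedef. */
    Time ruby_cycle = g_eventQueue_ptr->getTime();
    /* ... then print ruby_cycle instead of SIM_cycle_count(*cpu) ... */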

*****My third issue is specific to the LogTM microbenchmark I was running, tm-deque. I ran it with 10 threads and set the number of ops to 10; initially I wanted small xactions without conflicts. When I look at the trace file, I don't see the threads interleaving at all. The 10 threads ran one after the other, in the following order:
thread        cpu    start_cycle
T1        13    9068573
T2        9    10035999
T3        13    10944933
T4        2    11654399
T5        9    11781161
T6        13    11886113
T7        4    16280785
T8        13    16495097
T9        0    16917327
T10        6    17562721

Why aren't the threads running in parallel? The code dispatches all 10 threads in a for-loop and later does a thread join. I am simulating 16 processors, so I expected all 10 threads to run in parallel! Also, the number of clock cycles between the end of one thread and the start of the next one is quite large -- it varied from 200,000 to 900,000! Am I doing something wrong with the way I am collecting the clock cycle with SIM_cycle_count(current_cpu)?
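(By "dispatches in a for-loop" I mean the usual pattern below -- an illustration of the structure I'm describing, not the actual tm-deque source:)

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 10

    static void* worker(void* arg)
    {
      long id = (long) arg;
      /* ... per-thread transactional operations on the deque ... */
      printf("thread %ld done\n", id);
      return NULL;
    }

    int main(void)
    {
      pthread_t tid[NUM_THREADS];
      for (long i = 0; i < NUM_THREADS; i++)     /* dispatch all workers */
        pthread_create(&tid[i], NULL, worker, (void*) i);
      for (int i = 0; i < NUM_THREADS; i++)      /* then join them all */
        pthread_join(tid[i], NULL);
      return 0;
    }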

I would really appreciate it if anyone could share their thoughts/ideas on these issues.
Thanks a lot in advance.
-shougata

_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.

