Re: [Gems-users] Issues with collecting memory access trace for logtm microbenchmark (tm-deque)


Date: Fri, 12 Jan 2007 17:48:29 -0500
From: Shougata Ghosh <shougata@xxxxxxxxxxxxx>
Subject: Re: [Gems-users] Issues with collecting memory access trace for logtm microbenchmark (tm-deque)
Hi Dan,
Thanks for your quick reply.
I run the simulations with "ruby0.setparam_str REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH true".
Doesn't that mean FAST_PATH is disabled?
So when FAST_PATH is enabled, is there no way to tell from ruby_operate() whether a request is a duplicate? It seems that mh_memorytracer_possible_cache_miss() will return 0 both for duplicate requests and for L1 cache hits.
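For reference, here's roughly what my filter looks like (a sketch only --
mh_memorytracer_possible_cache_miss() is the existing GEMS hook;
record_trace_entry() is just a name for my own logging helper):

    /* inside ruby_operate() in ruby.c */
    cycles_t stall = mh_memorytracer_possible_cache_miss(mem_op);
    if (stall != 0) {
        record_trace_entry(mem_op);  /* only log requests Ruby stalls on */
    }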
Thanks
shougata


>The return value may be zero for normal hits in the L1 cache when FAST_PATH is enabled... that could also explain some of your other issues.
>
>Shougata Ghosh wrote:
>> Thanks Jayaram and Dan for your replies. I was aware that simics sends
>> memory requests to ruby more than once. What I am doing inside
>> ruby_operate() is that I only record the transaction in my trace file if
>> the return value of mh_memorytracer_possible_cache_miss(mem_op) is
>> non-zero. Does that sound ok?
>> Creating a processor set with pset_create and then binding the threads
>> to the cpus of this set kept all the other processes from
>> interfering with my benchmark.
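>> In case it helps anyone, the binding boils down to something like this
>> (a sketch -- error handling omitted, and cpu_id is whatever processor
>> you want in the set):
>>
>>     #include <sys/pset.h>
>>     #include <sys/types.h>
>>     #include <unistd.h>
>>
>>     psetid_t pset;
>>     pset_create(&pset);                     /* new empty processor set */
>>     pset_assign(pset, cpu_id, NULL);        /* move a cpu into the set */
>>     pset_bind(pset, P_PID, getpid(), NULL); /* bind this process to it */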
>> Thanks again
>> shougata
>>
>>> From: Dan Gibson <degibson@xxxxxxxx>
>>> Subject: Re: [Gems-users] Issues with collecting memory access trace
>>>     for logtm microbenchmark (tm-deque)
>>> To: Gems Users <gems-users@xxxxxxxxxxx>
>>> Message-ID: <45A6459B.3030301@xxxxxxxx>
>>> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>>>
>>> Shougata,
>>> Let me take a stab at the areas that I am confident in answering
>>> correctly (eg. NOT LogTM):
>>>
>>> **Too many requests:
>>> Simics uses an "are you sure?" policy when issuing memory requests to
>>> Ruby. That is, each request is passed to ruby *twice* -- once to
>>> determine the stall time, and once when the stall time has elapsed and
>>> Simics is verifying that Ruby wants the operation to complete. These
>>> dual requests are handled in SimicsProcessor.C -- for convenience (both
>>> of C++ language and for the filtering effect) you may want to move your
>>> trace generation higher into Ruby's hierarchy (say, SimicsProcessor.C or
>>> Sequencer.C).
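>>> If you do stay in ruby_operate(), one illustrative way to drop the
>>> second pass is to remember what you have already logged (sketch only;
>>> cpu_num, phys_addr and record_trace_entry() are placeholders for your
>>> own code, and you'd need <set> and <utility>):
>>>
>>>     static std::set< std::pair<int, uint64> > outstanding;
>>>     std::pair<int, uint64> key(cpu_num, phys_addr);
>>>     if (outstanding.erase(key)) {
>>>         /* second, "are you sure?" pass -- skip it */
>>>     } else {
>>>         outstanding.insert(key);        /* first pass -- log it */
>>>         record_trace_entry(mem_op);
>>>     }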
>>>
>>> **ASI:
>>> Contrary to the name, the ASI is used to specify not the process but the
>>> target address space. 128 is the vanilla address space for main
>>> memory... ASIs are detailed quite extensively in Sun's microprocessor
>>> manuals (google for "sun ultrasparc manual" and see the section on
>>> ASIs). For example, ASI 0x80 (aka decimal 128) is ASI_PRIMARY, the
>>> normal address space for user-level accesses to main memory; ASI 0x58
>>> is for
>>> accesses to the data TLB, 0x59 is the data TSB, etc. There is not one
>>> ASI per process.
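>>> Illustratively, if you only want ordinary data accesses in your trace,
>>> you could filter on the ASI field you are already casting for (a
>>> sketch, not tested):
>>>
>>>     v9_memory_transaction_t *v9 = (v9_memory_transaction_t *) mem_op;
>>>     if (v9->asi != 0x80)    /* 0x80 == ASI_PRIMARY, normal user space */
>>>         return;             /* skip MMU/TSB/diagnostic accesses */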
>>>
>>> Regards,
>>> Dan
>>>
>>> Shougata Ghosh wrote:
>>>
>>>> Hi
>>>> I am simulating 16 processor ultrasparc-iii with solaris10. I loaded
>>>> ruby (no opal) with simics. The protocol I used was
>>>> MESI_SMP_LogTm_directory and I was running tm-deque microbenchmark that
>>>> comes with GEMS. My goal was to collect the memory traces (only data
>>>> access, no instruction access) of tm-deque and analyse the trace file
>>>> offline.
>>>> Let me first give a brief overview of how I collect the traces.
>>>>
>>>> I print the clock_cycle (simics cycle), the cpu making the request, the
>>>> physical address of the memory location, the type of access (r or w) and
>>>> if this cpu is currently executing a xaction (logTm). The format looks
>>>> like this:
>>>>
>>>> cycle    cpu    phys_addr    type(r/w)    in_xaction
>>>>
>>>> This I print from inside ruby_operate() in ruby.c, since this function
>>>> is called for every memory access simics makes.
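>>>> The print itself is roughly the following (the variable names are
>>>> mine; SIM_cycle_count() is the Simics call):
>>>>
>>>>     fprintf(trace_fp, "%lld\t%d\t%lld\t%c\t%d\n",
>>>>             (long long) SIM_cycle_count(cpu), cpu_num,
>>>>             (long long) phys_addr,
>>>>             is_write ? 'w' : 'r', in_xaction);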
>>>> In addition to this, in a different trace file, I print when a xaction
>>>> begins, commits or aborts. This I print from
>>>> magic_instruction_callback() in commands.C. The format is following:
>>>>
>>>> cycle    cpu    xaction_type(B/C/A)    xaction_id(for nested xaction)
>>>>
>>>> Once the simulation is completed, I combine the two trace files and sort
>>>> it with the clock cycle field.
>>>>
>>>> *****The biggest issue is having too many requests. I want to filter
>>>> out memory requests from every process except tm-deque.
>>>> Right now, I'm excluding kernel requests by inspecting the priv
>>>> field in (v9_memory_transaction *) mem_op->priv. If the priv field is 1,
>>>> I don't record that transaction. I believe this effectively keeps the
>>>> kernel requests out of my trace. But there are other maintenance/service
>>>> processes started by the kernel, running in user space, which access
>>>> memory, and I want to filter them out too. I have tried to detect the
>>>> pid or some sort of process id from inside ruby but haven't had any
>>>> success so far! Things I have looked into are:
>>>>
>>>> - The ASID (address space id) field in (v9_memory_transaction *)
>>>> mem_op->asi. This didn't work!! The ASID was a fixed 128 throughout. One
>>>> possible reason is that perhaps the ASID changes between user space and
>>>> kernel space. Since I'm only recording user-space accesses, I don't see
>>>> any changes in ASID.
>>>>
>>>> - The content of global register g7. From inspecting the opensolaris
>>>> code, I noticed that the getpid() function gets the address of the
>>>> current_thread structure from %g7. It then gets a pointer to the process
>>>> the current_thread belongs to from the current_thread structure. Next,
>>>> it reads the process_id from the process structure. Since I don't care
>>>> about the exact pid, I inspected the value of the %g7 register. I didn't
>>>> see any changes in that! One possibility, of course, is that %g7 stores a
>>>> virtual address which could be the same for all processes. If all the
>>>> processes are running just one thread, this seemed very likely. So, next
>>>> I looked into the corresponding physical address. Unfortunately, that
>>>> remained constant as well!
>>>> I'll try reading the content of the memory location pointed to by the
>>>> physical address (thread_phys_addr). Maybe that will have a different
>>>> value! I have yet to look into that.
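>>>> For completeness, the checks above look roughly like this (the SIM_*
>>>> calls are the standard Simics API; the rest is paraphrased from my
>>>> code):
>>>>
>>>>     v9_memory_transaction_t *v9 = (v9_memory_transaction_t *) mem_op;
>>>>     if (v9->priv)
>>>>         return;                        /* kernel-mode access: skip */
>>>>     conf_object_t *cpu = SIM_current_processor();
>>>>     int g7 = SIM_get_register_number(cpu, "g7");
>>>>     uinteger_t thread_va = SIM_read_register(cpu, g7);
>>>>     physical_address_t thread_pa =
>>>>         SIM_logical_to_physical(cpu, Sim_DI_Data, thread_va);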
>>>>
>>>> On a side note, how does LogTm differentiate xactional requests from
>>>> non-xactional ones if they both come from the same processor?
>>>>
>>>> *****My second issue is with the clock cycle I print for timestamping. I
>>>> am using SIM_cycle_count() to timestamp the memory accesses. When I
>>>> combine the two traces, I notice that after a xaction has begun,
>>>> subsequent memory accesses printed from ruby_operate() don't have
>>>> in_xaction set to 1! Here's an example:
>>>> 9067854    13    189086172    r    0
>>>> 9067856    13    185775464    w    0
>>>> 9068573    13    B    0            <- xaction begins
>>>> 9069382    13    185775464    w    0
>>>> 9069387    13    185775468    r    0
>>>> .
>>>> .
>>>> .
>>>> 9069558    13    185775468    w    0
>>>> 9069566    13    185775468    w    0
>>>> 9069611    13    185775272    r    1        <- first time in_xaction turns 1
>>>>
>>>> There's always a lag of about 1000 cycles between xaction Begin and
>>>> in_xaction turning into 1 in the memory access traces. I did make sure I
>>>> set the cpu-switch-cycle to 1 in simics before I started my simulations!
>>>> I get the value of in_xaction in the following way:
>>>> #define XACT_MGR \
>>>>   g_system_ptr->getChip(SIMICS_current_processor_number() /     \
>>>>                         RubyConfig::numberOfProcsPerChip())     \
>>>>     ->getTransactionManager(SIMICS_current_processor_number() % \
>>>>                             RubyConfig::numberOfProcsPerChip())
>>>> in_xaction = XACT_MGR->inTransaction();
>>>>
>>>> As I mentioned earlier, I get the clock_cycle from SIM_cycle_count(cpu).
>>>> Any idea what could be causing this? Do you think I should try using
>>>> ruby_cycles instead?
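>>>> One experiment I may try is logging both clocks side by side to see
>>>> where the skew comes from (I believe g_eventQueue_ptr->getTime() is
>>>> Ruby's own cycle counter):
>>>>
>>>>     fprintf(trace_fp, "simics=%lld ruby=%lld\n",
>>>>             (long long) SIM_cycle_count(cpu),
>>>>             (long long) g_eventQueue_ptr->getTime());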
>>>>
>>>> *****My third issue is specific to the LogTm microbenchmark I was
>>>> running, tm-deque. I ran it with 10 threads and set the # of ops to 10.
>>>> Initially I wanted small xactions without conflicts. When I look at the
>>>> trace file, I don't see any interleaving threads. The 10 threads ran one
>>>> after the other in the following order:
>>>> thread        cpu    start_cycle
>>>> T1        13    9068573
>>>> T2        9    10035999
>>>> T3        13    10944933
>>>> T4        2    11654399
>>>> T5        9    11781161
>>>> T6        13    11886113
>>>> T7        4    16280785
>>>> T8        13    16495097
>>>> T9        0    16917327
>>>> T10        6    17562721
>>>>
>>>> Why aren't the threads running in parallel? The code dispatches all 10
>>>> threads in a for-loop and later does a thread_join. I am simulating 16
>>>> processors - I expected all 10 threads to run in parallel! Also, the
>>>> number of clock cycles between the end of one thread and the start of
>>>> the next one is quite large - it varied from 200,000 to 900,000!
>>>> Am I doing something wrong with the way I am collecting the clock_cycle
>>>> with SIM_cycle_count(current_cpu)?
>>>>
>>>> I would really appreciate it if anyone could share their thoughts/ideas on
>>>> these issues.
>>>> Thanks a lot in advance.
>>>> -shougata
>>>>
>>>> _______________________________________________
>>>> Gems-users mailing list
>>>> Gems-users@xxxxxxxxxxx
>>>> https://lists.cs.wisc.edu/mailman/listinfo/gems-users
>>>> Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.
>>>>
>>>>
>>>>
>>>>
>> _______________________________________________
>> Gems-users mailing list
>> Gems-users@xxxxxxxxxxx
>> https://lists.cs.wisc.edu/mailman/listinfo/gems-users
>> Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.
>>
>>
>>