

Date: Wed, 10 Jan 2007 17:56:47 -0500
From: Shougata Ghosh <shougata@xxxxxxxxxxxxx>
Subject: [Gems-users] Issues with collecting memory access trace for logtm microbenchmark (tm-deque)
Hi
I am simulating a 16-processor UltraSPARC-III system running Solaris 10. I loaded Ruby (no Opal) with Simics. The protocol I used was MESI_SMP_LogTm_directory, and I was running the tm-deque microbenchmark that comes with GEMS. My goal is to collect the memory traces of tm-deque (data accesses only, no instruction accesses) and analyse the trace file offline.
Let me first give a brief overview of how I collect the traces.

I print the clock_cycle (Simics cycle), the CPU making the request, the physical address of the memory location, the type of access (r or w), and whether this CPU is currently executing a transaction (LogTM xaction). The format looks like this:

cycle    cpu    phys_addr    type(r/w)    in_xaction

I print this from inside ruby_operate() in ruby.c, since that function is called for every memory access Simics makes. In addition, in a separate trace file, I print when a xaction begins, commits or aborts; this I print from magic_instruction_callback() in commands.C. The format is as follows:

cycle    cpu    xaction_type(B/C/A)    xaction_id(for nested xaction)
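In slightly more concrete form, the record-writing in ruby_operate() looks roughly like this (a simplified sketch, not my exact code; the file handle name mem_trace_fp is just illustrative, and XACT_MGR is the macro I show further down):

    /* Sketch only: simplified version of my ruby_operate() tracing code.
     * mem_op is the generic_transaction_t* that Simics hands to the
     * timing model; the cast to the v9 transaction is the same one I
     * already use for the priv check below. */
    v9_memory_transaction_t *v9 = (v9_memory_transaction_t *) mem_op;
    conf_object_t *cpu = mem_op->ini_ptr;              /* requesting CPU       */
    int cpu_num        = SIM_get_processor_number(cpu);
    cycles_t now       = SIM_cycle_count(cpu);
    int in_xaction     = XACT_MGR->inTransaction();    /* macro shown below    */

    if (!v9->priv && SIM_mem_op_is_data(mem_op)) {     /* user-level data only */
        fprintf(mem_trace_fp, "%lld\t%d\t%llx\t%c\t%d\n",
                (long long) now, cpu_num,
                (unsigned long long) mem_op->physical_address,
                SIM_mem_op_is_write(mem_op) ? 'w' : 'r',
                in_xaction);
    }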

Once the simulation is complete, I combine the two trace files and sort them on the clock-cycle field.
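For the offline combine-and-sort step I just treat the first whitespace-separated field of every line as the cycle count, along these lines (a standalone sketch, not the exact script I use):

    /* merge_traces.cpp -- sketch: read any number of trace files and emit
     * all records sorted by the cycle count in the first column. */
    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    static bool by_cycle(const std::pair<unsigned long long, std::string> &a,
                         const std::pair<unsigned long long, std::string> &b) {
        return a.first < b.first;               /* compare on cycle only */
    }

    int main(int argc, char **argv) {
        std::vector<std::pair<unsigned long long, std::string> > records;
        for (int i = 1; i < argc; i++) {        /* e.g. mem.trace xact.trace */
            std::ifstream in(argv[i]);
            std::string line;
            while (std::getline(in, line)) {
                std::istringstream iss(line);
                unsigned long long cycle;
                if (iss >> cycle)               /* first field = clock cycle */
                    records.push_back(std::make_pair(cycle, line));
            }
        }
        std::stable_sort(records.begin(), records.end(), by_cycle);
        for (size_t i = 0; i < records.size(); i++)
            std::cout << records[i].second << "\n";
        return 0;
    }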

*****The biggest issue is that there are too many requests. I want to filter out memory requests from every process except tm-deque. Right now I filter out kernel requests by inspecting the priv field, ((v9_memory_transaction *) mem_op)->priv: if priv is 1, I don't record that transaction. I believe this effectively keeps kernel requests out of my trace. But there are other maintenance/service processes started by the kernel that run in user space and access memory, and I want to filter those out as well. I have tried to detect the PID, or some other kind of process identifier, from inside Ruby, but haven't had any luck so far. The things I have looked into are listed below (there is a short sketch of both probes after the list):

- The ASI (address space identifier) field, ((v9_memory_transaction *) mem_op)->asi. This didn't work: the ASI was fixed at 128 throughout. One possible reason is that the ASI only changes between user space and kernel space; since I'm only recording user-space accesses, I never see it change.

- The contents of global register %g7. From inspecting the OpenSolaris code, I noticed that getpid() gets the address of the current thread structure from %g7, follows a pointer from that structure to the process it belongs to, and then reads the process ID from the process structure. Since I don't care about the exact PID, I simply inspected the value of %g7 itself, but I didn't see it change. One possibility, of course, is that %g7 holds a virtual address that is the same for all processes; if every process is running just one thread, that seemed quite likely. So next I translated it to the corresponding physical address. Unfortunately, that stayed constant as well. I still plan to read the contents of the memory location pointed to by that physical address (thread_phys_addr); maybe that will have a different value per process, but I haven't looked into it yet.
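For reference, the two probes above boil down to something like this inside the same ruby_operate() path (a sketch only; the register name "g7" and the use of Sim_DI_Data for the VA-to-PA translation are my assumptions about the Simics calls, not something I have verified against the tm-deque runs):

    /* Sketch of the process-identification probes described above. */
    v9_memory_transaction_t *v9 = (v9_memory_transaction_t *) mem_op;
    conf_object_t *cpu = mem_op->ini_ptr;

    int asi = v9->asi;                                  /* stuck at 128 for me     */

    int g7_regnum = SIM_get_register_number(cpu, "g7"); /* "g7" name is assumed    */
    logical_address_t g7_va =
        (logical_address_t) SIM_read_register(cpu, g7_regnum);  /* thread ptr (VA) */
    physical_address_t g7_pa =
        SIM_logical_to_physical(cpu, Sim_DI_Data, g7_va);       /* also constant   */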

As a side question, how does LogTM differentiate transactional requests from non-transactional ones if they both come from the same processor?

*****My second issue is with the clock cycle I print for timestamping. I am using SIM_cycle_count() to timestamp the memory accesses. When I combine the two traces, I notice that after a xaction has begun, subsequent memory accesses printed from ruby_operate() don't have in_xaction set to 1. Here's an example:
9067854    13    189086172    r    0
9067856    13    185775464    w    0
9068573    13    B    0            <- xaction begins
9069382    13    185775464    w    0
9069387    13    185775468    r    0
.
.
.
9069558    13    185775468    w    0
9069566    13    185775468    w    0
9069611    13    185775272    r    1    <- first time in_xaction turns 1

There's always a lag of about 1000 cycles between the xaction Begin record and in_xaction turning to 1 in the memory-access trace. I did make sure I set the cpu-switch-cycle to 1 in Simics before starting my simulations. I get the value of in_xaction in the following way:

#define XACT_MGR g_system_ptr->getChip(SIMICS_current_processor_number()/RubyConfig::numberOfProcsPerChip())->getTransactionManager(SIMICS_current_processor_number()%RubyConfig::numberOfProcsPerChip())

in_xaction = XACT_MGR->inTransaction();

As I mentioned earlier, I get the clock_cycle from SIM_cycle_count(cpu). Any idea what could be causing this? Do you think I should try using Ruby cycles instead?
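One thing I could try is logging both time bases next to each other to see where the skew comes in; roughly like this (assuming g_eventQueue_ptr->getTime() is the right way to read the current Ruby cycle from the interface code):

    /* Sketch: print the Simics cycle and the Ruby cycle side by side to see
     * whether the ~1000-cycle lag sits between the two time bases. */
    cycles_t simics_now = SIM_cycle_count(cpu);          /* per-CPU Simics cycle */
    Time     ruby_now   = g_eventQueue_ptr->getTime();   /* global Ruby cycle    */
    fprintf(mem_trace_fp, "%lld\t%lld\t%d\t...\n",
            (long long) simics_now, (long long) ruby_now, cpu_num);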

*****My third issue is specific to the LogTM microbenchmark I was running, tm-deque. I ran it with 10 threads and set the number of ops to 10, since initially I wanted small xactions without conflicts. When I look at the trace file, I don't see the threads interleaving at all; the 10 threads ran one after another in the following order:
thread        cpu    start_cycle
T1        13    9068573
T2        9    10035999
T3        13    10944933
T4        2    11654399
T5        9    11781161
T6        13    11886113
T7        4    16280785
T8        13    16495097
T9        0    16917327
T10        6    17562721

Why aren't the threads running in parallel? The code dispatches all 10 threads in a for-loop and later does a thread join. I am simulating 16 processors, so I expected all 10 threads to run in parallel! Also, the number of clock cycles between the end of one thread and the start of the next one is quite large - it varied from 200,000 to 900,000! Am I doing something wrong with the way I am collecting the clock_cycle with SIM_cycle_count(current_cpu)?
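Just to be explicit about what I mean by the dispatch, the pattern is the usual create-all-then-join-all loop, something like the following (this is only the shape of the code, not the actual tm-deque source):

    /* Shape of the thread dispatch I am describing -- not the actual
     * tm-deque source, just the create-all / join-all pattern. */
    #include <pthread.h>

    #define NUM_THREADS 10

    static void *worker(void *arg) {
        /* ... each thread's transactional ops on the deque go here ... */
        return arg;
    }

    static void run_benchmark(void) {
        pthread_t tid[NUM_THREADS];
        for (long i = 0; i < NUM_THREADS; i++)       /* dispatch all threads   */
            pthread_create(&tid[i], NULL, worker, (void *) i);
        for (int i = 0; i < NUM_THREADS; i++)        /* then wait for them all */
            pthread_join(tid[i], NULL);
    }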

I would really appreciate it if anyone could share their thoughts or ideas on these issues.
Thanks a lot in advance.
-shougata
