Hi,
I am simulating a 16-processor UltraSPARC-III system running Solaris 10. I loaded Ruby (no Opal) with Simics. The protocol I used was MESI_SMP_LogTm_directory, and I was running the tm-deque microbenchmark that comes with GEMS. My goal is to collect the memory traces of tm-deque (data accesses only, no instruction accesses) and analyse the trace file offline.
Let me first give a brief overview of how I collect the traces.
I print the clock cycle (Simics cycle), the CPU making the request, the physical address of the memory location, the type of access (r or w), and whether this CPU is currently executing a xaction (LogTM). The format looks like this:
cycle cpu phys_addr type(r/w) in_xaction
I print this from inside ruby_operate() in ruby.c, since that function is called for every memory access Simics makes.
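The print itself looks roughly like this (trace_fp and in_xaction are my own variables; the generic_transaction_t fields and the SIM_* calls are from the Simics memory interface as I understand it, so the types may be slightly off):

    /* inside ruby_operate(), for each data access that passes my filters */
    generic_transaction_t *g = &mem_op->s;   /* generic part of the v9 transaction */
    fprintf(trace_fp, "%lld %d %llu %c %d\n",
            (long long) SIM_cycle_count(SIM_current_processor()),
            SIMICS_current_processor_number(),
            (unsigned long long) g->physical_address,
            SIM_mem_op_is_write(g) ? 'w' : 'r',
            in_xaction);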
In addition, in a separate trace file, I print when a xaction begins, commits or aborts. This I print from magic_instruction_callback() in commands.C. The format is the following:
cycle cpu xaction_type(B/C/A) xaction_id(for nested xactions)
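The print is essentially this (xact_trace_fp is mine; how the magic value decodes into B/C/A and a nesting id depends on the benchmark's encoding, so treat that part as a placeholder):

    /* called from magic_instruction_callback() in commands.C;
       type is 'B', 'C' or 'A'; xid is the nesting id */
    static void trace_xaction_event(conf_object_t *cpu, char type, int xid)
    {
        fprintf(xact_trace_fp, "%lld %d %c %d\n",
                (long long) SIM_cycle_count((processor_t *) cpu),
                SIMICS_current_processor_number(),
                type, xid);
    }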
Once the simulation completes, I combine the two trace files and sort the result numerically on the clock-cycle field (e.g., sort -n -k1,1).
*****The biggest issue is the sheer number of requests. I want to filter out the memory requests of every process other than tm-deque. Right now I'm filtering out kernel requests by inspecting the priv field of the (v9_memory_transaction *) mem_op: if priv is 1, I don't record that transaction. I believe this effectively keeps kernel requests out of my trace. But there are other maintenance/service processes, started by the kernel but running in user space, which access memory, and I want to filter those out as well. I have tried to detect the pid (or some sort of process id) from inside Ruby, but haven't had any luck so far! Things I have looked into (a sketch of the code appears after this list):
- The ASI (address space identifier) field, (v9_memory_transaction *) mem_op->asi. This didn't work!! The ASI was a fixed 128 throughout. In hindsight that makes sense: 128 is 0x80, which is ASI_PRIMARY on SPARC v9, the ASI that ordinary user-space accesses carry regardless of which process issues them. So the ASI presumably only changes between user space and kernel space, and since I'm only recording user-space accesses, I never see it change.
- The content of global register %g7. From inspecting the OpenSolaris code, I noticed that getpid() gets the address of the current thread structure from %g7, follows a pointer from there to the process structure the thread belongs to, and then reads the process id from the process structure. Since I don't care about the exact pid, I simply inspected the value of %g7 itself, and I saw no changes in it! One possibility, of course, is that %g7 holds a virtual address that is the same in every process; if every process runs just one thread, that seems very likely. So next I looked at the corresponding physical address. Unfortunately, that remained constant as well!
Next I'll try reading the contents of the memory location that the physical address (thread_phys_addr) points to; maybe that will hold a different value per process. I haven't looked into that yet.
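For reference, here is roughly what the filter looks like now, together with the %g7 probe I'm planning. The SIM_* calls (SIM_get_register_number, SIM_read_register, SIM_logical_to_physical, SIM_read_phys_memory) are the ones I found in the Simics API docs; please correct me if I'm misusing them or if the types are wrong:

    /* inside ruby_operate() */
    if (!mem_op->priv) {                  /* user-space access only */
        processor_t *cpu = SIM_current_processor();
        uinteger_t g7 = SIM_read_register(cpu,
                            SIM_get_register_number(cpu, "g7"));
        physical_address_t thread_phys_addr =
            SIM_logical_to_physical(cpu, Sim_DI_Data, g7);
        /* planned next step: read the word %g7 points at and see
           whether it differs per process even when the VA/PA do not */
        uinteger_t word = SIM_read_phys_memory(cpu, thread_phys_addr, 8);
        /* ... print the trace record, possibly tagged with word ... */
    }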
As an aside, how does LogTM differentiate xactional requests from non-xactional ones when both come from the same processor?
*****My second issue is with the clock cycle I print for timestamping. I am using SIM_cycle_count() to timestamp the memory accesses. When I combine the two traces, I notice that after a xaction has begun, the subsequent memory accesses printed from ruby_operate() don't have in_xaction set to 1! Here's an example:
9067854 13 189086172 r 0
9067856 13 185775464 w 0
9068573 13 B 0 <- xaction begins
9069382 13 185775464 w 0
9069387 13 185775468 r 0
.
.
.
9069558 13 185775468 w 0
9069566 13 185775468 w 0
9069611 13 185775272 r 1 <- first time in_xaction turns 1
There's always a lag of about 1,000 cycles between the xaction Begin record and in_xaction turning 1 in the memory-access trace. I did make sure I set cpu-switch-time to 1 in Simics before starting my simulations!
I get the value of in_xaction in the following way:

    #define XACT_MGR                                                     \
        g_system_ptr->getChip(SIMICS_current_processor_number() /       \
                              RubyConfig::numberOfProcsPerChip())        \
            ->getTransactionManager(SIMICS_current_processor_number() % \
                                    RubyConfig::numberOfProcsPerChip())

    in_xaction = XACT_MGR->inTransaction();
As I mentioned earlier, I get the clock cycle from SIM_cycle_count(cpu). Any idea what could be causing this? Do you think I should try using ruby_cycles instead?
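If I do switch, I assume the Ruby-side timestamp is just the event queue's time, going by how other Ruby code reads its own clock (and of course Ruby cycles won't line up one-to-one with Simics cycles):

    /* Ruby's own clock, instead of SIM_cycle_count() */
    Time ruby_cycle = g_eventQueue_ptr->getTime();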
*****My third issue is specific to the LogTM microbenchmark I was running, tm-deque. I ran it with 10 threads and set the number of ops to 10; initially I wanted small xactions without conflicts. But when I look at the trace file, I don't see the threads interleaving at all. The 10 threads ran one after the other, in the following order:
thread cpu start_cycle
T1 13 9068573
T2 9 10035999
T3 13 10944933
T4 2 11654399
T5 9 11781161
T6 13 11886113
T7 4 16280785
T8 13 16495097
T9 0 16917327
T10 6 17562721
Why aren't the threads running in parallel? The code dispatches all 10 threads in a for-loop and later does a thread join. I am simulating 16 processors, so I expected all 10 threads to run in parallel! Also, the number of clock cycles between the end of one thread and the start of the next is quite large: it varied from 200,000 to 900,000!
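For context, the dispatch is essentially this shape (paraphrased, not the exact tm-deque source; worker and NUM_THREADS stand in for whatever the benchmark calls them):

    pthread_t tid[NUM_THREADS];           /* NUM_THREADS = 10 here */
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)(long) i);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);

So I expected the workers to overlap in time rather than run back to back.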
Am I doing something wrong in the way I am collecting the clock cycle with SIM_cycle_count(current_cpu)?
I would really appreciate it if anyone could share their thoughts/ideas on these issues.
Thanks a lot in advance.
-shougata