[Gems-users] Getting Multiprocessor benchmarks running on multiple processors at once


Date: Thu, 7 Jul 2005 12:24:12 -0400 (EDT)
From: Sean Ryan Leventhal <sleventh@xxxxxxxxxxxx>
Subject: [Gems-users] Getting Multiprocessor benchmarks running on multiple processors at once
I have modified opal to print out traces of all memory instructions. I call a function of the sequencer within the execute stage of both memop objects. This function prints out the following:

m_local_cycles (which I am currently treating as the time)
the address of the instruction
whether it is a store
and the address being accessed.

Each sequencer has its own file.

When I merge these files and sort them based on processor/sequencer, I observe that there are long strings in which only one processor accesses the cache. For instance, I start fmm -p4 (the fast multipole method on four processors, from SPLASH-2) and run
c 1500000
to try to jump past some of the OS stuff. I then load Ruby and Opal, initialize them, and run


opal0.sim-step 5000000

This produces several very large traces. However, sorting them and grouping all adjacent memory accesses by the same processor into a single "string" yields only 32 "strings", with an average length of 103,827 memory accesses. In other words, it appears that no two threads are ever executing at the same time. I get similar behavior from fft. Does anyone have any idea what I am doing wrong?
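
For reference, the merge-and-group step described above can be sketched roughly as follows. This is only an illustration, not the script actually used: the one-record-per-line layout ("<cycle> <pc> <is_store> <addr>"), the function names parse_trace and run_lengths, and the choice to order the merged records by the m_local_cycles value are all assumptions.

```python
from itertools import groupby


def parse_trace(lines, cpu):
    """Parse one sequencer's trace into (cycle, cpu, pc, is_store, addr) tuples.

    Assumed (hypothetical) line format: '<cycle> <pc> <is_store 0|1> <addr>',
    with pc and addr written in hexadecimal.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        cycle, pc, is_store, addr = fields
        records.append((int(cycle), cpu, int(pc, 16), is_store == "1", int(addr, 16)))
    return records


def run_lengths(traces):
    """Merge per-processor traces, order them by cycle, and return one
    (cpu, length) pair per "string" of adjacent accesses by the same processor.

    traces maps a processor number to that sequencer's trace lines.
    """
    merged = []
    for cpu, lines in traces.items():
        merged.extend(parse_trace(lines, cpu))
    # Order by the recorded cycle; ties are broken by processor number.
    merged.sort(key=lambda rec: (rec[0], rec[1]))
    return [(cpu, sum(1 for _ in group))
            for cpu, group in groupby(merged, key=lambda rec: rec[1])]


if __name__ == "__main__":
    example = {
        0: ["10 0x1000 0 0x2000", "30 0x1004 1 0x2008"],
        1: ["20 0x3000 0 0x4000"],
    }
    runs = run_lengths(example)
    print(len(runs), "strings, average length", sum(n for _, n in runs) / len(runs))
```

With real traces, each dictionary value would be the lines read from one sequencer's file. If m_local_cycles is a per-processor counter rather than a shared clock, the ordering produced by this sort is only approximate.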

- Sean
