Re: [Gems-users] Trouble with MOESI_CMP_NUCA


Date: Mon, 10 Apr 2006 09:01:05 -0500 (CDT)
From: Bradford Beckmann <beckmann@xxxxxxxxxxx>
Subject: Re: [Gems-users] Trouble with MOESI_CMP_NUCA

Steve,

OK it looks like it is the latter, i.e. "something really weird is
happening in the L2 cache mapping functions".  There are multiple problems
with the debug output you sent:

1.  The out_msg.Destination should have 15 L2 cache bank destinations set
for the (16 - 1) cache banks that still need to be searched.  I believe
your DNUCA configuration was 256 banks with 16 banks per bank set, right?
The first problem is that only 14 destinations are set in
"out_msg.Destination".

2.  "out_msg.RequestsPerRound" should equal the number of ones counted in
"out_msg.Destination", which is 14.  Instead this value is 7.
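
To make the two checks above concrete, here is a minimal sketch in plain
C++.  It uses std::bitset as a stand-in for Ruby's Set, and it assumes the
256-bank, 16-banks-per-bank-set configuration with the banks of a bank set
numbered contiguously.  That layout and all of the names below are made up
for illustration; they are not taken from the protocol or from Ruby.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Assumed configuration: 256 banks, 16 banks per bank set.
constexpr std::size_t kNumBanks = 256;
constexpr std::size_t kBanksPerBankSet = 16;

// Stand-in for the destination set carried in out_msg.Destination.
using Destination = std::bitset<kNumBanks>;

// Hypothetical helper: mark every bank of one bank set as a destination,
// except the bank that was already searched in the first phase.
// Assumes the bank set occupies banks [bankSetBase, bankSetBase + 16).
Destination secondPhaseDestinations(std::size_t bankSetBase,
                                    std::size_t alreadySearched) {
    assert(bankSetBase + kBanksPerBankSet <= kNumBanks);
    assert(alreadySearched >= bankSetBase &&
           alreadySearched < bankSetBase + kBanksPerBankSet);
    Destination dest;
    for (std::size_t i = 0; i < kBanksPerBankSet; ++i) {
        const std::size_t bank = bankSetBase + i;
        if (bank != alreadySearched) {
            dest.set(bank);
        }
    }
    return dest;
}

// The two conditions from the list above:
//   1. exactly (16 - 1) destinations are set, and
//   2. RequestsPerRound equals the number of destinations set.
bool secondPhaseIsConsistent(const Destination& dest,
                             std::size_t requestsPerRound) {
    return dest.count() == kBanksPerBankSet - 1 &&
           requestsPerRound == dest.count();
}
```

With these definitions, a destination set built for any one bank set
passes the check with RequestsPerRound equal to 15, and the check fails
for the value 7 seen in your output.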

I believe there must be either a memory corruption problem or a bug in
Set.h.  Can you run valgrind on just a few steps of your simulation?  Also
go to ruby/common/Set.h and comment out the line "#define OPTBIGSET".  By
default, Set uses our optimized implementation, but there may be a problem
with your compiler not defining "__32BITS__" correctly.  If you comment
the line out, ruby will use our slower, but safer Set implementation.
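
As a purely hypothetical illustration of how a word-size mismatch can
produce this kind of undercount, here is a small C++ sketch.  It is not
the code in Set.h, and I have not confirmed that this is what is happening
in your build; the struct and function names are invented for the example.
It stores bits in 64-bit words but has one counting routine that wrongly
assumes each word holds only 32 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hand-rolled bit set backed by 64-bit words.
struct WordSet {
    std::vector<std::uint64_t> words;

    explicit WordSet(std::size_t nbits) : words((nbits + 63) / 64, 0) {}

    void add(std::size_t i) {
        words[i / 64] |= std::uint64_t{1} << (i % 64);
    }
};

// Correct count: examines all 64 bits of every word.
std::size_t countAllBits(const WordSet& s) {
    std::size_t n = 0;
    for (std::uint64_t w : s.words) {
        for (int b = 0; b < 64; ++b) {
            n += (w >> b) & 1u;
        }
    }
    return n;
}

// Buggy count: assumes a 32-bit word, so the upper half of every
// 64-bit word is silently ignored.
std::size_t countAssuming32BitWords(const WordSet& s) {
    std::size_t n = 0;
    for (std::uint64_t w : s.words) {
        for (int b = 0; b < 32; ++b) {
            n += (w >> b) & 1u;
        }
    }
    return n;
}
```

If 14 elements happen to straddle the 32-bit boundary of a word, for
example elements 25 through 38, the buggy routine sees only the 7 that
fall in the lower half.  A valgrind run and the non-optimized Set should
help tell this kind of bug apart from plain memory corruption.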

Brad



On Sat, 8 Apr 2006, Steve Barrus wrote:

> I inadvertently modified the check point I was working with so I am
> unable to get that same trace.  Here is a different trace with the
> same problem.  I am also including the extra debug info for the following
> transition.  Hopefully, that will provide you with the information
> that you are looking for.
>
>  398531   0   2  Collector        Issue_L2_Get  Col_P>Col_P [0x22e9c0, line 0x22e9c0]
>
> Thanks again for your help.
>
> -Steve
>
> On Fri, Apr 07, 2006 at 06:03:38PM -0500, Bradford Beckmann wrote:
> >
> > That's right, I did put an assertion there.  Good, so you have
> > COLLECTOR_HANDLES_OFF_CHIP_REQUESTS set to true.
> >
> > Well, on a second, closer look at your debug output, I'm afraid this may be
> > a more complicated problem.  The problem is that the L1_GETS and
> > PERSISTENT_GETS requests are not being received by all L2 Caches.  This
> > leads me to believe that the network is clogged or something really weird
> > is happening in the L2 cache mapping functions.  This is really confusing
> > because the protocol works fine when I run it.
> >
> > I'm going to need more debugging information.  Please do the following:
> >
> > - In the action "s_sendSecondPhaseRequest" in the file
> > MOESI_CMP_NUCA-col.sm add the following two lines:
> >
> >         out_msg.Destination := in_msg.RequestDest;
> >         out_msg.RequestsPerRound := out_msg.Destination.count();
> >         DEBUG_EXPR(out_msg.Destination);       // <-add line
> >         DEBUG_EXPR(out_msg.RequestsPerRound);  // <-add line
> >         out_msg.RetryNum := in_msg.RetryNum;
> >         out_msg.MessageSize := in_msg.MessageSize;
> >
> > - Then in Simics, run the following two commands before the 'c' command:
> >
> > ruby0.debug-verb high
> > ruby0.debug-filter l
> >
> > - Send me the debug output for this transition:
> >
> > 314527   0   3  Collector        Issue_L2_Get  Col_P>Col_P
> > [0x30a840, line 0x30a840]
> >
> > Brad
> >
>
