Re: [Gems-users] MESI_SCMP_ protocol crash


Date: Tue, 1 Apr 2008 18:40:46 -0400
From: "Konstantinos Aisopos" <kaisopos@xxxxxxxxx>
Subject: Re: [Gems-users] MESI_SCMP_ protocol crash
I just tried it. I get the same assertion failure :(

-Kostas

On Tue, Apr 1, 2008 at 1:30 PM, Enrique Vallejo <enrique@xxxxxxxxxxxxx> wrote:
> Did you try defining BIGSET in $GEMS/ruby/common/Set.h? I haven't tried
> processor simulations, but there's a comment in that file:
>
> // Define this to use the BigSet class which is slower, but supports
> // sets of size larger than 32.
>
> // #define BIGSET
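>
> A minimal sketch of the corresponding edit (the comment and define are the ones
> quoted from Set.h above; the note about the 32-member limit is an inference from
> that comment, and the rebuild step is an assumption about the usual GEMS workflow):
>
>  // $GEMS/ruby/common/Set.h  (sketch of the suggested edit)
>
>  // Define this to use the BigSet class which is slower, but supports
>  // sets of size larger than 32.
>  #define BIGSET        // <-- uncommented for a 64-node configuration
>
>  // Rationale: without BIGSET a Set holds at most 32 members, which lines up
>  // with 16-node runs working while 64-node runs fail.  Rebuild ruby after
>  // changing this.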
>
> Given that NetDests are defined using these sets, that's probably the
> source of the conflict. Good luck,
>
> Enrique Vallejo
> University of Cantabria
> http://www.atc.unican.es/~enrique/
>
>
> -----Original Message-----
> From: gems-users-bounces@xxxxxxxxxxx [mailto:gems-users-bounces@xxxxxxxxxxx]
> On behalf of Konstantinos Aisopos
> Sent: Tuesday, April 1, 2008 9:20
> To: Gems Users
> Subject: Re: [Gems-users] MESI_SCMP_ protocol crash
>
>
> I found some more properties of my problem: it is independent of the
> protocol (I get the same assertion failure when using
> MSI_MOSI_CMP_directory), but it does depend on the number of nodes.
> When simulating 16 nodes (in the tester or in ruby) everything works fine,
> but when I try 64, the assertion fails.
>
> The assertion has to do with the mapping of memory addresses to an L2
> tile. It's inside the function addSharer, and checks, whenever a
> sharer is added, that the address is indeed mapped to this tile:
>
>  void addSharer(Address addr, MachineID requestor) {
>    DEBUG_EXPR(machineID);
>    DEBUG_EXPR(requestor);
>    DEBUG_EXPR(addr);
>    assert(map_L1CacheMachId_to_L2Cache(addr, requestor) == machineID);  // <-- FAILED
>    L2cacheMemory[addr].Sharers.add(requestor);
>  }
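>
> As a side note, one way to see exactly which pair is failing, without turning on
> the full protocol debug trace, is to dump both sides of the comparison right
> before the assert. This is only a sketch: it assumes cerr/endl and the stream
> operators for Address and MachineID are usable from this body, which may not
> match the generated code exactly.
>
>  void addSharer(Address addr, MachineID requestor) {
>    // sketch: compute the mapping once and print both sides before asserting
>    MachineID mapped = map_L1CacheMachId_to_L2Cache(addr, requestor);
>    if (!(mapped == machineID)) {
>      cerr << "addSharer mismatch: addr=" << addr
>           << " requestor=" << requestor
>           << " mapped L2=" << mapped
>           << " this L2=" << machineID << endl;
>    }
>    assert(mapped == machineID);
>    L2cacheMemory[addr].Sharers.add(requestor);
>  }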
>
> I found the code that does the mapping but don't really understand
> what it is doing (pasted at the end of this email).
>
> Huan,
>
> My topology is an 8x8 mesh, which I initially created with
> GarnetFileMaker.py, and then added the memory nodes manually.
>
> Mike,
>
> here's a trace of MSI_MOSI_CMP_directory, simulating 64 cores
> (parameters: -p 64 -e 64 -a 64 -m 64 -n FILE_SPECIFIED -l 1 -s 1):
>
> Request trace enabled to output file 'ruby.trace.gz'
>      2  46  -1        Seq               Begin       >       [0x62c0, line 0x62c0] ST
>      4  18  -1        Seq               Begin       >       [0x39c0, line 0x39c0] ST
>      6  31  -1        Seq               Begin       >       [0x2cc0, line 0x2cc0] ST
>      6   0  46    L1Cache               Store     NP>L1_IM  [0x62c0, line 0x62c0]
>      7   0  31    L1Cache               Store     NP>L1_IM  [0x2cc0, line 0x2cc0]
>      8  13  -1        Seq               Begin       >       [0x59c0, line 0x59c0] ST
>      8   0  18    L1Cache               Store     NP>L1_IM  [0x39c0, line 0x39c0]
>     10   0  13    L1Cache               Store     NP>L1_IM  [0x59c0, line 0x59c0]
>     10  51  -1        Seq               Begin       >       [0x3fc0, line 0x3fc0] ATOMIC
>     12  40  -1        Seq               Begin       >       [0x9c0, line 0x9c0] ST
>     13   0  51    L1Cache               Store     NP>L1_IM  [0x3fc0, line 0x3fc0]
>     14  49  -1        Seq               Begin       >       [0x60c0, line 0x60c0] ST
>     16   0  49    L1Cache               Store     NP>L1_IM  [0x60c0, line 0x60c0]
>     16   0  40    L1Cache               Store     NP>L1_IM  [0x9c0, line 0x9c0]
>     16  18  -1        Seq               Begin       >       [0x4c0, line 0x4c0] ST
>     18   2  -1        Seq               Begin       >       [0x56c0, line 0x56c0] ST
>     19   0  18    L1Cache               Store     NP>L1_IM  [0x4c0, line 0x4c0]
>     20  14  -1        Seq               Begin       >       [0x5dc0, line 0x5dc0] ST
>     21   0   2    L1Cache               Store     NP>L1_IM  [0x56c0, line 0x56c0]
>     22  35  -1        Seq               Begin       >       [0x52c0, line 0x52c0] ATOMIC
>     23   0  14    L1Cache               Store     NP>L1_IM  [0x5dc0, line 0x5dc0]
>     24   4  -1        Seq               Begin       >       [0x44c0, line 0x44c0] ST
> Runtime Error at ../protocols/MSI_MOSI_CMP_directory-L2cache.sm:275,
> Ruby Time: 24: assert failure, PID: 30232
> press return to continue.
>
> I'm looking into how to print the machineID and requestor in the
> trace, but it seems that the *first time* the code reaches this
> assertion (the first time a sharer is added) the assertion fails, so it
> looks like a mapping problem, not a coherence protocol problem.
>
> thoughts?
>
> thanks a bunch for the help,
> -Kostas
>
> ------- mapping code: ruby/slicc_interface/RubySlicc_ComponentMapping.h ------
>
> // input parameter is the base ruby node of the L1 cache
> // returns a value between 0 and total_L2_Caches_within_the_system
> inline
> MachineID map_L1CacheMachId_to_L2Cache(const Address& addr, MachineID L1CacheMachId)
> {
>   int L2bank = 0;
>   MachineID mach = {MACHINETYPE_L2CACHE_ENUM, 0};
>
>   if (RubyConfig::L2CachePerChipBits() > 0) {
>     if (MAP_L2BANKS_TO_LOWEST_BITS) {
>       L2bank = addr.bitSelect(RubyConfig::dataBlockBits(),
>                               RubyConfig::dataBlockBits() + RubyConfig::L2CachePerChipBits() - 1);
>     } else {
>       L2bank = addr.bitSelect(RubyConfig::dataBlockBits() + L2_CACHE_NUM_SETS_BITS,
>                               RubyConfig::dataBlockBits() + L2_CACHE_NUM_SETS_BITS
>                                 + RubyConfig::L2CachePerChipBits() - 1);
>     }
>   }
>
>   assert(L2bank < RubyConfig::numberOfL2CachePerChip());
>   assert(L2bank >= 0);
>
>   mach.num = RubyConfig::L1CacheNumToL2Base(L1CacheMachId.num)
>                * RubyConfig::numberOfL2CachePerChip()   // base #
>              + L2bank;                                  // bank #
>   assert(mach.num < RubyConfig::numberOfL2Cache());
>   return mach;
> }
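>
> To make the bit arithmetic concrete, here is a small standalone rendition of
> the MAP_L2BANKS_TO_LOWEST_BITS path for the first address in the trace above.
> The 64-byte line size (dataBlockBits() == 6), the 64 L2 banks on one chip
> (L2CachePerChipBits() == 6) and the lowest-bits mapping are assumptions chosen
> to match the -p 64 / -e 64 run, not values read out of the simulator:
>
>  #include <cassert>
>  #include <cstdio>
>
>  int main() {
>    const int dataBlockBits = 6;        // 64-byte cache lines (assumed)
>    const int l2PerChipBits = 6;        // 64 L2 banks per chip (assumed)
>    const unsigned long addr = 0x62c0;  // first line address in the trace above
>
>    // equivalent of addr.bitSelect(dataBlockBits, dataBlockBits + l2PerChipBits - 1)
>    unsigned long L2bank = (addr >> dataBlockBits) & ((1UL << l2PerChipBits) - 1);
>
>    printf("line 0x%lx -> L2 bank %lu\n", addr, L2bank);  // prints: line 0x62c0 -> L2 bank 11
>    assert(L2bank < 64);  // mirrors assert(L2bank < RubyConfig::numberOfL2CachePerChip())
>    return 0;
>  }
>
> With a single chip, L1CacheNumToL2Base(...) would presumably be 0 for every L1,
> so mach.num reduces to the bank number; if that bank is not the machine that
> actually received the request, the mismatch may come from the node numbering in
> the hand-edited topology file or from the Set size issue mentioned earlier in
> the thread, rather than from the protocol itself.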
>
>
> On Sun, Mar 30, 2008 at 10:22 PM, Mike Marty <mike.marty@xxxxxxxxx> wrote:
> > I have no idea why that assertion would be triggered.  I would print
> > out the machineID and requestor. See the wiki for generating a
> > protocol debug trace.  Grep on the block address that causes the
> > assertion.  Add extra debugging information to the trace using
> > APPEND_TRANSITION_COMMENT and DEBUG_EXPR.
> >
> > --Mike
> >
> >
> > On Sun, Mar 30, 2008 at 3:27 PM, Konstantinos Aisopos
> >
> > <kaisopos@xxxxxxxxx> wrote:
> > > Hello again,
> > >
> > >  any ideas about my problem? Any idea what this assertion is guarding
> > >  against? Should I provide more information? Does MESI_SCMP
> > >  require any other parameters to be set that I'm not aware of?
> > >
> > >  I thought it was a topology problem so I created the file:
> > >
> > >  ruby/network/simple/Network_Files/NUCA_Procs-64_ProcsPerChip-64_L2Banks-64_Memories-64.txt
> > >  and set these parameters:
> > >  ruby0.setparam_str g_CACHE_DESIGN NUCA
> > >  ruby0.setparam_str g_NETWORK_TOPOLOGY FILE_SPECIFIED
> > >  ... the problem still persists. I got rid of opal to make the
> > >  simulation simpler; the problem persists. Also, if I don't load ruby, the
> > >  simulation works fine.
> > >
> > >  help please :P
> > >
> > >  -Kostas
> > >
> > >
> > >
> > >
> > >  On Thu, Mar 27, 2008 at 10:54 PM, Konstantinos Aisopos
> > >  <kaisopos@xxxxxxxxx> wrote:
> > >  > Hi list,
> > >  >
> > >  > I am using the MESI_SCMP_bankdirectory protocol to simulate a 64-core
> > >  > system. I haven't touched the protocol or the simulator. I am
> > >  > executing the following script:
> > >  >
> > >  > instruction-fetch-mode instruction-fetch-trace
> > >  > istc-disable
> > >  > dstc-disable
> > >  > cpu-switch-time 1
> > >  > load-module ruby
> > >  > load-module opal
> > >  > ruby0.setparam g_NUM_PROCESSORS 64
> > >  > ruby0.setparam g_PROCS_PER_CHIP 64
> > >  > ruby0.setparam g_NUM_L2_BANKS 64
> > >  > ruby0.setparam g_NUM_MEMORIES 64
> > >  > ruby0.setparam NUMBER_OF_VIRTUAL_NETWORKS 5
> > >  > ruby0.setparam g_MEMORY_SIZE_BYTES 4294967296
> > >  > ruby0.setparam g_endpoint_bandwidth 1000
> > >  > ruby0.init
> > >  > opal0.init
> > >  > opal0.sim-start "results.opal"
> > >  > opal0.sim-step 10000000000
> > >  >
> > >  > and I am getting the following error when I execute "opal0.sim-step
> > >  > 10000000000":
> > >  >
> > >  > Runtime Error at ../protocols/MESI_SCMP_bankdirectory-L2cache.sm:224,
> > >  > Ruby Time: 23: assert failure, PID: 1335
> > >  >
> > >  > line 224 is:
> > >  > assert(map_L1CacheMachId_to_L2Cache(addr,requestor) == machineID)
> > >  >
> > >  > any idea what might be wrong?
> > >  >
> > >  > thanks,
> > >  >
> > >  > Kostas
> > >  >
> >