I found some more properties of my problem: it is independent
of the protocol (I get the same assertion failure with
MSI_MOSI_CMP_directory), but it does depend on the number of nodes.
When simulating 16 nodes (in the tester or in ruby) everything works fine,
but when I try 64, the assertion fails.
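Since the failure only shows up when the node count changes, one cheap thing to rule out is the width of the bank field derived from the parameters. This is a hypothetical sketch, not RubyConfig code: it assumes the number of chips is g_NUM_PROCESSORS / g_PROCS_PER_CHIP and that the L2 banks per chip are g_NUM_L2_BANKS divided by that. Both are suggested by the parameter names, not taken from the GEMS source. `log2Exact`, `DerivedBits` and `deriveBits` are made-up names.

```cpp
#include <cassert>

// Hypothetical helper: log2(n) if n is a positive power of two, else -1.
int log2Exact(int n) {
  if (n <= 0 || (n & (n - 1)) != 0) return -1;
  int bits = 0;
  while ((1 << bits) < n) ++bits;
  return bits;
}

struct DerivedBits {
  int numChips;
  int l2BanksPerChip;
  int l2BanksPerChipBits;  // -1 if the per-chip bank count is not a power of two
};

// Assumed derivation (not confirmed against RubyConfig):
// chips = processors / procsPerChip, banksPerChip = L2 banks / chips.
DerivedBits deriveBits(int numProcessors, int procsPerChip, int numL2Banks) {
  assert(procsPerChip > 0 && numProcessors % procsPerChip == 0);
  int chips = numProcessors / procsPerChip;
  assert(numL2Banks % chips == 0);
  int perChip = numL2Banks / chips;
  return {chips, perChip, log2Exact(perChip)};
}
```

Under these assumptions, one 16-processor chip with 16 banks gives a 4-bit bank field and the 64/64/64 setup gives a 6-bit one. Both are powers of two, so this check alone does not point at the parameters.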
The assertion has to do with the mapping of memory addresses to L2
tiles. It's inside the function addSharer, and checks, whenever a
sharer is added, that the address is indeed mapped to this
tile:
void addSharer(Address addr, MachineID requestor) {
  DEBUG_EXPR(machineID);
  DEBUG_EXPR(requestor);
  DEBUG_EXPR(addr);
  assert(map_L1CacheMachId_to_L2Cache(addr, requestor) ==
         machineID);  // <-- FAILED
  L2cacheMemory[addr].Sharers.add(requestor);
}
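To see which tile was expected when this check fails, the test can print its operands before aborting. This is a standalone sketch, not GEMS code. `MachineType`, `MachineID`, `sharerMappingOk` and `checkedAddSharer` are simplified, hypothetical stand-ins, and the expected tile is passed in rather than computed by map_L1CacheMachId_to_L2Cache.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Simplified, hypothetical stand-ins for Ruby's MachineType/MachineID.
enum MachineType { L1CacheType, L2CacheType };

struct MachineID {
  MachineType type;
  int num;
};

bool operator==(const MachineID& a, const MachineID& b) {
  return a.type == b.type && a.num == b.num;
}

// Mirrors the assert in addSharer, but prints the IDs on a mismatch.
// 'expected' stands for map_L1CacheMachId_to_L2Cache(addr, requestor),
// 'self' for this L2 controller's machineID.
bool sharerMappingOk(uint64_t addr, MachineID requestor,
                     MachineID expected, MachineID self) {
  if (expected == self) return true;
  std::fprintf(stderr,
               "addSharer mismatch: addr=0x%llx L1=%d expected L2=%d this L2=%d\n",
               static_cast<unsigned long long>(addr), requestor.num,
               expected.num, self.num);
  return false;
}

void checkedAddSharer(uint64_t addr, MachineID requestor,
                      MachineID expected, MachineID self) {
  bool ok = sharerMappingOk(addr, requestor, expected, self);  // prints on failure
  assert(ok);
  (void)ok;  // silence unused-variable warnings when NDEBUG is defined
  // ... the real code would now add requestor to the sharer set ...
}
```

Computing `ok` outside the `assert` keeps the diagnostic print even in builds where `assert` is compiled out.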
I found the code that does the mapping but didn't really understand
what's happening (I've pasted it at the end of this email).
Huan,
My topology is an 8x8 mesh, which I initially created with
GarnetFileMaker.py, and then added the memory nodes manually.
Mike,
here's a trace of MSI_MOSI_CMP_directory simulating 64 cores
(parameters: -p 64 -e 64 -a 64 -m 64 -n FILE_SPECIFIED -l 1 -s 1):
Request trace enabled to output file 'ruby.trace.gz'
2 46 -1 Seq Begin > [0x62c0, line 0x62c0] ST
4 18 -1 Seq Begin > [0x39c0, line 0x39c0] ST
6 31 -1 Seq Begin > [0x2cc0, line 0x2cc0] ST
6 0 46 L1Cache Store NP>L1_IM [0x62c0, line 0x62c0]
7 0 31 L1Cache Store NP>L1_IM [0x2cc0, line 0x2cc0]
8 13 -1 Seq Begin > [0x59c0, line 0x59c0] ST
8 0 18 L1Cache Store NP>L1_IM [0x39c0, line 0x39c0]
10 0 13 L1Cache Store NP>L1_IM [0x59c0, line 0x59c0]
10 51 -1 Seq Begin > [0x3fc0, line 0x3fc0] ATOMIC
12 40 -1 Seq Begin > [0x9c0, line 0x9c0] ST
13 0 51 L1Cache Store NP>L1_IM [0x3fc0, line 0x3fc0]
14 49 -1 Seq Begin > [0x60c0, line 0x60c0] ST
16 0 49 L1Cache Store NP>L1_IM [0x60c0, line 0x60c0]
16 0 40 L1Cache Store NP>L1_IM [0x9c0, line 0x9c0]
16 18 -1 Seq Begin > [0x4c0, line 0x4c0] ST
18 2 -1 Seq Begin > [0x56c0, line 0x56c0] ST
19 0 18 L1Cache Store NP>L1_IM [0x4c0, line 0x4c0]
20 14 -1 Seq Begin > [0x5dc0, line 0x5dc0] ST
21 0 2 L1Cache Store NP>L1_IM [0x56c0, line 0x56c0]
22 35 -1 Seq Begin > [0x52c0, line 0x52c0] ATOMIC
23 0 14 L1Cache Store NP>L1_IM [0x5dc0, line 0x5dc0]
24 4 -1 Seq Begin > [0x44c0, line 0x44c0] ST
Runtime Error at ../protocols/MSI_MOSI_CMP_directory-L2cache.sm:275,
Ruby Time: 24: assert failure, PID: 30232
press return to continue.
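As a rough cross-check, the bank bits of the traced block addresses can be computed by hand. The sketch assumes 64-byte blocks (dataBlockBits() == 6), 64 L2 banks on a single chip (L2CachePerChipBits() == 6) and MAP_L2BANKS_TO_LOWEST_BITS enabled. None of these values is confirmed by the trace. `bitSelect` here is a local helper assumed to select bits inclusively, consistent with the `-1` in the original call. It is not Ruby's `Address::bitSelect` itself.

```cpp
#include <cassert>
#include <cstdint>

// Assumed configuration (not read from the actual run).
const int kDataBlockBits = 6;  // 64-byte blocks
const int kBankBits = 6;       // 64 L2 banks per chip

// Bits [low, high] of addr, inclusive on both ends.
uint64_t bitSelect(uint64_t addr, int low, int high) {
  assert(low <= high && high < 64);
  uint64_t width = static_cast<uint64_t>(high - low + 1);
  uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
  return (addr >> low) & mask;
}

// Bank selected by the MAP_L2BANKS_TO_LOWEST_BITS branch under the assumptions above.
int bankOf(uint64_t addr) {
  return static_cast<int>(
      bitSelect(addr, kDataBlockBits, kDataBlockBits + kBankBits - 1));
}
```

Under those assumptions, the banks come out as follows:

- 0x62c0 falls in bank 11.
- 0x39c0 and 0x59c0 both fall in bank 39.
- 0x2cc0 falls in bank 51.
- 0x44c0, the last request before the failure, falls in bank 19.

Comparing these numbers with the machineID of the L2 controller that hits the assert would show whether the request reached the tile the mapping expects.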
I'm looking into how to print the machineID and requestor in the
trace, but it seems that the assertion fails the *first time* the code
reaches it (the first time a sharer is added), so it looks like a
mapping problem, not a coherence protocol problem.
Thoughts?
thanks a bunch for the help,
-Kostas
------- mapping code: ruby/slicc_interface/RubySlicc_ComponentMapping.h -------
// input parameter is the base ruby node of the L1 cache
// returns a value between 0 and total_L2_Caches_within_the_system
inline
MachineID map_L1CacheMachId_to_L2Cache(const Address& addr, MachineID L1CacheMachId)
{
  int L2bank = 0;
  MachineID mach = {MACHINETYPE_L2CACHE_ENUM, 0};
  if (RubyConfig::L2CachePerChipBits() > 0) {
    if (MAP_L2BANKS_TO_LOWEST_BITS) {
      L2bank = addr.bitSelect(RubyConfig::dataBlockBits(),
                              RubyConfig::dataBlockBits()+RubyConfig::L2CachePerChipBits()-1);
    } else {
      L2bank = addr.bitSelect(RubyConfig::dataBlockBits()+L2_CACHE_NUM_SETS_BITS,
                              RubyConfig::dataBlockBits()+L2_CACHE_NUM_SETS_BITS+RubyConfig::L2CachePerChipBits()-1);
    }
  }
  assert(L2bank < RubyConfig::numberOfL2CachePerChip());
  assert(L2bank >= 0);
  mach.num = RubyConfig::L1CacheNumToL2Base(L1CacheMachId.num)*RubyConfig::numberOfL2CachePerChip()  // base #
             + L2bank;  // bank #
  assert(mach.num < RubyConfig::numberOfL2Cache());
  return mach;
}
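A self-contained model of the mapping above can help check which L2 tile a given (address, L1) pair should reach. `Config` and its fields are hypothetical stand-ins for the RubyConfig accessors and the two compile-time constants. In particular, replacing `L1CacheNumToL2Base` with `l1Num / procsPerChip` is an assumption, not something taken from the GEMS source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for RubyConfig and the compile-time switches.
struct Config {
  int dataBlockBits;       // log2(block size in bytes)
  int l2BanksPerChipBits;  // log2(L2 banks per chip)
  int l2SetsBits;          // stands in for L2_CACHE_NUM_SETS_BITS
  bool mapToLowestBits;    // stands in for MAP_L2BANKS_TO_LOWEST_BITS
  int procsPerChip;
  int numChips;
};

// Bits [low, high] of addr, inclusive; assumes high - low + 1 < 64.
uint64_t bitSelect(uint64_t addr, int low, int high) {
  int width = high - low + 1;
  return (addr >> low) & ((1ULL << width) - 1);
}

// Returns the global L2 number (mach.num in the original).
int mapL1ToL2(const Config& c, uint64_t addr, int l1Num) {
  int banksPerChip = 1 << c.l2BanksPerChipBits;
  int bank = 0;
  if (c.l2BanksPerChipBits > 0) {
    int low = c.mapToLowestBits ? c.dataBlockBits
                                : c.dataBlockBits + c.l2SetsBits;
    bank = static_cast<int>(bitSelect(addr, low, low + c.l2BanksPerChipBits - 1));
  }
  assert(bank >= 0 && bank < banksPerChip);
  int chip = l1Num / c.procsPerChip;  // assumed behavior of L1CacheNumToL2Base
  int num = chip * banksPerChip + bank;
  assert(num < c.numChips * banksPerChip);
  return num;
}
```

Under this assumption, with one 64-processor chip the chip term is always 0, so the expected tile depends only on the address. With the upper-bits branch and an illustrative l2SetsBits of 10, small addresses like those in the trace would all select bank 0.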
On Sun, Mar 30, 2008 at 10:22 PM, Mike Marty <mike.marty@xxxxxxxxx> wrote:
> I have no idea why that assertion would be triggered. I would print
> out the machineID and requestor. See the wiki for generating a
> protocol debug trace. Grep on the block address that causes the
> assertion. Add extra debugging information to the trace using
> APPEND_TRANSITION_COMMENT and DEBUG_EXPR.
>
> --Mike
>
>
> On Sun, Mar 30, 2008 at 3:27 PM, Konstantinos Aisopos <kaisopos@xxxxxxxxx> wrote:
> > Hello again,
> >
> > any ideas about my problem? Any idea what this assertion is meant to
> > prevent? Should I provide more information? Does MESI_SCMP
> > require any other parameters to be set that I don't know about?
> >
> > I thought it was a topology problem so I created the file:
> > ruby/network/simple/Network_Files/NUCA_Procs-64_ProcsPerChip-64_L2Banks-64_Memories-64.txt
> > and set these parameters:
> > ruby0.setparam_str g_CACHE_DESIGN NUCA
> > ruby0.setparam_str g_NETWORK_TOPOLOGY FILE_SPECIFIED
> > ... the problem still persists. I got rid of opal to make the
> > simulation simpler, but the problem persists. Also, if I don't load ruby, the
> > simulation works fine.
> >
> > help please :P
> >
> > -Kostas
> >
> >
> >
> >
> > On Thu, Mar 27, 2008 at 10:54 PM, Konstantinos Aisopos
> > <kaisopos@xxxxxxxxx> wrote:
> > > Hi list,
> > >
> > > I am using the MESI_SCMP_bankdirectory protocol to simulate a 64-core
> > > system. I haven't touched the protocol or the simulator. I am
> > > executing the following script:
> > >
> > > instruction-fetch-mode instruction-fetch-trace
> > > istc-disable
> > > dstc-disable
> > > cpu-switch-time 1
> > > load-module ruby
> > > load-module opal
> > > ruby0.setparam g_NUM_PROCESSORS 64
> > > ruby0.setparam g_PROCS_PER_CHIP 64
> > > ruby0.setparam g_NUM_L2_BANKS 64
> > > ruby0.setparam g_NUM_MEMORIES 64
> > > ruby0.setparam NUMBER_OF_VIRTUAL_NETWORKS 5
> > > ruby0.setparam g_MEMORY_SIZE_BYTES 4294967296
> > > ruby0.setparam g_endpoint_bandwidth 1000
> > > ruby0.init
> > > opal0.init
> > > opal0.sim-start "results.opal"
> > > opal0.sim-step 10000000000
> > >
> > > and I am getting the following error when I execute "opal0.sim-step
> > > 10000000000":
> > >
> > > Runtime Error at ../protocols/MESI_SCMP_bankdirectory-L2cache.sm:224,
> > > Ruby Time: 23: assert failure, PID: 1335
> > >
> > > line 224 is:
> > > assert(map_L1CacheMachId_to_L2Cache(addr,requestor) == machineID)
> > >
> > > any idea what might be wrong?
> > >
> > > thanks,
> > >
> > > Kostas
> > >
>
> > _______________________________________________
> > Gems-users mailing list
> > Gems-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/gems-users
> > Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.