Re: [Gems-users] Debugging SMP protocol with tester.exec


Date: Sun, 7 Mar 2010 11:32:52 -0500
From: Edward Lee <edwl202@xxxxxxxxx>
Subject: Re: [Gems-users] Debugging SMP protocol with tester.exec
When I compare the Barnes results for two protocols, the
MOESI_SMP_directory protocol from the GEMS distribution and my modified
version, I see many more supervisor misses in the modified version. I
think these two issues are related. Does anyone have any insight into
this? I would also appreciate any pointers on where to look in the
protocol.

/edwl

On Thu, Mar 4, 2010 at 11:32 PM, Edward Lee <edwl202@xxxxxxxxx> wrote:
> Hi,
>
> I modified the MOESI_SMP_directory protocol for some optimizations. I
> managed to run small traces that I generated successfully with Ruby's
> tester. The new protocol also works fine when running Splash2 benchmarks
> (at least they finish successfully, although I am not sure whether this
> means anything). However, when I try to run the tester like this:
>
> ./amd64-linux/generated/MOESI_SMP_Adaptive/bin/tester.exec -p 16 -a 1
> -l 1000 -v med -c neStlc
>
> I am getting the following error:
>
> ---------------------------------------------
> Debug: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:196: Check callback
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:200: proc is 10
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:200: proc is 10
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:201: address is [0x38ec, line 0x38c0]
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:201: address is [0x38ec, line 0x38c0]
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:202: data is [[0x38ec, line 0x38c0], 4, [ 0 0 0 0 ]]
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:202: data is [[0x38ec, line 0x38c0], 4, [ 0 0 0 0 ]]
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:203: byte_number is 0
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:203: byte_number is 0
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:204: (int)m_value+byte_number is 215
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:204: (int)m_value+byte_number is 215
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:205: (int)data.getByte(byte_number) is 0
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:205: (int)data.getByte(byte_number) is 0
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:206: *this is [[0x38ec, line 0x38c0], value: 215,
> status: Check_Pending, initiating node: 10, store_count: 4]
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:206: *this is [[0x38ec, line 0x38c0], value: 215,
> status: Check_Pending, initiating node: 10, store_count: 4]
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:207: g_eventQueue_ptr->getTime() is 2020
> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:207: g_eventQueue_ptr->getTime() is 2020
> Fatal Error: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:208: Action/check failure
> Fatal Error: in fn void Check::performCallback(NodeID, SubBlock&) in
> tester/Check.C:208: Action/check failure
> Aborted
> --------------------------------------------------
>
> So, the problem seems to be in the stored value. Since m_status is
> Check_Pending, the tester runs this check in tester/Check.C:
>
> if (uint8(m_value+byte_number) != data.getByte(byte_number) &&
> (DATA_BLOCK == true))
>
> and since the two values, uint8(m_value+byte_number) and
> data.getByte(byte_number), are not equal (215 vs. 0 in the log above),
> the tester fails.
>
> I saw earlier posts from Luke saying that this might be caused by
> complex PUTX races, and that this was encountered earlier in the
> MESI_CMP_filter_directory protocol. As I am not a SLICC expert, I am
> stuck here and wondering whether anybody can help me work out how to
> proceed. What would be an action plan to narrow down this protocol
> bug?
>
> Any help is highly appreciated.
>
> Regards,
>
> Ed
>
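
For readers following the quoted check, here is a minimal standalone sketch of the comparison it performs. It is not the GEMS code: expected_byte and first_mismatch are hypothetical names, the DATA_BLOCK flag is left out, and it assumes the tester expects byte i of the sub-block to hold value + i truncated to 8 bits, which is what the quoted expression suggests.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of GEMS): the byte expected at offset
// byte_number, mirroring the quoted uint8(m_value + byte_number).
std::uint8_t expected_byte(std::uint8_t value, std::size_t byte_number) {
    // The cast truncates to 8 bits, so the pattern wraps modulo 256.
    return static_cast<std::uint8_t>(value + byte_number);
}

// Hypothetical helper: index of the first byte in data that differs from
// the expected pattern, or -1 if every byte matches.
int first_mismatch(std::uint8_t value, const std::vector<std::uint8_t>& data) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] != expected_byte(value, i)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With the values from the log (value 215, data [ 0 0 0 0 ]), first_mismatch(215, {0, 0, 0, 0}) reports a mismatch at byte 0, matching the byte_number printed before the failure. In other words, the load returned a line that does not reflect the earlier stores.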