Re: [Gems-users] Debugging SMP protocol with tester.exec


Date: Fri, 12 Mar 2010 15:20:20 -0500
From: Edward Lee <edwl202@xxxxxxxxx>
Subject: Re: [Gems-users] Debugging SMP protocol with tester.exec
I am narrowing down my question to get some answers/feedback this time :)

So, here is my question:

If I generate a trace from the failed test run and re-run the
tester with "-z trace_file" instead of "-l 1000", it finishes
successfully. I checked the parameters printed in the output:
DATA_BLOCK is true in both cases.

How is this possible? If somebody can help with this, I would really appreciate it.
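
For reference, the per-byte comparison behind the failure quoted below can be sketched roughly as follows. This is a simplified, hypothetical stand-in and not the GEMS source: the function name firstMismatch and the use of std::vector are assumptions made for illustration; only the expression uint8(m_value+byte_number) compared against the data byte comes from the quoted tester/Check.C line.

```cpp
#include <cassert>   // only needed for assert-style usage of the helper
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of GEMS): mimics the tester's per-byte check.
// Returns the index of the first byte that fails the check, or -1 if every
// byte matches the expected pattern.
int firstMismatch(std::uint8_t m_value, const std::vector<std::uint8_t>& data) {
    for (std::size_t byte_number = 0; byte_number < data.size(); ++byte_number) {
        // Expected byte is the base value plus the byte offset, wrapped to 8 bits,
        // as in uint8(m_value+byte_number) from the quoted check.
        const std::uint8_t expected =
            static_cast<std::uint8_t>(m_value + byte_number);
        if (expected != data[byte_number]) {
            return static_cast<int>(byte_number);
        }
    }
    return -1;
}
```

With the values from the log (m_value 215, data [ 0 0 0 0 ]), firstMismatch(215, {0, 0, 0, 0}) returns 0, matching the reported byte_number of 0: the load returned zeros where the stored pattern was expected.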

Thanks,

Ed

On Sun, Mar 7, 2010 at 11:32 AM, Edward Lee <edwl202@xxxxxxxxx> wrote:
> When I compare the results of Barnes between the two protocols,
> MOESI_SMP_Directory from the GEMS distribution and my modified version, I see
> a lot more supervisor misses in the modified version. I think these
> two issues are related. Any insight on this? I would also
> appreciate any pointers on where to look in the protocol.
>
> /edwl
>
> On Thu, Mar 4, 2010 at 11:32 PM, Edward Lee <edwl202@xxxxxxxxx> wrote:
>> Hi,
>>
>> I modified the MOESI_SMP_directory protocol for some optimizations. I
>> managed to successfully run small traces that I generated with Ruby's
>> tester. Also, the new protocol works fine while running Splash2 benchmarks
>> (at least they finish successfully, although I am not sure whether this
>> means anything). However, when I try to run the tester like this:
>>
>> ./amd64-linux/generated/MOESI_SMP_Adaptive/bin/tester.exec -p 16 -a 1
>> -l 1000 -v med -c neStlc
>>
>> I am getting the following error:
>>
>> ---------------------------------------------
>> Debug: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:196: Check callback
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:200: proc is 10
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:200: proc is 10
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:201: address is [0x38ec, line 0x38c0]
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:201: address is [0x38ec, line 0x38c0]
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:202: data is [[0x38ec, line 0x38c0], 4, [ 0 0 0 0 ]]
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:202: data is [[0x38ec, line 0x38c0], 4, [ 0 0 0 0 ]]
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:203: byte_number is 0
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:203: byte_number is 0
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:204: (int)m_value+byte_number is 215
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:204: (int)m_value+byte_number is 215
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:205: (int)data.getByte(byte_number) is 0
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:205: (int)data.getByte(byte_number) is 0
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:206: *this is [[0x38ec, line 0x38c0], value: 215,
>> status: Check_Pending, initiating node: 10, store_count: 4]
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:206: *this is [[0x38ec, line 0x38c0], value: 215,
>> status: Check_Pending, initiating node: 10, store_count: 4]
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:207: g_eventQueue_ptr->getTime() is 2020
>> Warning: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:207: g_eventQueue_ptr->getTime() is 2020
>> Fatal Error: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:208: Action/check failure
>> Fatal Error: in fn void Check::performCallback(NodeID, SubBlock&) in
>> tester/Check.C:208: Action/check failure
>> Aborted
>> --------------------------------------------------
>>
>> So, the problem seems to be in the stored value. Since m_status is
>> Check_Pending, the tester performs this check in tester/Check.C:
>>
>> if (uint8(m_value+byte_number) != data.getByte(byte_number) &&
>> (DATA_BLOCK == true))
>>
>> and since the two values, uint8(m_value+byte_number) and
>> data.getByte(byte_number), are not equal (215 versus 0 in the log
>> above), the tester fails.
>>
>> I saw earlier posts from Luke suggesting that this might be caused by
>> complex PUTX races, and that this was encountered earlier in the
>> MESI_CMP_filter_directory protocol. As I am not a SLICC expert, I am
>> stuck here and wondering whether anybody can advise me on how to
>> proceed. What would be an action plan to narrow down this protocol
>> bug?
>>
>> Any help is highly appreciated.
>>
>> Regards,
>>
>> Ed
>>
>