Re: [Gems-users] Ruby Segmentation Fault

Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Date:	Mon, 09 Feb 2009 21:52:00 +0200
From:	Konstantinos Nikas <knikas@xxxxxxxxxxxxxxxxx>
Subject:	Re: [Gems-users] Ruby Segmentation Fault

Hi Jayaram,

I have sent an email with the output file, but it is probably too big to go through and is waiting for a moderator's confirmation. I could upload it somewhere else if you like, to avoid sending it to everyone on the list.

In the meantime we are trying to find the problem ourselves, and we are stuck on the following case. Thread 0 on proc 1 starts a CLOSED transaction and the output is:

41181831 1 [1,0] ADD XACT FRAME oldLogFramePointer: [0x2d9020, line 0x2d9000] newLogFramePointer: [0x2d9020, line 0x2d9000] 1
41181831 1 [1,0] BEGIN XACT: TID 0 XID 10 XACT_LEVEL: 1 PC: [0x137fc, line 0x137c0]

If we understand Ruby's code correctly, at this point the TransactionVersionManager will call beginTransaction, which will call takeCheckpoint, which will execute


m_registers[thread=0][transactionLevel-1 = 0]->takeCheckpoint()

Later this thread decides to abort :

41183071 1 [1,0] SETTING ABORT FLAG ADDR = [0x38002218, line 0x38002200] PC = [0x13880, line 0x13880] NPC = [0x13884, line 0x13880]
41183074 1 [1,0] ISOLATE XACT STORE [0x3b7e3740, line 0x3b7e3740] XACT LEVEL: 1 PC = [0x13880, line 0x13880]
41183077 2 [2,0] ISOLATE XACT STORE [0x38002200, line 0x38002200] XACT LEVEL: 1 PC = [0x12dcc, line 0x12dc0]
41183077 2 [2,0] LOGGING STORE: [0x2ae200, line 0x2ae200] 1 PC = [0x12dcc, line 0x12dc0]

**** Log. proc. num: 2:  m_logSize: 1632 m_maxLogSize: 781

41183077 2 [2,0] ADD UNDO LOG ENTRY: [0x2ae200, line 0x2ae200] [0x38002200, line 0x38002200] LogAddress: [0x3a163c, line 0x3a1600] 1
41183082 2 [2,0] ISOLATE XACT STORE [0x38002200, line 0x38002200] XACT LEVEL: 1 PC = [0x12dd0, line 0x12dc0]
41183082 2 [2,0] LOGGING STORE: [0x2ae200, line 0x2ae200] 0 PC = [0x12dd0, line 0x12dc0]
41183091 2 [2,0] ISOLATE XACT LOAD VA: [0xfeffbec0, line 0xfeffbec0] PA: [0x3c543ec0, line 0x3c543ec0] XACT LEVEL: 1 PC = [0x13244, line 0x13240]
41183091 1 [1,0] TRAP TO HANDLER: TID: 0 TRAP_TYPE 1 TRAP ADDRESS 0x38002218 NUM_RETRIES 0 LOG_SIZE 1360 XACT_LEVEL 1 XACT_LOWEST_CONFLICT_LEVEL 1 Handler Address = [0x1b39c, line 0x1b380] PC = [0x100707c, line 0x1007040]

41183091 1 [1,0] Begin ESCAPE ACTION - ESCAPE DEPTH: 1 PC [0x100707c, line 0x1007040]

Begin exposed action for thread 0 of proc 1 PC [0x1b39c, line 0x1b380]

41183092 1 [1,0] Begin ESCAPE ACTION - ESCAPE DEPTH: 2 PC [0x1b39c, line 0x1b380]


which will release isolation accordingly and restart the transaction.

End exposed action for thread 0 of proc 1 PC [0x1b3dc, line 0x1b3c0]

41194048 1 [1,0] END ESCAPE ACTION - ESCAPE DEPTH: 1 PC [0x1b3dc, line 0x1b3c0]

Restart transaction for thread 0 of proc 1
restartTransactionCallback proc = 1 thread = 0 time = 41194049

41194049 1 [1,0] END ESCAPE ACTION - ESCAPE DEPTH: 0 PC [0x1b3e4, line 0x1b3c0]

1 [1,0] TID 0 RESTART TRANSACTION AT XACT LEVEL: 1 LOG_SIZE: 1360
Segmentation fault (SIGSEGV) in main thread

So, according to the debug output, thread 0 will restart its transaction and the new xact level is 1. TransactionInterfaceManager::restartTransactionCallback therefore executes:


getXactVersionManager()->restartTransaction(thread = 0, new_xact_level=1)

which will go and call :

m_registers[0][1]->restoreCheckpoint()

which causes the SEG FAULT, because the original transaction took the checkpoint for m_registers[0][0]!

It seems too elementary to be a real bug, so I guess we are missing something in the code.
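
To make the suspected mismatch concrete, here is a minimal standalone sketch. It is not the GEMS code: RegisterStateSketch, VersionManagerSketch and restoreAt are hypothetical names invented for illustration, and the "store at level-1" rule simply mirrors our reading of takeCheckpoint described above. Whether restartTransaction really indexes with xact_level rather than xact_level-1 is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for Ruby's RegisterState; it only records whether
// a checkpoint has been taken.
struct RegisterStateSketch {
    bool has_checkpoint = false;
    void takeCheckpoint() { has_checkpoint = true; }
    void restoreCheckpoint() { /* would copy the saved registers back */ }
};

// Hypothetical stand-in for the m_registers[thread][level] table.
class VersionManagerSketch {
public:
    VersionManagerSketch(std::size_t threads, std::size_t max_levels)
        : m_registers(threads) {
        assert(max_levels > 0);
        // Every slot starts out as a null pointer.
        for (auto& per_thread : m_registers) per_thread.resize(max_levels);
    }

    // Assumption taken from this mail: a transaction at nesting level N
    // stores its checkpoint in slot N-1.
    bool beginTransaction(std::size_t thread, std::size_t xact_level) {
        if (thread >= m_registers.size() || xact_level == 0) return false;
        auto& levels = m_registers[thread];
        if (xact_level - 1 >= levels.size()) return false;
        auto& slot = levels[xact_level - 1];
        if (!slot) slot = std::make_unique<RegisterStateSketch>();
        slot->takeCheckpoint();
        return true;
    }

    // Guarded restore: reports failure instead of calling through a null
    // slot, which is what this=0x0 in the gdb trace would correspond to.
    bool restoreAt(std::size_t thread, std::size_t index) {
        if (thread >= m_registers.size()) return false;
        auto& levels = m_registers[thread];
        if (index >= levels.size() || !levels[index]) return false;
        if (!levels[index]->has_checkpoint) return false;
        levels[index]->restoreCheckpoint();
        return true;
    }

private:
    std::vector<std::vector<std::unique_ptr<RegisterStateSketch>>> m_registers;
};
```

In this sketch, after beginTransaction(0, 1) a call to restoreAt(0, 0) succeeds, while restoreAt(0, 1) finds an empty slot and fails. A similar null check placed in front of the real restoreCheckpoint() call might turn the crash into a diagnosable message.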


Kind regards,

Kostis

The segmentation fault seems to occur since ruby does not find the register
checkpoint for the processor that is trying to restart its transaction...

#0  RegisterState::restoreCheckpoint (this=0x0, m_proc=1) at
    /home/users/anastop/gems/gems-2.1//common/Vector.h:92
#1  0x00002aaab066bc5d in
    TransactionVersionManager::restartTransaction
    (this=0xa341340, thread=0, xact_level=1) at

Can you get more debug output by setting XACT_DEBUG and XACT_DEBUG_LEVEL?


Jayaram


Konstantinos Nikas wrote:

The code we are running is a transactional workload that we have developed, and we set it up according to the directions provided in the wiki (bind threads, call set_transaction_registers, etc).

The protocol is MESI_CMP_filter_directory, as it is the only one LogTM can use (at least in the latest version of GEMS).


Kind regards,

Kostis

What benchmark are you running and what protocol?

Polina

On Thu, Feb 5, 2009 at 12:47 PM, Konstantinos Nikas <knikas@xxxxxxxxxxxxxxxxx> wrote:


    Hi all,

    we have an 8-core CMP and a transactional workload which uses only 2
    threads. We bind the 2 threads to 2 specific processors (always
    avoiding core 0). When we set XACT_LOG_BUFFER_SIZE=2048 everything
    works fine. For smaller values (0, 256, 1024), though, the simulation
    fails.

    At first we got the following warning message:

    45936462 2 [2,0] endEscapeAction WARNING escape depth < 1. Depth = 0

    Searching the mailing list we came across a post which suggested
    adding a beginEscapeAction() call to hardwareAbort(). We included
    this in our code and the warning messages went away. However, the
    simulations still fail with a segmentation fault. Gdb reported the
    following:

    #0  RegisterState::restoreCheckpoint (this=0x0, m_proc=1) at
    /home/users/anastop/gems/gems-2.1//common/Vector.h:92
    #1  0x00002aaab066bc5d in
    TransactionVersionManager::restartTransaction
    (this=0xa341340, thread=0, xact_level=1) at
    /home/users/anastop/gems/gems-2.1//common/Vector.h:109
    #2  0x00002aaab0656b89 in
    TransactionInterfaceManager::restartTransactionCallback
    (this=0xa341230,
    thread=0) at log_tm/TransactionInterfaceManager.C:751
    #3  0x00002aaaad20fb70 in ?? () from
    /home/simics/academic/simics-3.0.31/amd64-linux/lib/sparc-u3.so
    #4  0x00002aaaad1aed99 in ?? () from
    /home/simics/academic/simics-3.0.31/amd64-linux/lib/sparc-u3.so
    #5  0x00002aaaad1aec9a in ?? () from
    /home/simics/academic/simics-3.0.31/amd64-linux/lib/sparc-u3.so
    #6  0x00002b1b49bc2eaf in SIM_continue () from
    /home/simics/academic/simics-3.0.31/amd64-linux/bin/libsimics-common.so
    #7  0x00002b1b49b83a9c in ?? () from
    /home/simics/academic/simics-3.0.31/amd64-linux/bin/libsimics-common.so
    #8  0x00002b1b4aaf739c in PyCFunction_Call (func=0x2aaaaab26560,
    arg=0x2aaaac9f6a50, kw=0x0) at /home/packages/python-2.4.2 .......

    Any ideas? Or suggestions on how to debug more efficiently?

    Kind regards,

    Kostis

    PS: A similar situation occurs when we run the same 2 threads on a
    4-core machine. It works fine for XACT_LOG_BUFFER_SIZE=0,256,1024,2048
    and fails for size=32!

    _______________________________________________
    Gems-users mailing list
    Gems-users@xxxxxxxxxxx
    https://lists.cs.wisc.edu/mailman/listinfo/gems-users
    Use Google to search the GEMS Users mailing list by adding
    "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.


