Re: [Gems-users] Ruby Segmentation Fault


Date: Wed, 18 Feb 2009 12:14:19 -0600
From: Jayaram Bobba <bobba@xxxxxxxxxxx>
Subject: Re: [Gems-users] Ruby Segmentation Fault
Great. Thanks for keeping the list posted on this issue.

Jayaram

Konstantinos Nikas wrote:
Hi Jayaram,

thanks for that info.

Actually, I found that the problem is not the size of occupy_stack: the compiler sees that occupy_stack is never used and eliminates it. This happens with both gcc and Sun Studio's C compiler at optimization level O2 and higher. So the solution is either to compile at a lower optimization level or to make sure the elimination doesn't happen. We opted for the second option, so we pass occupy_stack's address as input to an external dummy function (which does nothing). In this case the compiler creates a big stack frame for tm_trap_handler and everything seems to work fine (at least the configuration that used to crash now finishes correctly :-) ).
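For anyone hitting the same problem, here is a minimal sketch of the workaround described above. It is not the actual GEMS handler code: handler_sketch, dummy_sink, sink and OCCUPY_STACK_BYTES are hypothetical names, and the buffer size is made up. We used a dummy function in a separate file; to keep the sketch in one file, the call goes through a volatile function pointer instead, which likewise hides the callee from the optimizer.

```c
#include <stddef.h>

#define OCCUPY_STACK_BYTES 1024  /* hypothetical size, not taken from the GEMS sources */

/* Does nothing; stands in for the external dummy function. */
void dummy_sink(void *p) { (void)p; }

/* The pointer is volatile, so the compiler must load it at run time and
   cannot prove which function is called or that the argument is unused. */
void (*volatile sink)(void *) = dummy_sink;

/* Hypothetical stand-in for tm_trap_handler. */
size_t handler_sketch(void)
{
    char occupy_stack[OCCUPY_STACK_BYTES];

    /* The buffer's address escapes to an opaque call, so the compiler has to
       reserve the buffer in this frame, even at O2 and higher. */
    sink(occupy_stack);

    /* ... the real handler work would go here ... */
    return sizeof occupy_stack;
}
```

Declaring the buffer volatile and writing one element to it would probably work too, but passing the address out is closer to what we actually did.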

Kind regards,

Kostis
Kostis,

Great job on tracking down the bug. There is a dummy 'occupy_stack' variable in the software handlers that is supposed to ensure that the software handler runs way up in the stack, reducing the chance of it interfering with transactional execution. Unfortunately, in this case the buffer provided by occupy_stack doesn't seem to be sufficient. Increasing it could work...

Ideally, the handlers would run off their own stack, but we haven't implemented that.

Jayaram

Konstantinos Nikas wrote:
Hi,

I think I have now found out what is happening (although I have no clue whether it is "normal"). Thread 0 starts a transaction and I see the following:

41510444 1 [1,0] ISOLATE XACT STORE [0x1f247ec0, line 0x1f247ec0] XACT LEVEL: 1 PC = [0x13228, line 0x13200]
41510444 1 [1,0] LOGGING STORE: [0xff0fbec0, line 0xff0fbec0] 1 PC = [0x13228, line 0x13200]
41510444 1 [1,0] ADD UNDO LOG ENTRY: [0xff0fbec0, line 0xff0fbec0] [0x1f247ec0, line 0x1f247ec0] LogAddress: [0x2d9174, line 0x2d9140] 1

The transaction moves on and at some point it needs to abort. The software handler kicks in to unroll the log and undoes the log entries, which include the 64 bytes that start at 0xFF0FBEC0.

However, for some reason, during this invocation of the software handler %fp + 0x44 = 0xFF0FBECC, and that location is used to store the value needed to access the right threadTransContext structure. When the line is restored in tm_unroll_log_entry, this value is lost and the software handler saves the new xact_level in the wrong location.

In the previous invocations of the software handler, %fp + 0x44 = 0xFF0FBDFC. This means that the handler stores the new values of xact_level and xact_log_size in the right location, as that memory line is not undone, and the transaction can be correctly restarted.
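To double-check the overlap, the line arithmetic can be sketched as below. This assumes 64-byte, 64-byte-aligned lines, which matches the "line" values in the trace above; line_base and in_same_line are just illustrative helper names, not anything from the GEMS sources.

```c
#include <stdint.h>

#define LINE_BYTES 64u  /* the undo log entry covers 64 bytes */

/* Round an address down to the start of its 64-byte line. */
uint32_t line_base(uint32_t addr)
{
    return addr & ~(LINE_BYTES - 1u);
}

/* Non-zero if both addresses fall inside the same 64-byte line. */
int in_same_line(uint32_t a, uint32_t b)
{
    return line_base(a) == line_base(b);
}
```

With these helpers, line_base(0xFF0FBECC) is 0xFF0FBEC0, i.e. the failing %fp + 0x44 slot sits inside the logged line, while line_base(0xFF0FBDFC) is 0xFF0FBDC0, a different line that the unroll does not touch.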

Obviously, there shouldn't be any conflicts between the addresses used by the software handler and those included inside a transaction. I have followed all the instructions for preparing the workloads and hopefully I haven't missed anything.

Any workarounds?

Kind regards,

Kostis
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.

