Re: [Gems-users] LogTM Transactions Hanging (Gems 2.1)


Date: Wed, 25 Jun 2008 13:58:07 +1200
From: "Fuad Tabba" <fuad@xxxxxxxxxxxxxxxxx>
Subject: Re: [Gems-users] LogTM Transactions Hanging (Gems 2.1)
One more observation: I found that using processor 0 makes the system a bit flaky (lots of exceptions, and sometimes it gets completely stuck in exceptions). Normally (for example, on a 4 processor image) I would use processors 0,1,2, but from the tests I've been running, that can be hazardous to the health of the system. Instead I'm now using processors 1,2,3 and leaving 0 free. (This seems to apply whether I'm using processor_bind or pset_bind.)

On Mon, Jun 23, 2008 at 4:46 PM, Fuad Tabba <fuad@xxxxxxxxxxxxxxxxx> wrote:
Thanks Jayaram (and everyone else who contributed). I think that I finally got LogTM (gems 2.1) to work with my benchmarks. I'm posting this on the Gems list for the benefit of anyone who might search for this, or for a similar problem, in the future.

Problem summary: Simics/Gems would misbehave (i.e. crash or get stuck in exceptions) when using LogTM with benchmark binaries compiled using the Sun Studio compiler instead of GCC.

Solution: In my case at least, I finally managed to identify several causes for this problem. Some of them make sense; others were found through trial and error and don't make much sense, but seem to work anyway.


Problem 1 (fix the offset of the transaction_manager_stub function):-

The first problem had to do with how the Sun C compiler inlines some functions, which changes the offsets of certain things that LogTM is looking for. To quote Jayaram:-


"I have taken a look at your executable. If you disassemble it and look
at transaction_manager_stub(), you will notice that the BEGIN_ESCAPE instruction
turns out to instruction number 12 at byte offset 44 from the beginning of the function.
This is the location that the simulator should jump to on a trap...
The current code sets this to byte offset 16, since I believe that gcc does not inline
the delay() function.

void set_transaction_registers(int threadid){
        delay(1024);
        void *handler_address = &transaction_manager_stub + 16;

In your case, it should be changed to &transaction_manager_stub+44.
Its unfortunate that we did not insert a big fat comment at this point in the code explaining
the significance of 16. I will fix that in the next release."

So in my case I changed line 556 of transaction.c (in function set_transaction_registers) to:-
void *handler_address = &transaction_manager_stub + 44;

The compiler I'm using is "Sun C 5.9 SunOS_sparc Patch 124867-01 2007/07/12", so if you're using any other compiler/version it might be a good idea to disassemble your binary and see where BEGIN_ESCAPE (sethi %hi(0x1800), %g0) is and adjust that accordingly.
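As a rough way to double-check the offset, the search for BEGIN_ESCAPE can be sketched in C. This is only an illustration, not part of GEMS: the helper name find_begin_escape_offset is hypothetical, and the constant 0x01000006 is my own derivation of the encoding of sethi %hi(0x1800), %g0 from the SPARC sethi format, so please verify it against your own disassembly. The function scans a buffer of instruction words (for example, copied out of a disassembly of transaction_manager_stub) and returns the byte offset of the first match.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of "sethi %hi(0x1800), %g0", derived from the SPARC
 * sethi format: op = 00, rd = %g0 (0), op2 = 100, imm22 = 0x1800 >> 10 = 6.
 * Check this against your own disassembly before relying on it. */
#define BEGIN_ESCAPE_WORD 0x01000006u

/* Hypothetical helper (not part of GEMS): returns the byte offset of the
 * first BEGIN_ESCAPE word among the first n_words instruction words of
 * `code`, or -1 if there is none. */
long find_begin_escape_offset(const uint32_t *code, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        if (code[i] == BEGIN_ESCAPE_WORD)
            return (long)(i * sizeof(uint32_t));
    }
    return -1;
}
```

With BEGIN_ESCAPE as the 12th instruction (index 11), this returns 44, which matches the offset Jayaram found in my binary.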


Problem 2:-

I used to code the spawning of N threads as follows (where N is one less than the total number of processors):-

        for (i = 1; i < N; i++)
        { pthread_create(/* parameters[i] */, function_to_spawn); }
        function_to_spawn(parameters[0]);
        for (i = 1; i < N; i++)
        { pthread_join(/**/); }

That way I would avoid spawning a new thread for thread 0 and just reuse the current one. This used to work with the old gems/logtm, and still works with ATMTP, but it doesn't seem to work with the new LogTM. So it's better to stick with the tried and true method of:-

        for (i = 0; i < N; i++)
        { pthread_create(/* parameters[i] */, function_to_spawn); }
        for (i = 0; i < N; i++)
        { pthread_join(/**/); }
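For anyone who wants something compilable, the spawn-all-then-join-all pattern above might look roughly like this. It is a minimal sketch, not my actual benchmark code: struct worker_arg, run_workers and the body of function_to_spawn are placeholders made up for illustration, and the real work would replace the line that sets the done flag.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread argument; stands in for parameters[i]. */
struct worker_arg {
    int id;
    int done;
};

/* Hypothetical stand-in for function_to_spawn. */
static void *function_to_spawn(void *p)
{
    struct worker_arg *arg = p;
    arg->done = 1;              /* the benchmark work would go here */
    return NULL;
}

/* Spawns n worker threads (including "thread 0") and joins all of them.
 * Returns the number of workers that ran, or -1 on any failure. */
int run_workers(int n)
{
    if (n <= 0)
        return -1;

    pthread_t *threads = malloc((size_t)n * sizeof *threads);
    struct worker_arg *args = calloc((size_t)n, sizeof *args);
    int created = 0, finished = 0;

    if (threads == NULL || args == NULL) {
        free(threads);
        free(args);
        return -1;
    }

    /* Every worker gets its own thread; the main thread only waits. */
    for (int i = 0; i < n; i++) {
        args[i].id = i;
        if (pthread_create(&threads[i], NULL, function_to_spawn, &args[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(threads[i], NULL);
        finished += args[i].done;
    }

    free(threads);
    free(args);
    return created == n ? finished : -1;
}
```

Calling run_workers(3) from main would correspond to N = 3 in the loops above.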


Problem 3:-

To perform a ruby break (magic break), I used to have my own barrier code, followed by the magic break instruction (that would only run on thread/processor 0). I never called SIMICS_BEGIN_BARRIER or SIMICS_END_BARRIER. As far as I could tell by looking at commands.C, these two magic instructions have to do with visualization. But for some reason, if I do not structure my breaks/barriers exactly the same way that Barrier_breaking in transaction.c does, then it doesn't work. Again, my old method works fine with ATMTP and the old LogTM, but this seems to have some sort of weird effect on the new LogTM.


Assorted things that helped:-

- Performing I/O (e.g. printf) on any of the bound processors shortly before or after transactions (not necessarily inside a transaction) seems to make things act up for some reason.

- I found that disabling interrupts on the running processors helps (using psradm -i). So before running the benchmarks, I would run, e.g. on a 4 processor image:-
        psradm -i 0 1 2

Make sure to leave at least one processor with interrupts enabled though, it's very important.

- I couldn't find a difference between using processor_bind or pset_bind. Looking at the man pages they seem to do the same thing. I prefer processor_bind because it doesn't require root access, so I can use it on the real sparc machine I run the software benchmarks on without having to go through the hassle of asking for the right privileges. Could someone enlighten me if for some reason pset_bind is preferable?
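To illustrate the binding together with the "leave processor 0 free" observation from my follow-up, here is a small sketch. The helper names cpu_for_thread and bind_self_to_cpu are made up for this example, and it assumes processor ids are contiguous and start at 0, which may not hold on every machine. The processor_bind call is only compiled on Solaris; elsewhere the helper simply reports failure.

```c
#if defined(__sun)
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#endif

/* Hypothetical mapping: thread i runs on processor i + 1, so that
 * processor 0 is left free. Assumes contiguous processor ids from 0. */
int cpu_for_thread(int thread_index)
{
    return thread_index + 1;
}

/* Hypothetical wrapper: binds the calling LWP to the given processor.
 * Returns 0 on success, -1 on failure or when not built on Solaris. */
int bind_self_to_cpu(int cpu)
{
#if defined(__sun)
    return processor_bind(P_LWPID, P_MYID, (processorid_t)cpu, NULL);
#else
    (void)cpu;
    return -1;
#endif
}
```

Each worker thread would call bind_self_to_cpu(cpu_for_thread(i)) at the start of its function, so that on a 4 processor image threads 0,1,2 end up on processors 1,2,3.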

Thanks again to everyone who helped! :)

Cheers,
/Fuad
