Matt,
We generally do the following when studying TM workloads:
- Create processor sets with a 1-1 mapping between processors and sets.
- After the threads are created, bind each one to one of these
  processor sets.
- Have the threads synchronize on a barrier, spinning until all of them
  have been scheduled at least once and bound to their respective
  processor sets.
- Start simulating with Ruby after the barrier is crossed by one
  particular thread.
By the end of these steps, each thread should have a unique processor
to be scheduled on. Hopefully, the threads are also already scheduled,
either spinning on the barrier or past it, when we start timing
simulation.
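For illustration, the steps above could be sketched roughly as follows. This is a minimal sketch, not the setup used for any published results. The Solaris calls (pset_create, pset_assign, pset_bind) sit behind #ifdef __sun and have not been run here; on other systems the code compiles without binding. The names FIRST_CPU, begin_timing_hook and run_bound_threads are hypothetical, the processor numbering is an assumption, and the spin barrier uses C11 atomics as a stand-in for whatever barrier the workload already has.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#ifdef __sun
#include <sys/types.h>
#include <sys/procset.h>
#include <sys/processor.h>
#include <sys/pset.h>
#define FIRST_CPU 1              /* assumption: CPUs 1..n are free to assign */
#endif

#define MAX_THREADS 64

static atomic_int arrived;       /* threads that have reached the barrier */
static int nthreads;
static int crossed[MAX_THREADS];
static int timing_started;
#ifdef __sun
static psetid_t psets[MAX_THREADS];
#endif

/* Hypothetical hook: a real setup would switch the simulator to timing
 * mode here (for example through a magic instruction). */
static void begin_timing_hook(void) { timing_started = 1; }

/* Spin, rather than block, so that every thread stays scheduled. */
static void spin_barrier(void) {
    atomic_fetch_add(&arrived, 1);
    while (atomic_load(&arrived) < nthreads) { /* spin */ }
}

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
#ifdef __sun
    /* Bind the calling LWP to its own single-processor set. */
    if (pset_bind(psets[id], P_LWPID, P_MYID, NULL) != 0)
        perror("pset_bind");
#endif
    spin_barrier();
    if (id == 0)                 /* one particular thread starts timing */
        begin_timing_hook();
    crossed[id] = 1;
    /* ... transactional work would start here ... */
    return NULL;
}

/* Creates n threads, one per processor set, and returns how many of
 * them crossed the barrier. */
int run_bound_threads(int n) {
    pthread_t tids[MAX_THREADS];
    int i, count = 0;

    assert(n > 0 && n <= MAX_THREADS);
    nthreads = n;
    atomic_store(&arrived, 0);
    timing_started = 0;
    for (i = 0; i < n; i++)
        crossed[i] = 0;
#ifdef __sun
    /* One processor set per processor (needs privileges). */
    for (i = 0; i < n; i++) {
        if (pset_create(&psets[i]) != 0 ||
            pset_assign(psets[i], FIRST_CPU + i, NULL) != 0) {
            perror("pset setup");
            psets[i] = PS_NONE;  /* fall back to no binding */
        }
    }
#endif
    for (i = 0; i < n; i++) {
        int rc = pthread_create(&tids[i], NULL, worker,
                                (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    for (i = 0; i < n; i++)
        count += crossed[i];
    return count;
}
```

Calling run_bound_threads(4) would model the 4-thread, 4-processor case. Note that Solaris keeps at least one processor outside any processor set, so the simulated machine needs one more processor than there are bound threads.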
In our experience, kmeans has a very low transactional duty cycle. So
even with the transactional threads executing simultaneously, you would
see only one thread in a transaction at a time.
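As a rough back-of-the-envelope illustration of why a low duty cycle hides overlap: assume (and this is only an assumption, not a measured property of kmeans) that each of n threads is independently inside a transaction a fraction d of the time. The chance that at least two threads are in a transaction at the same instant is then 1 - (1-d)^n - n*d*(1-d)^(n-1). The function name p_overlap is made up for this sketch.

```c
#include <assert.h>

/* Probability that at least two of n threads are inside a transaction
 * at a random instant, assuming each thread is independently in a
 * transaction with probability d (the duty cycle). */
double p_overlap(int n, double d) {
    double none = 1.0;   /* (1-d)^n: no thread in a transaction */
    double rest = 1.0;   /* (1-d)^(n-1): the other n-1 threads idle */
    int i;

    assert(n >= 1 && d >= 0.0 && d <= 1.0);
    for (i = 0; i < n; i++)
        none *= (1.0 - d);
    for (i = 0; i < n - 1; i++)
        rest *= (1.0 - d);
    /* 1 - P(no thread) - P(exactly one thread) */
    return 1.0 - none - n * d * rest;
}
```

With the purely illustrative values n = 4 and d = 0.05, p_overlap returns about 0.014, so under this model two or more transactions would coincide only about 1.4% of the time even with perfect scheduling.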
You might want to contact the authors
of STAMP regarding the low duty cycle. Hope this helps.
Jayaram
Matthew James Horsnell wrote:
Dan/Derek,
Thanks for your reply. I guess I probably didn't phrase my question well
enough. I appreciate that no performance results in LogTM were presented
using the tourmaline module. However, my question is equally applicable
to the Gems-Ruby modules: do you in any way alter the OS thread
scheduling policy in Solaris? Using tourmaline (and granted, the ruby
modules will add timing delays), I see very infrequent overlapping of
transactional threads, mainly due to the influence that the OS
scheduling policy appears to have on the way the transactional threads
are scheduled. For example, on a 4 processor system, I rarely see all
processors executing transactionally, and for long periods of time only
one processor is executing a transactional thread.
Perhaps I am misunderstanding something, but currently, with the OS
scheduling the transactional threads (in my case running the STAMP
kmeans benchmark), I hardly see any concurrency, not due to aborted
transactions but rather because the operating system rarely chooses to
schedule multiple transactional threads concurrently.
Any more information would be gratefully received,
Matt
Derek Hower wrote:
Reply from Dan Gibson:
Tourmaline is a functional transactional memory simulator, which is
intended to have extensible behavior for future expansion. Its
default behavior makes no attempt to model realistic timing or
interleaving of transactions -- it simply provides the bare minimum
implementation of atomicity in the most trivial, simulator-magic way
possible -- by literally disabling all (other) processors.
To my knowledge, the released version of tourmaline has never been
used to collect viable research data -- it is a tool intended to
enable warm-up of transactional applications, as well as to
facilitate debugging of transactional applications separate from the
debugging of the timing simulator.
If you are interested in looking at running concurrent threads, you
should look into implementing sub-classes of TransactionController,
which Tourmaline uses to guarantee atomicity of transactions. There
is a how-to guide in the README for tourmaline, called the
'Transaction Controller Cookbook'.
Please note that the /timing/ runs for the LogTM family of work all
used Ruby, not Tourmaline. However, tourmaline is a viable tool for
(much) longer simulations at the cost of some timing fidelity.
Regards,
Dan
On Sep 28, 2007, at 3:37 AM, horsnelm@xxxxxxxxxxxx wrote:
Hi,
I've been looking at the code inside the tourmaline TM module of the
gems package. I'm trying to run some benchmarks, in particular the
stamp-0.9.4 benchmarks, and wondered if you could comment on the
scheduling policy and how you have used tourmaline, or gems in general,
to generate your results.
I can see in the tourmaline code, that when you begin a transaction you
disable interrupts in the processor registers, which means that until
the transaction resolves it cannot be interrupted. You switch back on
interrupts when the transaction commits or aborts.
Is it not the case that the operating system threads will interleave
with the transactions, competing for the cpu time? Do you prevent this
from happening by changing the scheduling policy in the OS, or do you
measure your results in some other manner? The reason I ask, is that
when running say a 4 threaded application, on a 4 cpu architecture,
transactions infrequently overlap as they are scheduled according to
the OS. Ideally I'd like the transactional threads to run as
concurrently as possible to look at the interactions between them.
Thanks,
Matt Horsnell
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding
"site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.