Re: [Gems-users] Processor Lost


Date: Thu, 3 May 2007 12:06:38 -0400 (EDT)
From: Daniel Nussbaum - Sun Microsystems - Burlington United States <dan.nussbaum@xxxxxxx>
Subject: Re: [Gems-users] Processor Lost
>
> Date: Thu, 03 May 2007 08:28:33 -0700 (PDT)
> From: James Wang <jameswang99@xxxxxxxxx>
>
> Hi Dan:
>
>     Thanks a lot. I think one of the reasons that my benchmark
>     invokes the OS is that it does quite a bit of I/O, which could
>     actually be avoided.
>

Ugh.  If you're doing I/O inside a transaction, that would be a pretty
bad error, which LogTM doesn't prevent you from doing.  In fact, LogTM
doesn't prevent you from doing any system calls of any kind from
within a transaction, which is also almost certainly an error.  (If
you wish, I can probably send you some code that hacks the trap
instruction to cause a *simulator* assertion violation if you ever
make any system call inside a transaction.  Tracking that down has
cost me the better part of a week on two occasions in the past year or
so.)

Even if the I/O is outside a transaction, it should probably be
avoided.  Screen I/O can definitely produce interrupts long after the
program returns from the I/O call, at the point when the characters
are actually sent to the screen device.  These interrupts can preempt
your program for what can seem to be a *long* time.

Better to just avoid it -- buffer it up and do all the I/O after the
transactional part of the test has finished (and after you've taken
any timing numbers you need).
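
As a rough sketch of that buffering approach (not code from LogTM or
GEMS; the names log_append, log_flush, and LOG_CAPACITY are made up
for illustration), a benchmark could format its messages into memory
and write them out once at the end.  A real multithreaded benchmark
would probably want one such buffer per thread so that threads do not
contend on it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define LOG_CAPACITY 4096   /* illustrative size, not a recommendation */

static char   log_buf[LOG_CAPACITY];
static size_t log_len = 0;

/* Format a message into the in-memory buffer.  No I/O is performed here.
 * Returns 0 on success, -1 if the message did not fit (it is dropped). */
static int log_append(const char *fmt, ...)
{
    va_list ap;
    size_t room = LOG_CAPACITY - log_len;
    int n;

    assert(fmt != NULL);

    va_start(ap, fmt);
    n = vsnprintf(log_buf + log_len, room, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= room) {
        log_buf[log_len] = '\0';   /* discard the truncated message */
        return -1;
    }
    log_len += (size_t)n;
    return 0;
}

/* Write the whole buffer in one go.  Intended to be called only after
 * the transactional phase and all timing measurements are finished. */
static void log_flush(FILE *out)
{
    fwrite(log_buf, 1, log_len, out);
    fflush(out);
    log_len = 0;
    log_buf[0] = '\0';
}
```

The benchmark would call log_append() wherever it used to print, and
log_flush(stdout) once after the timed region.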

>
>     But I don't think processor_bind is just a suggestion.
>

That's my understanding as well.  Note that I've been told (by
somebody who almost certainly knows) that for real applications,
binding threads to processors is a very bad idea -- it can cause the
scheduler to do really bad things.  For LogTM runs, doing so is
absolutely necessary, however, because migrating a thread that's
currently inside a transaction would lead to utter chaos (it is my
understanding that LogTM is not at all equipped to correctly handle
such a thing).
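
For reference, a minimal sketch of binding the calling LWP to a
processor on Solaris with processor_bind(2).  The wrapper name
bind_self_to_cpu is hypothetical, and the non-Solaris branch exists
only so that the sketch compiles elsewhere; it is not how any
particular LogTM benchmark does it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#if defined(__sun)
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#endif

/* Bind the calling LWP to processor `cpu`.
 * Returns 0 on success, -1 with errno set on failure.
 * (Hypothetical helper name, for illustration only.) */
static int bind_self_to_cpu(int cpu)
{
    assert(cpu >= 0);
#if defined(__sun)
    /* P_LWPID + P_MYID selects the calling LWP; the old binding is
     * not needed here, so the last argument is NULL. */
    return processor_bind(P_LWPID, P_MYID, (processorid_t)cpu, NULL);
#else
    (void)cpu;
    errno = ENOSYS;   /* processor_bind() is Solaris-specific */
    return -1;
#endif
}
```

Each worker thread would call this once at startup, before entering
any transaction, and check the return value.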

dann

>
>    It is mandatory for user threads but may be only a suggestion for
>    OS-related threads.
>
> Regards
> James
>
> ----- Original Message ----
> From: Dan Gibson <degibson@xxxxxxxx>
> To: Gems Users <gems-users@xxxxxxxxxxx>
> Sent: Friday, May 4, 2007 3:16:25 AM
> Subject: Re: [Gems-users] Processor Lost
>
> As a corollary to this discussion, I'd like to add that this kind of
> issue is one of the reasons that full-system simulation is a great
> thing -- this sort of thing can happen to a real workload, too.
>
> James Wang wrote:
> > Hi Dan:
>
> >     Thank you very much for your prompt reply. But I don't really
> >     understand what the nature of this situation is. Why would the
> >     OS want to deschedule my benchmark?
>
> The OS deschedules threads for a variety of reasons. Interrupts of
> any kind typically need to be handled, and if your benchmark
> initiates a blocking I/O operation the OS will probably deschedule
> your benchmark while the I/O completes (including, say, page
> faults). Moreover, there is always the real-time interrupt timer
> that the OS uses for timesharing anyway -- this timer can actually
> be a problem when running with Simics, as a common trick is to set
> Simics's clock frequency low to improve I/O performance -- with low
> clock frequencies, timer interrupts arrive after fewer simulated
> cycles than they do at higher Simics frequencies. For other reasons,
> have a look at your favourite OS textbook.
>
> > Also, I bound the thread to the processor. Shouldn't it just stay there
> > and run?
>
> See Kevin's response. In general, processor_bind() is a suggestion,
> not a command, to the OS.
>
> >
> > I did this with a four-processor simulated machine. Why are the
> > other processors not affected by this problem?
> >
>
> Solaris seems to favor P0 for a variety of reasons, chief among them
> simplicity.
>
> >
> > Regards
> > James
> >
>
> Regards,
> Dan
>
> > ----- Original Message ----
> > From: Dan Gibson <degibson@xxxxxxxx>
> > To: Gems Users <gems-users@xxxxxxxxxxx>
> > Sent: Friday, May 4, 2007 1:03:42 AM
> > Subject: Re: [Gems-users] Processor Lost
> >
> > The OS could be descheduling your transactional benchmark, though
> > I'm not sure why that might be happening. Try quiescing the system
> > by killing background processes, and then pre-fetch any binaries
> > or data you might be using in your benchmark by running it once to
> > completion before loading Ruby (and hence, without
> > synchronization). That should eliminate any I/O you might
> > inadvertently cause at runtime. It will also cause your benchmark
> > to run with different system interactions, and will hopefully fix
> > the Processor Lost/Processor Found problem.
> >
> > Regards,
> > Dan
> >
> > James Wang wrote:
> >
> >>
> >> Hi All:
> >>
> >>    I am running a transactional memory benchmark using a
> >>    customized SMP cache coherence protocol. For some reason, p0
> >>    runs code other than the transactional benchmark, while the
> >>    other processors finish fine. I cannot really tell what p0
> >>    is doing. I tried a few different random seeds, and the same
> >>    thing happens every time.
> >>
> >>    Any ideas?
> >>    Thanks for any reply in advance.
> >>
> >> Regards
> >> James
> >>
> >
>