That is quite the complex race indeed (nice picture). While I can't
comment on MESI_CMP_filter_directory itself or on whether this race is
accounted for, I can speculate as to why it wouldn't arise without your
modification. Baseline Simics+Ruby models an SC in-order uniprocessor
that blocks on any coherence request until completion.
That is, Ruby's isReady() might end up stalling L1cache-1's Store
request until the writeback is complete. However, with 'inserted'
invalidations (I'm not 100% sure how you're inserting the invalidations)
there is a chance that isReady() wouldn't stall the request. At any
rate, I suppose I'm suggesting a protocol-independent fix for you --
have Ruby's isReady() function return false if a writeback is
outstanding for the cache block.
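For what it's worth, the guard I have in mind could look something like
this (a minimal C++ sketch, not actual GEMS code -- the Sequencer class,
writebackStarted/writebackFinished hooks, and the 64-byte line size are
all illustrative assumptions, not Ruby's real API):

```cpp
#include <cstdint>
#include <set>

using Address = uint64_t;

// Illustrative stand-in for Ruby's sequencer: it tracks blocks with an
// in-flight writeback and refuses to accept new requests for them.
class Sequencer {
public:
    void writebackStarted(Address addr)  { m_outstanding_wb.insert(lineAddress(addr)); }
    void writebackFinished(Address addr) { m_outstanding_wb.erase(lineAddress(addr)); }

    // The protocol-independent fix: report "not ready" (i.e. stall the
    // request) while a writeback is outstanding for the same cache block.
    bool isReady(Address addr) const {
        return m_outstanding_wb.count(lineAddress(addr)) == 0;
    }

private:
    // Assumes 64-byte cache lines (illustrative).
    static Address lineAddress(Address addr) { return addr & ~Address(63); }
    std::set<Address> m_outstanding_wb;
};
```

With something like this in place, L1cache-1's Store to the block being
written back would simply sit in the mandatory queue until the writeback
handshake completes, regardless of the protocol.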
Regards,
Dan
Rubén Titos wrote:
Dear list,
I've been working for some time now with MESI_CMP_filter_directory in
GEMS 2.0, and so far I hadn't run into any protocol bug. However,
after adding some cache-flushing functionality (which barely
modified the L1 protocol file) I've come across the following race
condition, while debugging my own version of the tester with
randomization=true. I believe this bug is independent of my code, as I
haven't changed any protocol state, event, or transition; I've only
added an extra event and made some minor additions to the code in
in_port(mandatoryQueue_in).
For the sake of clarity, I'll avoid pasting the debug trace and
instead describe the race condition (btw, I've also drawn it; you can
have a look here:
http://skywalker.inf.um.es/~rtitos/files/protocol_race.png )
- It is initiated by the replacement of a line (in state M) at
L1cache-1 (in my case when flushing the cache, though it may just as
well happen in the original version of the protocol with any ordinary
replacement). The PUTX msg sent by L1cache-1 is delayed many cycles
on its trip to the L2, and in the meantime the following happens:
--- L1cache-0 issues a GETS, which arrives at the L2 and is forwarded to
L1cache-1, finding the line in state I_M (since it's being replaced).
--- L1cache-1 sends the data (dataSfromL1) to the requestor (L1cache-0)
and a writeback to the L2, and sets the line to state I.
--- L1cache-0 receives the data and sends an unblock to the L2.
--- L1cache-1 receives a Store request, and since the block is in
state I it issues a new GETX message, setting the block to state IM.
--- The unblock from core 0 arrives at the L2 and updates the directory
(but it doesn't remove core 1 from the sharer list).
- Finally, the L1_PUTX arrives at the L2. Since L1cache-1 is still
listed as a sharer, it is not treated as L1_PUTX_old but is instead
recycled (the line is blocked at the L2 (MT_IB), waiting for WB_Data
from L1cache-1).
- The GETX from L1cache-1 arrives at L2 and it's also recycled.
- Both GETX and PUTX from L1cache-1 are recycled a few times.
- The WB_Data from core 1 arrives, and the line is unblocked.
- At this moment, the GETX is the next msg in the recycled queue
(because the PUTX was considered right before the WB_Data arrived), so
it's processed and the line is blocked again, going from SS to SS_MB.
The L2 sends an INV to L1cache-0 and L2Data to L1cache-1.
- The PUTX is recycled over and over again since the line is blocked.
- L1cache-1 receives the L2Data and transitions from IM to IM_M,
waiting for the ack from L1cache-0.
- L1cache-0 receives Inv and sends Ack to L1cache-1.
- L1cache-1 receives the Ack, changes from IM_M to M (resolving the
store miss), and sends an Exclusive_Unblock to the L2.
- When the L2 receives the Exclusive_Unblock, it changes the state from
SS_MB to MT.
- At this moment the line is no longer blocked, so the PUTX is
processed and a WB_Ack is sent to L1cache-1.
- CRASH: upon reception, L1cache-1's line is in state M, and it
shouldn't receive any WB_Ack msg in this state (no valid transition).
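To make the failure mode concrete, here is a toy C++ replay of
L1cache-1's side of the trace above. This is just an illustration, not
SLICC: the state and event names merely mirror the ones used in this
message, and the transition table contains only the steps the trace
exercises.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using State = std::string;
using Event = std::string;
using Key   = std::pair<State, Event>;

// Only the L1cache-1 transitions that the trace actually takes.
static const std::map<Key, State> l1Transitions = {
    {{"M",    "Replacement"}, "I_M"},   // PUTX sent; flush/replacement begins
    {{"I_M",  "Fwd_GETS"},    "I"},     // dataSfromL1 to L1cache-0, WB to L2
    {{"I",    "Store"},       "IM"},    // new GETX issued
    {{"IM",   "L2Data"},      "IM_M"},  // wait for ack from L1cache-0
    {{"IM_M", "Ack"},         "M"},     // store resolved, Exclusive_Unblock
    // Note: no entry for {"M", "WB_Ack"} -- that is the crash.
};

// Replays a sequence of events from state M; returns the final state,
// or a "CRASH: ..." string if an event has no valid transition.
State replay(const std::vector<Event>& events) {
    State s = "M";
    for (const Event& e : events) {
        auto it = l1Transitions.find({s, e});
        if (it == l1Transitions.end())
            return "CRASH: no transition for (" + s + ", " + e + ")";
        s = it->second;
    }
    return s;
}
```

Replaying {Replacement, Fwd_GETS, Store, L2Data, Ack} lands the line
back in M, and the late WB_Ack then has no matching transition --
exactly the crash described above.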
Can anyone confirm this protocol bug? Or am I missing some subtle but
important step? Again, my code didn't touch any of the "basic protocol
stuff", but what confuses me is that the randomized tester can't break
the original protocol, yet it breaks my own version.
Cheers,
Rubén
--
Ruben Titos
Parallel Computing and Architecture Group (GACOP)
Computer Engineering Dept.
University of Murcia
------------------------------------------------------------------------
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.
--
http://www.cs.wisc.edu/~gibson [esc]:wq!