Date: | Tue, 15 Jan 2008 19:02:57 +0100 |
---|---|
From: | "Rubén Titos" <rtitos@xxxxxxxxxxx> |
Subject: | [Gems-users] MESI_CMP_filter_directory - race with PUTX |
Dear list,

I've been working for some time now with MESI_CMP_filter_directory in GEMS 2.0, and so far I hadn't run into any protocol bug. However, after adding some cache flushing functionality (which barely modified the L1 protocol file), I've come across the following race condition while debugging my own version with the tester and randomization=true. I believe this bug is independent of my code, as I haven't changed any protocol state, event, or transition; I only added an extra event and some minor additions to the code in in_port(mandatoryQueue_in).

For clarity, I'll avoid pasting the debug trace and describe the race condition instead (I've also drawn it; you can have a look here: http://skywalker.inf.um.es/~rtitos/files/protocol_race.png ).

- It is initiated by the replacement of the line (state M) at L1cache-1 (in my case when flushing the cache, but it may as well happen in the original version of the protocol with any ordinary replacement). The PUTX msg sent by L1cache-1 is delayed many cycles on its way to the L2, and in the meantime the following happens:
--- L1cache-0 issues a GETS, which arrives at L2 and is forwarded to L1cache-1, finding the line in state I_M (since it's being replaced).
--- L1cache-1 sends the data (dataSfromL1) to the requestor (L1cache-0) and a writeback to the L2, and sets the line to state I.
--- L1cache-0 receives the data and sends an unblock to the L2.
--- L1cache-1 receives a Store request, and since the block is in state I it issues a new GETX message, setting the block to IM state.
--- The unblock from core 0 arrives at L2 and updates the directory (but it doesn't remove core 1 from the sharer list).
- Finally, the L1_PUTX arrives at L2. Since L1cache-1 is still a sharer, it is not marked as L1_PUT_old, but is instead recycled (the line is blocked at L2 (MT_IB) waiting for a WB_Data from L1cache-1).
- The GETX from L1cache-1 arrives at L2 and is also recycled.
- Both the GETX and the PUTX from L1cache-1 are recycled a few times.
- The WB_data from core 1 arrives, and the line is unblocked.
- At this point, the GETX is the next msg in the recycle queue (because the PUTX was considered right before the WB_data arrived), so it is processed and the line is blocked again, going from SS to SS_MB state. The L2 sends INV to L1cache-0 and L2Data to L1cache-1.
- The PUTX is recycled over and over again since the line is blocked.
- L1cache-1 receives L2Data and transitions from IM to IM_M, waiting for the ack from L1cache-0.
- L1cache-0 receives the Inv and sends an Ack to L1cache-1.
- L1cache-1 receives the Ack, changes from IM_M to M (resolving the store miss) and sends Exclusive_Unblock to L2.
- When the L2 receives the Exclusive_Unblock, it changes the state from SS_MB to MT.
- At this point the line is no longer blocked, so the PUTX is processed and a WB_Ack is sent to L1cache-1.
- CRASH: Upon reception, the line is in state M, and it shouldn't receive any WB_Ack msg in this state (no valid transition).

Can anyone confirm this protocol bug? Or am I missing some subtle but important step? Again, my code didn't touch any of the "basic protocol stuff", but what confuses me is that the randomized tester can't make the original protocol break, yet it breaks my own version.

Cheers,
Rubén

--
Ruben Titos
Parallel Computing and Architecture Group (GACOP)
Computer Engineering Dept.
University of Murcia
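The message sequence described above, as seen by L1cache-1 alone, can be replayed against a small transition table. This is only a sketch under assumptions: the table is not taken from the SLICC protocol file, the state names are the ones used in the description (I_M, IM, IM_M), the event labels (Replacement, Fwd_GETS, Store, L2Data, Ack, WB_Ack) and the helper names (TRANSITIONS, replay, InvalidTransition) are made up for illustration, and the I_M + WB_Ack entry is an assumed normal-case completion of a writeback.

```python
# Hypothetical, simplified view of L1cache-1 only; NOT the real SLICC table.
# Keys are (current_state, event); values are the next state.
TRANSITIONS = {
    ("M", "Replacement"): "I_M",   # line evicted, PUTX sent to L2
    ("I_M", "WB_Ack"): "I",        # assumed normal case: writeback acknowledged
    ("I_M", "Fwd_GETS"): "I",      # forwarded GETS served while replacing
    ("I", "Store"): "IM",          # store miss, GETX sent to L2
    ("IM", "L2Data"): "IM_M",      # data arrived, still waiting for the ack
    ("IM_M", "Ack"): "M",          # ack arrived, store miss resolved
}


class InvalidTransition(Exception):
    """Raised when an event arrives in a state with no matching transition."""


def replay(start, events):
    """Apply events in order and return the list of visited states."""
    state = start
    trace = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise InvalidTransition(
                f"no transition for event {event} in state {state}"
            )
        state = TRANSITIONS[key]
        trace.append(state)
    return trace


# Events in the order L1cache-1 observes them in the race described above.
# The delayed PUTX is only acknowledged (WB_Ack) after the line is back in M.
RACE = ["Replacement", "Fwd_GETS", "Store", "L2Data", "Ack", "WB_Ack"]

if __name__ == "__main__":
    print(replay("M", ["Replacement", "WB_Ack"]))  # ordinary replacement
    try:
        replay("M", RACE)
    except InvalidTransition as err:
        print("CRASH:", err)
```

Replaying the ordinary replacement completes, while the race sequence stops at the final WB_Ack because the sketch, like the description, has no entry for that event in state M.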