[Gems-users] Deadlock


Date: Mon, 5 Feb 2007 22:31:29 -0500
From: "Nitin Bhardwaj" <bhardwaj@xxxxxxxxxxxxxxxx>
Subject: [Gems-users] Deadlock
Hi,

I have created a coherence protocol (bcast), and the network file I specify for it is:

(4 processors, 4 processors per chip, 1 L2 and 1 Directory):
ext_node:L1Cache:0 int_node:0 link_latency:1 bw_multiplier:64
ext_node:L1Cache:1 int_node:0 link_latency:1 bw_multiplier:64
ext_node:L1Cache:2 int_node:0 link_latency:1 bw_multiplier:64
ext_node:L1Cache:3 int_node:0 link_latency:1 bw_multiplier:64

ext_node:L2Cache:0 int_node:1 link_latency:1 bw_multiplier:64
ext_node:Directory:0 int_node:2 link_latency:1 bw_multiplier:64

int_node:0 int_node:1 link_latency:1 bw_multiplier:64
int_node:1 int_node:2 link_latency:1 bw_multiplier:64
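
As a sanity check on the topology above, here is a minimal sketch in Python (not part of GEMS) that parses the link lines and confirms every external node can reach every other one. It assumes each line describes a bidirectional link between its first two fields; the helper names `parse_links`, `build_graph` and `reachable` are made up for this illustration.

```python
from collections import defaultdict, deque

# The network file from this post, copied verbatim.
TOPOLOGY = """\
ext_node:L1Cache:0 int_node:0 link_latency:1 bw_multiplier:64
ext_node:L1Cache:1 int_node:0 link_latency:1 bw_multiplier:64
ext_node:L1Cache:2 int_node:0 link_latency:1 bw_multiplier:64
ext_node:L1Cache:3 int_node:0 link_latency:1 bw_multiplier:64

ext_node:L2Cache:0 int_node:1 link_latency:1 bw_multiplier:64
ext_node:Directory:0 int_node:2 link_latency:1 bw_multiplier:64

int_node:0 int_node:1 link_latency:1 bw_multiplier:64
int_node:1 int_node:2 link_latency:1 bw_multiplier:64
"""

def parse_links(text):
    """Return (endpoint_a, endpoint_b) pairs, one per non-blank line."""
    links = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            links.append((fields[0], fields[1]))
    return links

def build_graph(links):
    """Build an undirected adjacency map (assumption: links are bidirectional)."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def reachable(graph, start):
    """Breadth-first search: set of nodes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

links = parse_links(TOPOLOGY)
graph = build_graph(links)
ext_nodes = sorted(n for n in graph if n.startswith("ext_node:"))
unreachable = [n for n in ext_nodes if n not in reachable(graph, ext_nodes[0])]
print("external nodes:", ext_nodes)
print("unreachable from", ext_nodes[0], ":", unreachable)
```

An empty `unreachable` list only shows that the file describes a connected graph; it says nothing about virtual networks or buffer ordering inside the simulator.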

Right now I am generating a single ST transaction from Proc 0. I have checked that the network is created as expected and that the message is broadcast to all the processor L1 caches, the L2 and the Directory. When the L2 detects a miss, it forwards a request to the Directory, which responds with the data.

The following actions are taken in the L2 when it receives data from the Directory:

    u_writeDirDataToL2Cache;
    c_issueDataExclusive;
    s_deallocateTBE;
    k_popDirDataQueue;

And the corresponding actions in the L1 when it receives data:

    w_writeDataFromTBEToCache;
    hh_store_hit;
    d_deallocateTBE;
    i_popAddressQueue;

But I am seeing a deadlock in the network, and the ST does not retire. The debug trace follows. Is there something wrong in the way I am creating the network? Any pointers to root-cause the failure would be highly appreciated.

Debug: in fn const Message* MessageBuffer::peekAtHeadOfQueue() const in buffers/MessageBuffer.C:166: Peeking at head of queue [Chip 0 0, L2Cache, DirDataToL2_in] time: 102.
Debug: in fn const Message* MessageBuffer::peekAtHeadOfQueue() const in buffers/MessageBuffer.C:172: *msg_ptr is [ResponseMsg: Address=[0x400, line 0x400] Type=DATA SenderMachId=Directory-0 Destination=[NetDest (3) 0 0 0 0  - 1  - 0  - ] DataBlk=0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ] L1CacheStateStr= NumPendingExtAcks=0 MessageSize=Data Time=19 ]

Debug: in fn void L2Cache_Controller::c_issueDataExclusive(const Address&) in generated/MESI_CMP-bcast/L2Cache_Controller.C:232: executing
../protocols/MESI_CMP-bcast-L2cache.sm:404: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
Debug: in fn void L2Cache_Controller::s_deallocateTBE(const Address&) in generated/MESI_CMP-bcast/L2Cache_Controller.C:116: executing
Debug: in fn void L2Cache_Controller::k_popDirDataQueue(const Address&) in generated/MESI_CMP-bcast/L2Cache_Controller.C:395: executing
Debug: in fn void MessageBuffer::pop() in buffers/MessageBuffer.C:339: pop from [Chip 0 0, L2Cache, DirDataToL2_in]
Debug: in fn TransitionResult L2Cache_Controller::doTransition(L2Cache_Event, L2Cache_State, const Address&) in generated/MESI_CMP-bcast/L2Cache_Transitions.C:32: next_state is M

[PrioHeap: ]Debug: in fn void EventQueue::triggerEvents(Time) in eventqueue/EventQueue.C:123: *(thisNode.m_consumer_ptr) is [Sequencer: 0, outstanding requests: 1, read request table: [ ], write request table: [ [0x400, line 0x400]=[CacheMsg: Address=[0x400, line 0x400] Type=ST ProgramCounter=[0x0, line 0x0] AccessMode=UserMode Size=0 Prefetch=Yes Version=0 Aborted=0 Time=1 ] ]]
Debug: in fn void EventQueue::triggerEvents(Time) in eventqueue/EventQueue.C:124: thisNode.m_time is 500001
Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:124: Possible Deadlock detected
Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:125: request is [CacheMsg: Address=[0x400, line 0x400] Type=ST ProgramCounter=[0x0, line 0x0] AccessMode=UserMode Size=0 Prefetch=Yes Version=0 Aborted=0 Time=1 ]
Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:126: m_chip_ptr->getID() is 0
Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:127: m_version is 0
Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:128: current_time is 500001
Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:129: request.getTime() is 1
Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:130: current_time - request.getTime() is 500000
Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:131: keys.size() is 1
Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:132: *m_writeRequestTable_ptr is [ [0x400, line 0x400]=[CacheMsg: Address=[0x400, line 0x400] Type=ST ProgramCounter=[0x0, line 0x0] AccessMode=UserMode Size=0 Prefetch=Yes Version=0 Aborted=0 Time=1 ] ]
Fatal Error: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:133: Aborting
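
To see at a glance which controller actions appear in a trace like the one above, here is a minimal Python sketch (not part of GEMS) that pulls out the "executing" lines. The regular expression is an assumption based only on the line format visible in this excerpt, and `executed_actions` is a name made up for this illustration.

```python
import re

# A few lines copied from the debug trace in this post.
TRACE = """\
Debug: in fn void L2Cache_Controller::c_issueDataExclusive(const Address&) in generated/MESI_CMP-bcast/L2Cache_Controller.C:232: executing
Debug: in fn void L2Cache_Controller::s_deallocateTBE(const Address&) in generated/MESI_CMP-bcast/L2Cache_Controller.C:116: executing
Debug: in fn void L2Cache_Controller::k_popDirDataQueue(const Address&) in generated/MESI_CMP-bcast/L2Cache_Controller.C:395: executing
Debug: in fn void MessageBuffer::pop() in buffers/MessageBuffer.C:339: pop from [Chip 0 0, L2Cache, DirDataToL2_in]
"""

# Assumed line format: "... void <Machine>_Controller::<action>(const Address&) in <file>:<line>: executing"
ACTION_RE = re.compile(
    r"in fn void (\w+)_Controller::(\w+)\(const Address&\) in \S+: executing"
)

def executed_actions(trace):
    """Return (machine, action) pairs in the order they appear in the trace."""
    return [m.groups() for m in ACTION_RE.finditer(trace)]

actions = executed_actions(TRACE)
machines = {machine for machine, _ in actions}
for machine, action in actions:
    print(machine, action)
print("machines with executed actions:", sorted(machines))
```

Running this over the full log rather than the excerpt would show whether any L1Cache action executes after c_issueDataExclusive.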

-Thank You
Nitin Bhardwaj