But I am still concerned about the protocol, as you mentioned. The protocol 
I am using now is MOESI_CMP_directory, and the 16 processors are placed on 
16 chips (1 processor per chip). What I actually want to observe is the 
network behavior of a 16-processor CMP, but in order to use the 
auto-generated TORUS2D interconnect, I placed each processor on its own 
chip and reduced the latencies so that the 16 processors communicate as if 
they were on a single chip. Is it reasonable to do this? Are there any 
restrictions on which protocols can be used if I do so?
  This is reasonable, but I'm not sure where the directory/memory 
controllers are placed in that setup. Also, the default TORUS2D code will 
create a memory controller for every processor.
An alternative approach is to use the FILE_SPECIFIED topology.
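
If you go the FILE_SPECIFIED route, you need the list of links yourself. 
Below is a minimal Python sketch, not part of the simulator, that enumerates 
the undirected links of a 4x4 2D torus. The function name torus2d_links and 
the row-major node numbering are assumptions made for illustration, and the 
output would still have to be translated by hand into whatever syntax the 
topology file expects.

```python
def torus2d_links(rows, cols):
    """Return the sorted undirected links of a rows x cols 2D torus.

    Nodes are numbered row-major (an assumption for this sketch):
    node = row * cols + col.
    """
    links = set()
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            # Wrap around at the edges to close the torus.
            east = r * cols + (c + 1) % cols
            south = ((r + 1) % rows) * cols + c
            for nbr in (east, south):
                if nbr != node:  # skip self-links in degenerate 1-wide cases
                    links.add((min(node, nbr), max(node, nbr)))
    return sorted(links)


links = torus2d_links(4, 4)
for a, b in links:
    print(f"node {a} <-> node {b}")
```

Generating the links this way leaves the placement of directory/memory 
controllers as a separate decision, rather than one controller per 
processor.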
--Mike
   
 