[Gems-users] panic and send mondo case


Date: Mon, 7 Jan 2008 15:52:28 -0600 (CST)
From: "Doe Hyun Yoon" <dyoon@xxxxxxxxxxxxxx>
Subject: [Gems-users] panic and send mondo case
Hi,
I am running simulations of the SPLASH2 benchmarks on GEMS. With 16 processors,
I get the following errors in the terminal output.

----------------
panic: failed to stop cpu11

panic[cpu0]/thread=2a100047cc0: send mondo timeout (target 0xb) [556597
NACK 0 BUSY]

000002a100047540 SUNW,UltraSPARC-III+:send_one_mondo+160 (b, b, 17eee, 2,
18add58, 1)
  %l0-3: 0000000000000000 000b000000000000 0000000000000001 0000000000087e35
  %l4-7: 0000000000000000 0000000001209400 000000042d1038b5 000000042d10395d
000002a1000475f0 unix:xt_one_unchecked+c8 (b, 100b354, 10001, 0, 0, 1)
  %l0-3: 000000000000000b 00000000018850f0 0000000000000000 000002a1000476a8
  %l4-7: 0000000000000000 0000000000000800 000000000000000b 000002a1000476f0
000002a1000476f0 TS:ts_tick+208 (300029da380, 2b, 0, 60007902820, 0, 1925800)
  %l0-3: 0000000000000000 000000000000002b 0000000000000000 0000000001925e78
  %l4-7: 0000000001925b28 0000000000000350 0000000000000035 0000000000000002
000002a1000477c0 genunix:clock_tick+1c (300029da380, 0, 18bb400, 1910c00,
19253e0, 1)
  %l0-3: 00000000012159cc 0000060000c0af10 000000000000001b 000000000191a400
  %l4-7: 0000000000000001 00000600079538b8 0000000000000064 0000000000000064
000002a100047870 genunix:clock+500 (1910800, 18bb400, 0, 60000286880,
2a100909cc0, 3000207e000)
  %l0-3: 0000000000000000 0000000000000000 00000300029da380 0000000000000000
  %l4-7: 00000300029da380 0000000000000000 0000000000020b5e 0000000000020b5d
000002a100047960 genunix:cyclic_softint+a4 (60000c48ec0, 60000dda1a8, 1,
60000c48e40, 60000d084c4, 60000dda180)
  %l0-3: 0000060000d08628 0000060000c48ea0 0000000000000005 0000000000000007
  %l4-7: 0000060000d084a8 00000000010abe88 0000000000000000 00000000000209e7
000002a100047a20 unix:cbe_level10+8 (0, 0, 180c000, 2a100047d98, 4000c0,
100c2ec)
  %l0-3: 0000000000000010 0000000000010000 0000000001911800 00000000018a5000
  %l4-7: 0000060000d084a8 0000000000000640 0000000000000000 0000000000000000

panic: entering debugger (continue to save dump)
Type 'go' to resume
ERROR: idle_a_cpu: grab_cpu 1 failed
ERROR: idle_other_cpus: cpu id 1 failed to stop: state 8
------------------------
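
Note that the target 0xb in the "send mondo timeout" message is hexadecimal for 11, so both panic messages point at the same processor, cpu11. When comparing many failed runs, the target can be pulled out of the console output with a small script. The following is only a minimal sketch in Python: the helper name `mondo_target_cpu` and the regular expression are assumptions made for illustration, not part of GEMS, Simics, or Solaris.

```python
import re

# Hypothetical helper (not part of GEMS, Simics, or Solaris): extracts the
# target CPU id from a "send mondo timeout" panic line in console output.
MONDO_RE = re.compile(r"send mondo timeout \(target (0x[0-9a-fA-F]+)\)")

def mondo_target_cpu(console_text):
    """Return the target CPU id as an int, or None if no such line is found."""
    match = MONDO_RE.search(console_text)
    if match is None:
        return None
    # The target is printed in hexadecimal, e.g. 0xb for cpu11.
    return int(match.group(1), 16)

# Panic line taken from the output above (the wrapped line is joined here).
line = ("panic[cpu0]/thread=2a100047cc0: send mondo timeout "
        "(target 0xb) [556597 NACK 0 BUSY]")
print(mondo_target_cpu(line))  # prints 11
```
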

There was no problem with 4 processors; this happens only with 16
processors (not with every program, but with many of the SPLASH2 programs).

I found a couple of similar reports in the gems-users archive (from 2005),
for instance:
https://lists.cs.wisc.edu/archive/gems-users/2005-August/msg00022.shtml

But I can't find any solution to this issue.

Does anyone know about this panic / send mondo issue?

FYI: I'm using Solaris 10 with 16 processors and GEMS 1.4 with Ruby and
Opal, and the MOSI_SMP_bcast protocol. The problem happens on Barnes, Ocean,
FMM, Water-Spatial, and Raytrace.
With the Simics simulator alone there was no problem (I haven't checked a
Ruby-only simulation).

Thanks,
Doe Hyun






