Hi,
I am running simulations of the SPLASH2 benchmarks on GEMS. With 16 processors,
I get the following errors in the terminal output:
----------------
panic: failed to stop cpu11
panic[cpu0]/thread=2a100047cc0: send mondo timeout (target 0xb) [556597 NACK 0 BUSY]
000002a100047540 SUNW,UltraSPARC-III+:send_one_mondo+160 (b, b, 17eee, 2, 18add58, 1)
%l0-3: 0000000000000000 000b000000000000 0000000000000001 0000000000087e35
%l4-7: 0000000000000000 0000000001209400 000000042d1038b5 000000042d10395d
000002a1000475f0 unix:xt_one_unchecked+c8 (b, 100b354, 10001, 0, 0, 1)
%l0-3: 000000000000000b 00000000018850f0 0000000000000000 000002a1000476a8
%l4-7: 0000000000000000 0000000000000800 000000000000000b 000002a1000476f0
000002a1000476f0 TS:ts_tick+208 (300029da380, 2b, 0, 60007902820, 0, 1925800)
%l0-3: 0000000000000000 000000000000002b 0000000000000000 0000000001925e78
%l4-7: 0000000001925b28 0000000000000350 0000000000000035 0000000000000002
000002a1000477c0 genunix:clock_tick+1c (300029da380, 0, 18bb400, 1910c00, 19253e0, 1)
%l0-3: 00000000012159cc 0000060000c0af10 000000000000001b 000000000191a400
%l4-7: 0000000000000001 00000600079538b8 0000000000000064 0000000000000064
000002a100047870 genunix:clock+500 (1910800, 18bb400, 0, 60000286880, 2a100909cc0, 3000207e000)
%l0-3: 0000000000000000 0000000000000000 00000300029da380 0000000000000000
%l4-7: 00000300029da380 0000000000000000 0000000000020b5e 0000000000020b5d
000002a100047960 genunix:cyclic_softint+a4 (60000c48ec0, 60000dda1a8, 1, 60000c48e40, 60000d084c4, 60000dda180)
%l0-3: 0000060000d08628 0000060000c48ea0 0000000000000005 0000000000000007
%l4-7: 0000060000d084a8 00000000010abe88 0000000000000000 00000000000209e7
000002a100047a20 unix:cbe_level10+8 (0, 0, 180c000, 2a100047d98, 4000c0, 100c2ec)
%l0-3: 0000000000000010 0000000000010000 0000000001911800 00000000018a5000
%l4-7: 0000060000d084a8 0000000000000640 0000000000000000 0000000000000000
panic: entering debugger (continue to save dump)
Type 'go' to resume
ERROR: idle_a_cpu: grab_cpu 1 failed
ERROR: idle_other_cpus: cpu id 1 failed to stop: state 8
------------------------
There was no problem with 4 processors; this happens only with 16
processors (not with every program, but with many of the SPLASH2 programs).
I found a couple of similar reports in the gems-users archive (from 2005),
for instance:
https://lists.cs.wisc.edu/archive/gems-users/2005-August/msg00022.shtml
However, I could not find any solution to this issue.
Does anyone know about this panic / send mondo timeout issue?
FYI: I'm using Solaris 10 with 16 processors and GEMS 1.4 with Ruby and
Opal, and the MOSI_SMP_bcast protocol. The problem happened with Barnes,
Ocean, FMM, Water-Spatial, and Raytrace.
With the Simics simulator alone there was no problem (I haven't checked a
Ruby-only simulation).
Thanks,
Doe Hyun