Dear all,
I have a cluster of ten nodes (node names N01 to N10) with local IP addresses 192.168.1.1 to 192.168.1.10, and every node runs Windows 7 64-bit. I built the program with VS2010 (C++), Intel Fortran, and Intel MPI. Currently I launch it with Intel MPI's mpiexec, using the following command:
mpiexec -wdir Z:\ -hosts 10 n01 12 n02 12 n03 12 n04 12 n05 12 n06 12 n07 12 n08 12 n09 12 n10 12 -mapall Z:\test
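To rule out typing mistakes in the long host list, the same command line can be generated instead of written by hand. This is only a minimal sketch in Python: the helper name build_mpiexec_command is hypothetical (it is not part of Intel MPI), and it simply reproduces the argument order of the command above, assuming 12 processes on each of the 10 nodes.

```python
def build_mpiexec_command(nodes, procs_per_node, workdir, program):
    """Build the mpiexec argument list in the same order as the command above.

    Hypothetical helper for illustration only; it does not launch anything.
    """
    args = ["mpiexec", "-wdir", workdir, "-hosts", str(len(nodes))]
    for node in nodes:
        # Each host name is followed by its process count.
        args += [node, str(procs_per_node)]
    args += ["-mapall", program]
    return args


# Node names n01..n10, as used on this cluster.
nodes = [f"n{i:02d}" for i in range(1, 11)]
command = build_mpiexec_command(nodes, 12, "Z:\\", "Z:\\test")
total_processes = len(nodes) * 12

print(" ".join(command))
print("total processes:", total_processes)
```

The printed line can be pasted into a command prompt as is; the total should come out to the 120 processes I want to run.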
The problem is that, with the same parameters passed to the program 'test', the run sometimes succeeds and sometimes fails with the following error messages:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(659)......................:
MPID_Init(195).............................: channel initialization failed
MPIDI_CH3_Init(106)........................:
MPID_nem_tcp_post_init(344)................:
MPID_nem_newtcp_module_connpoll(3099)......:
recv_id_or_tmpvc_info_success_handler(1328): read from socket failed - No error
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
or the following error message:
*********** Warning ************
Unable to map \\n01\Debug. (error 71)
*********** Warning ************
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N09' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N07' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N02' failed, error 2 - The system cannot find the file specified.
*********** Warning ************
Unable to map \\n01\Debug. (error 71)
I don't know what causes these problems.
Can I solve this problem if I launch the program 'test' with HTCondor under Windows 7 64-bit with Intel MPI?
Is there a simple way to quickly set up HTCondor so that the program 'test' runs on my cluster with 120 processes?
Thanks,
Zhanghong Tang