[Condor-users] MPI problem
- Date: Thu, 13 May 2010 12:37:46 +0200
- From: antoni artigues <tartigues@xxxxxxx>
- Subject: [Condor-users] MPI problem
Hello
I'm trying to run an MPI job, using MPICH2, on my Condor cluster.
My submit description file is:
-----------------------------------------------------
universe = parallel
executable = mp2script
arguments = sim problem.input
Requirements = OpSys == "LINUX" && Arch =="X86_64"
Rank = machine == "xxxxxxxx"
log = logfile
output = outfile.$(NODE)
error = errfile.$(NODE)
machine_count = 2
queue
-----------------------------------------------------
So I request 2 slots on the same machine, but the job does not run.
Here are the logs.
The output of node 0 is:
Too many retries, could not start all 2 nodes, only started 1, giving
up. Here are the hosts I could start
The output of node 1 is empty, and the mpd.out of node 1 is:
An mpd is already running with console at /tmp/mpd2.console_condor on
vm-ubuntu64.intranet.iac3.eu.
Start mpd with the -n option for a second mpd on same host.
In the log file I see:
015 (083.000.001) 05/13 12:19:53 Node 1 terminated.
(1) Normal termination (return value 255)
015 (083.000.000) 05/13 12:23:19 Node 0 terminated.
(1) Normal termination (return value 1)
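For anyone comparing similar logs, the per-node return values can be pulled out of the excerpt above with a short script. This is only a minimal sketch, assuming the two-line layout shown here (an event header ending in "Node N terminated." followed by a "(1) Normal termination (return value R)" line). The function name node_exit_codes is hypothetical and is not part of Condor.

```python
import re

# Event header, e.g. "015 (083.000.001) 05/13 12:19:53 Node 1 terminated."
HEADER = re.compile(
    r"^\d{3} \(\d+\.\d+\.\d+\) \S+ \S+ Node (\d+) terminated\.$"
)
# Detail line, e.g. "(1) Normal termination (return value 255)"
DETAIL = re.compile(r"\(1\) Normal termination \(return value (\d+)\)")


def node_exit_codes(log_text):
    """Return {node_number: return_value} for normally terminated nodes.

    Hypothetical helper: it assumes each detail line directly follows
    the header line of the node it belongs to, as in the excerpt above.
    """
    codes = {}
    current_node = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current_node = int(header.group(1))
            continue
        detail = DETAIL.search(line)
        if detail and current_node is not None:
            codes[current_node] = int(detail.group(1))
            current_node = None
    return codes


# The four log lines quoted in this message.
SAMPLE = """\
015 (083.000.001) 05/13 12:19:53 Node 1 terminated.
(1) Normal termination (return value 255)
015 (083.000.000) 05/13 12:23:19 Node 0 terminated.
(1) Normal termination (return value 1)
"""

if __name__ == "__main__":
    for node, code in sorted(node_exit_codes(SAMPLE).items()):
        print(f"node {node}: return value {code}")
```

On the excerpt above this reports return value 1 for node 0 and 255 for node 1, so both nodes exited with a non-zero status.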
Where is the problem? Why does Condor try to start a second mpd on the
same machine?
Thanks in advance
Regards
Antoni Artigues