[Condor-users] MPI problem
- Date: Thu, 13 May 2010 12:37:46 +0200
- From: antoni artigues <tartigues@xxxxxxx>
- Subject: [Condor-users] MPI problem
Hello
I'm trying to run an MPI job, built with MPICH2, on my Condor cluster.
My job description (submit) file is:
-----------------------------------------------------
universe = parallel
executable = mp2script
arguments = sim problem.input
Requirements  = OpSys == "LINUX" && Arch =="X86_64"
Rank = machine == "xxxxxxxx"
log = logfile
output = outfile.$(NODE)
error = errfile.$(NODE)
machine_count = 2
queue
-----------------------------------------------------
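In the parallel universe, Condor expands $(NODE) to each node's index (0 up to machine_count - 1), so this submit file gives every node its own stdout and stderr file. That is where "node 0" and "node 1" in the logs below come from. The small Python sketch that follows only mimics that substitution; the helper name node_files is hypothetical and not part of Condor.

```python
from typing import List, Tuple

def node_files(machine_count: int,
               output: str = "outfile.$(NODE)",
               error: str = "errfile.$(NODE)") -> List[Tuple[str, str]]:
    """Hypothetical helper: expand $(NODE) the way the parallel universe
    does, giving one (stdout, stderr) filename pair per node index."""
    return [(output.replace("$(NODE)", str(n)), error.replace("$(NODE)", str(n)))
            for n in range(machine_count)]

# With machine_count = 2 this yields outfile.0/errfile.0 and outfile.1/errfile.1.
print(node_files(2))
```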
So I request 2 slots on the same machine, but the job does not run.
Here are the logs.
The output of node 0 is:
Too many retries, could not start all 2 nodes, only started 1, giving
up.  Here are the hosts I could start 
The output of node 1 is empty, and the mpd.out of node 1 is:
An mpd is already running with console at /tmp/mpd2.console_condor on
vm-ubuntu64.intranet.iac3.eu. 
Start mpd with the -n option for a second mpd on same host.
In the user log file (logfile) I see:
015 (083.000.001) 05/13 12:19:53 Node 1 terminated.
	(1) Normal termination (return value 255)
015 (083.000.000) 05/13 12:23:19 Node 0 terminated.
	(1) Normal termination (return value 1)
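To see at a glance how each node exited, the "Node N terminated" events (event code 015) can be pulled out of the user log. The following is a minimal Python sketch. It assumes the event lines look exactly like the excerpt above, with the return value on the line that follows each event, and the function name node_exit_codes is hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed format, e.g. "015 (083.000.001) 05/13 12:19:53 Node 1 terminated."
NODE_TERM = re.compile(r"^015 \((\d+)\.(\d+)\.(\d+)\) (\S+ \S+) Node (\d+) terminated\.")
# Assumed format, e.g. "\t(1) Normal termination (return value 255)"
RETURN_VAL = re.compile(r"\(1\) Normal termination \(return value (-?\d+)\)")

def node_exit_codes(log_text: str) -> Dict[int, int]:
    """Hypothetical helper: map node number -> return value for each
    'Node N terminated' event found in a Condor user log."""
    codes: Dict[int, int] = {}
    pending: Optional[int] = None  # node whose return-value line is expected next
    for line in log_text.splitlines():
        m = NODE_TERM.match(line)
        if m:
            pending = int(m.group(5))
            continue
        r = RETURN_VAL.search(line)
        if r and pending is not None:
            codes[pending] = int(r.group(1))
            pending = None
    return codes

SAMPLE = """015 (083.000.001) 05/13 12:19:53 Node 1 terminated.
\t(1) Normal termination (return value 255)
015 (083.000.000) 05/13 12:23:19 Node 0 terminated.
\t(1) Normal termination (return value 1)"""

print(node_exit_codes(SAMPLE))  # prints {1: 255, 0: 1}
```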
Where is the problem? Why does Condor try to start a second mpd on the same
machine?
Thanks in advance
Regards
Antoni Artigues