[Condor-users] What happens when an MPI job hangs?
- Date: Thu, 16 Feb 2006 16:52:44 -0600
- From: Matt Baker <bakerspage@xxxxxxx>
- Subject: [Condor-users] What happens when an MPI job hangs?
We are looking into using the latest Condor to manage MPI jobs in a
Concurrent Computing class. We have trouble killing MPI jobs that are
launched with plain "mpirun": killing one process does not kill the
other processes that mpirun spawned.
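
To make the problem concrete, here is a minimal Python sketch of one common workaround on a single POSIX host: start the launcher as the leader of its own process group, then signal the whole group instead of one process. The helper names `run_in_own_group` and `kill_group` are made up for illustration. This is not how Condor, PBS, or SGE implement cleanup, and it does not reach ranks started on other machines (for example over rsh or ssh) or processes that leave the group.

```python
import os
import signal
import subprocess


def run_in_own_group(cmd):
    """Start cmd as the leader of a new session, and so a new process group.

    Hypothetical helper: children that cmd forks on this host inherit the
    group unless they deliberately leave it.
    """
    return subprocess.Popen(cmd, start_new_session=True)


def kill_group(proc, grace=2.0):
    """Send SIGTERM to the whole group, then SIGKILL to anything left."""
    pgid = proc.pid  # a session leader's process group id equals its pid
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return  # no process left in the group
    try:
        proc.wait(timeout=grace)  # give the leader time to exit cleanly
    except subprocess.TimeoutExpired:
        pass
    try:
        os.killpg(pgid, signal.SIGKILL)  # force out any survivors
    except ProcessLookupError:
        pass
    proc.wait()  # reap the leader so it does not linger as a zombie
```

A call might look like `proc = run_in_own_group(["mpirun", "-np", "4", "./a.out"])` followed later by `kill_group(proc)`, where `./a.out` is a placeholder program name. Whether mpirun's children stay in the same process group depends on the MPI implementation, so treat this as an assumption to check.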
We've read that both PBS and SGE have the ability to "sense" that the  
head node (process 0) has died and can clean up (kill and clear  
sockets) the other processes that block waiting for communication  
with process 0.
Does Condor have similar functionality? If I submit an unsafe MPI
job and it hangs, will condor_rm take care of the process cleanup?
Thanks,
Matt
University of Arkansas