Re: [condor-users] condor-mpich
- Date: Fri, 20 Feb 2004 18:02:48 -0600
- From: Erik Paulson <epaulson@xxxxxxxxxxx>
- Subject: Re: [condor-users] condor-mpich
----- Forwarded message from owner-condor-users@xxxxxxxxxxx -----
Date: Fri, 20 Feb 2004 23:21:28 +0100
From: Olivier Ricou
To: condor-users@xxxxxxxxxxx
Subject: Re: [condor-users] condor-mpich
Message-ID: <20040220232128.A1184@xxxxxxxxxxxxxxxx>
References: <4036706F.4080708@xxxxxxxxx>
In-Reply-To: <4036706F.4080708@xxxxxxxxx>; from joelh@xxxxxxxxx on Fri, Feb 20, 2004 at 03:39:11PM -0500
On 20/02/04 at 21:39, Joel Hernandez <joelh@xxxxxxxxx> wrote:
> I've been trying to set up several of our nodes to run as dedicated
> resources in order to run MPI jobs. I've tested the setup using the
> simple example in section 2.10 of the online Condor Manual for V6.6.
>
> However instead of containing the print out from stdin, the outfile
> contains the following error message:
>
> rm_3660: (-) net_recv failed for fd = 3
> rm_3660: p4_error: net_recv read, errno = : 104
If your MPI nodes are Linux machines, have a look at
/proc/sysvipc/sem to see whether the node still has free
semaphores. If it lists 128 or 256 entries (I don't remember
the limit), your MPI program has a problem and is not freeing
its semaphores properly.
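The check described above can be sketched in Python. This is only an illustration, not part of Condor or MPICH: the function names are made up for this example, and it assumes the usual Linux layout of /proc/sysvipc/sem (a header line naming the columns, then one line per semaphore set) and that the fourth field of /proc/sys/kernel/sem is the system-wide limit on semaphore sets (SEMMNI).

```python
from collections import Counter
from pathlib import Path


def parse_sem_table(text):
    """Turn the text of /proc/sysvipc/sem into a list of dicts.

    Column names are taken from the header line, so no column
    positions are hard-coded here.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def sets_per_uid(text):
    """Count semaphore sets per owning uid."""
    return Counter(int(row["uid"]) for row in parse_sem_table(text))


def read_semmni(path="/proc/sys/kernel/sem"):
    """Return the limit on semaphore sets (4th field), or None if unreadable."""
    try:
        return int(Path(path).read_text().split()[3])
    except (OSError, IndexError, ValueError):
        return None


if __name__ == "__main__":
    try:
        text = Path("/proc/sysvipc/sem").read_text()
    except OSError:
        text = ""  # not Linux, or /proc is not mounted
    used = len(parse_sem_table(text))
    print(f"semaphore sets in use: {used}, limit: {read_semmni()}")
    for uid, count in sets_per_uid(text).most_common():
        print(f"  uid {uid}: {count}")
```

If the number in use is at or near the limit and most sets belong to the account that runs the MPI jobs, leaked semaphores are a likely suspect.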
To clean up the semaphores by hand (and shared memory, I think),
use cleanipcs, which should be in the sbin directory of your MPI
installation. Beware: it cleans only the semaphores owned by the
user running cleanipcs.
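If cleanipcs is not at hand, the same per-user cleanup can be approximated as follows. This is a sketch and not the cleanipcs script itself: the function names are hypothetical, it assumes the Linux /proc/sysvipc/sem layout (header line with column names) and the `ipcrm -s <semid>` form from util-linux, and it only prints the commands rather than running them.

```python
import os


def owned_semids(text, uid):
    """Return the semaphore set ids in /proc/sysvipc/sem text owned by uid."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()  # column names come from the header line
    rows = (dict(zip(header, ln.split())) for ln in lines[1:])
    return [int(row["semid"]) for row in rows if int(row["uid"]) == uid]


def ipcrm_commands(text, uid):
    """Build one 'ipcrm -s <semid>' argument list per set owned by uid."""
    return [["ipcrm", "-s", str(semid)] for semid in owned_semids(text, uid)]


if __name__ == "__main__":
    try:
        with open("/proc/sysvipc/sem") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is not mounted
    # Dry run: print the commands for the current user, remove nothing.
    for cmd in ipcrm_commands(text, os.getuid()):
        print(" ".join(cmd))
```

Review the printed list before running any of it: it covers every semaphore set the user owns, including those of programs that are still running.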
Hope it helps,
Olivier.
----- End forwarded message -----