[Condor-users] Shadow exception mpi jobs
- Date: Fri, 27 Jul 2007 12:52:35 +0200
- From: Ana Silva <asilva@xxxxxxx>
- Subject: [Condor-users] Shadow exception mpi jobs
Hi all,
I have a problem with Condor and MPI. When I submit an MPI job, it runs,
but when it finishes it goes back into the queue in the "Idle" state.
Our cluster is distributed, and MPI is installed locally on each node.
The job log shows:
007 (043.000.000) 07/26 13:50:19 Shadow exception!
UserPolicy Error: No signal/exit codes in job ad!
125 - Run Bytes Sent By Job
32120 - Run Bytes Received By Job
But the results are correct!!
I read this in the manual:
Event Number: 007
Event Name: Shadow exception
Event Description: The condor_shadow, a program on the submit
computer that watches over the job and performs some services for the
job, failed for some catastrophic reason. The job will leave the machine
and go back into the queue.
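To list every such event in a job log together with its reason line, something like the following could be used. This is only a sketch: the function name shadow_exceptions is made up for illustration, and it assumes that each event starts with its three-digit event number at the beginning of a line and that the reason is on the next line, as in the excerpt above.

```shell
#!/bin/bash
# Hypothetical helper: print each "Shadow exception" event (event number 007)
# from a Condor user log, followed by the line after it (the reason text).
# Assumption: the event header begins with "007 (" at the start of a line.
shadow_exceptions() {
    local logfile="$1"
    awk '/^007 \(/ {
        print                              # the event header line
        if ((getline reason) > 0) print reason   # the line that follows it
    }' "$logfile"
}
```

With the submit file shown in this message it would be called as `shadow_exceptions s.log`.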
My ClassAd (submit description file) is very simple:
CLASSAD
universe = parallel
executable = script
arguments = nodes a.out
Scheduler="DedicatedScheduler@XXXXXX"
machine_count=5
log = s.log
output = s.out
error = s.err
transfer_input_files = hello.c,a.out,nodes
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
queue
#############################
SCRIPT
#!/bin/bash
export LD_LIBRARY_PATH=/opt/openmpi/lib
/opt/openmpi/bin/mpirun --mca ssh /usr/local/condor/libexec/condor_ssh \
    --hostfile $1 -np 5 $2
#############################
I don't know how I can resolve the shadow exception. Any ideas?
Thanks
Regards!
--
Ana Silva
Sistemas y Supercomputación
Centro Informático Científico de Andalucía (CICA)
Avda. Reina Mercedes s/n - 41012 - Sevilla (Spain)
Tfno.: +34 955 056 600 / +34 955 056 632 / FAX: +34 955 056 650
Consejería de Innovación, Ciencia y Empresa
Junta de Andalucía
---------------------------------------------------
Portal de E-Ciencia de Andalucía
http://eciencia.cica.es
http://supercomputacion.cica.es
---------------------------------------------------
This message is digitally signed. To verify the
signature in your mail client, you must have the
CICA CA root certificate installed in it.
You can download it from:
http://pki.cica.es/cacert/
---------------------------------------------------