Hi all,

I am testing MPI jobs and have the following problem. My submit file 'pi.cmd' is:

...
universe = mpi
machine_count = 4
output =$(Cluster).$(Process).$(NODE).out
error =$(Cluster).$(Process).$(NODE).err
log =$(Cluster)..log
executable = cpi
queue 2
....

When I submit this file, a cluster of 2 jobs is created:

% condor_submit pi.cmd
Submitting job(s)..
Logging submit event(s)..
2 job(s) submitted to cluster 44.

% condor_q

-- Submitter: gtx01.esrf.fr : <160.103.6.172:60873> : gtx01.esrf.fr
 ID      OWNER     SUBMITTED     RUN_TIME    ST  PRI  SIZE  CMD
 44.0    klotz     10/13 14:39   0+00:00:00  I   0    0.4   cpi
 44.1    klotz     10/13 14:39   0+00:00:00  I   0    0.4   cpi

2 jobs; 2 idle, 0 running, 0 held
....

When job '44.0' has finished, the second job '44.1' stays idle in the queue and never starts!

% condor_q

-- Submitter: gtx01.esrf.fr : <160.103.6.172:60873> : gtx01.esrf.fr
 ID      OWNER     SUBMITTED     RUN_TIME    ST  PRI  SIZE  CMD
 44.1    klotz     10/13 14:39   0+00:00:00  I   0    0.4   cpi
....

If I change the submit file to:

...
universe = mpi
machine_count = 4
output =$(Cluster).$(Process).$(NODE).out
error =$(Cluster).$(Process).$(NODE).err
log =$(Cluster)..log
executable = cpi
queue
executable = cpi
queue
....

I get two clusters of one job each, and both are started as expected, one after the other.

Is this behavior normal?

Regards,

--
WD Klotz - Europ. Synch. Rad. Facility (ESRF)
6 r Jules Horowitz, BP 220, 38043 Grenoble, FRANCE
work: +33(0)4.76.88.29.21 fax:...24.27 mobile: +33(0)6.87.38.59.27
mail: klotz@xxxxxxx chat: skype
Please avoid sending me Word(.doc) or PowerPoint(.ppt) attachments.