The only way (that I am aware of) to force jobs to start at once on
all machines is the MPI universe. For our programs, which have to be
started this way, we wrap them in MPI_Init and MPI_Finalize and run
them as if they were MPI applications in the MPI universe. This
start-barrier functionality is implemented in the dedicated schedd,
which runs for the MPI universe only, and I don't think there is any
way to hack it into doing the same for non-MPI jobs.
I remember there was talk about a generic parallel universe. Where
does it stand now, and are there any future plans? Developers, answer
our call! I think this 'start barrier' feature is a rather necessary
thing.
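For completeness, here is roughly what a submit description file for such a wrapped job looks like in the MPI universe. The executable name and machine count are placeholders; adjust them for your cluster.

```
# Hypothetical Condor submit description for the MPI-wrapped program
universe      = MPI
executable    = matlab_wrapper
machine_count = 4
output        = out.$(NODE)
error         = err.$(NODE)
log           = mpi.log
queue
```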
Mark
On Tue, 2003-11-04 at 14:57, Hahn Kim wrote:
My group has developed a Matlab library, called MatlabMPI, which
implements a subset of the MPI library. Currently, it launches Matlab
on multiple machines by sending commands via rsh. We are now trying
to integrate MatlabMPI with Condor.
Like MPI, all processes in a MatlabMPI program must start executing at
the same time. Otherwise, any process that needs to communicate with an
idle process will cause the MatlabMPI program to hang.
We have been trying to figure out if there is a way to force Condor to
synchronously start executing a set of Matlab processes distributed
across a cluster. Does anyone have any ideas? Is this functionality
built into Condor, or will it require a hack?
Condor Support Information:
http://www.cs.wisc.edu/condor/condor-support/
To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
unsubscribe condor-users <your_email_address>