Hi,
Is it possible to specify a GridFTP URL in a DAG node's submit file to
indicate the location of input files needed by the executables? We are
using the Globus universe with the Condor jobmanager in our environment.
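
For concreteness, something like the following is what I have in mind in
the node's submit description (the gsiftp host and paths are made up, and
I don't know whether transfer_input_files accepts URLs at all):

    universe                = globus
    globusscheduler         = gatekeeper.example.org/jobmanager-condor
    executable              = analyze
    # What we'd like: pull the input straight from the data repository
    # (placeholder hostname and path)
    transfer_input_files    = gsiftp://datarepo.example.org/data/input.dat
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output                  = analyze.out
    error                   = analyze.err
    log                     = analyze.log
    queue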
In our setup the submit machines are separate from the data repository
machines, and we'd like to submit jobs (via DAGs) so that programs act on
data copied directly from the data repository (i.e., the input data file
should be transferred straight from the data repository node to the node
where the job executes). As I understand it, Condor only transfers input
files from the submit node to the execute node, not from a "third party
node". Our data files are quite large, so I'd like to minimize the number
of copies and avoid having to copy each file to the submit machine before
submitting the job.
I suppose I could use a PRE script to run globus-url-copy and stage the
data before execution begins, as sketched below. Would that be the right
way to do it?
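
I'm picturing something like this in the DAG file:

    JOB A analyze.submit
    SCRIPT PRE A stage_in.sh

and a small stage_in.sh that runs on the submit host. If both ends speak
GridFTP, I believe globus-url-copy can move the file server-to-server so
the data never passes through the submit machine (hostnames and paths
below are just placeholders):

    #!/bin/sh
    # Placeholder hostnames/paths: copy straight from the repository's
    # GridFTP server to a GridFTP server at the execution site.
    globus-url-copy \
        gsiftp://datarepo.example.org/data/input.dat \
        gsiftp://storage.example.org/scratch/input.dat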