There seems to be a problem with the way Condor handles the submission of Java jobs. Consider the following example.
Sorry about the length of this, but to get the details across I need to be clear and complete.

I want to submit several jobs, each with its own set of parameter files for the Java program, so we have the following directory structure:
T03_Bob/                          - holds common jar files used by all experiments and common data files
    ecj.jar
    javacsv.jar
    yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat
    yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat
    Experiments/                  - holds the Condor job submission files
        b_01_submitAllJobs.bat    - file to submit the Condor job
        b_02_allJobs.sub          - file with Condor parameters
        EXP_000001/               - directory for experiment 1
            ssnAndIceCores.ALLparams
        EXP_000002/               - directory for experiment 2
            ssnAndIceCores.ALLparams
        EXP_000003/               - directory for experiment 3
            ssnAndIceCores.ALLparams

b_01_submitAllJobs.bat
------------------------------------
condor_submit b_02_allJobs.sub

b_02_allJobs.sub
--------------------------
universe = java
# requirements = (OpSys == "WINNT50") || (OpSys == "WINNT51")
# requirements = (Machine == "ir41165valdes") || (Machine == "ir41128valdes")

# This file contains all experiments to submit to Condor.
# Results will be placed into the individual experiment's directory.
# Use Condor's File Transfer Mechanism instead of, for example,
# using a shared file system. I've sent an email to see if another
# file transfer policy can be used besides copying back to the
# submitter's machine (and potentially overwriting the contents), and
# the response was no (as of 17 MAR 2004).

executable = ..\..\ecj.jar
arguments = ec.Evolve -file ssnAndIceCores.ALLparams
transfer_input_files = ..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat,..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat,ssnAndIceCores.ALLparams
jar_files = ..\..\ecj.jar,..\..\javacsv.jar
initialdir = EXP_000001/
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
log = 00_condorNode.log
error = 00_condorNode.err
output = 00_condorNode.out
Queue

executable = ..\..\ecj.jar
arguments = ec.Evolve -file ssnAndIceCores.ALLparams
transfer_input_files = ..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat,..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat,ssnAndIceCores.ALLparams
jar_files = ..\..\ecj.jar,..\..\javacsv.jar
initialdir = EXP_000002/
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
log = 00_condorNode.log
error = 00_condorNode.err
output = 00_condorNode.out
Queue

executable = ..\..\ecj.jar
arguments = ec.Evolve -file ssnAndIceCores.ALLparams
transfer_input_files = ..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat,..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat,ssnAndIceCores.ALLparams
jar_files = ..\..\ecj.jar,..\..\javacsv.jar
initialdir = EXP_000003/
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
log = 00_condorNode.log
error = 00_condorNode.err
output = 00_condorNode.out
Queue

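The three job blocks above differ only in initialdir, so the submit file could be generated instead of maintained by hand. The following is a minimal sketch, not a tested fix: make_submit is a hypothetical helper, BASE is taken from the prompt in the error output further down, and using absolute paths for executable and jar_files is an assumption (an absolute path should resolve identically no matter which directory condor_submit treats as the base). The data-file entries stay relative because, as noted below, transfer_input_files already resolves correctly against initialdir.

```python
import ntpath

# Assumed location of the common files (taken from the console prompt below).
BASE = r"D:\ecj\Condor\T03_Bob"
DATA_FILES = (
    "yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat",
    "yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat",
)

# Settings shared by every job; dicts keep insertion order, so the
# generated lines appear in this order.
COMMON = {
    # Assumption: absolute paths sidestep the submit-dir vs initialdir ambiguity.
    "executable": ntpath.join(BASE, "ecj.jar"),
    "arguments": "ec.Evolve -file ssnAndIceCores.ALLparams",
    # Relative to initialdir, exactly as in the hand-written file.
    "transfer_input_files": ",".join(
        [ntpath.join("..", "..", f) for f in DATA_FILES] + ["ssnAndIceCores.ALLparams"]
    ),
    "jar_files": ",".join(ntpath.join(BASE, j) for j in ("ecj.jar", "javacsv.jar")),
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
    "log": "00_condorNode.log",
    "error": "00_condorNode.err",
    "output": "00_condorNode.out",
}


def make_submit(experiments):
    """Hypothetical helper: build one submit file with a Queue per experiment dir."""
    lines = ["universe = java", ""]
    for exp in experiments:
        for key, value in COMMON.items():
            lines.append(f"{key} = {value}")
        lines.append(f"initialdir = {exp}/")  # only this line varies per job
        lines.append("Queue")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # EXP_000001 .. EXP_000003, zero-padded to six digits as in the directory names.
    print(make_submit([f"EXP_{i:06d}" for i in range(1, 4)]))
```

Redirecting the output to b_02_allJobs.sub would reproduce the structure of the file above, with only the executable and jar_files paths changed.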
Note that the executable class (the one with main) is actually ec.Evolve, but this class is in the ecj.jar file. I tried using ec.Evolve as the executable in this and in other Java submissions (which work), but found that Condor can't deal with the class when it's in a jar file: it looks for the class file and can't find it to transfer, even though the manual's example with a jar file containing the classes suggests that you should put the executable class in the executable statement. So I found it was necessary to specify the jar file as the executable. Given this starting point, I try to submit the job:

D:\ecj\Condor\T03_Bob\Experiments>condor_submit b_02_allJobs.sub
Submitting job(s).
ERROR: failed to transfer executable file ..\..\ecj.jar
D:\ecj\Condor\T03_Bob\Experiments>

Since we set the initial directory to EXP_000001, etc., we expected Condor to find the jar files in ../../ relative to the initial directory, as it does for the files to be transferred. But it cannot find the jar file.

We changed the reference to the jar file in the executable statement so it reads:

executable = ..\ecj.jar

Now it gets past the first error, which suggests it found the file relative to the submit directory. But it gives the following message:

D:\ecj\Condor\T03_Bob\Experiments>condor_submit b_02_allJobs.sub
Submitting job(s)
ERROR: Can't open "D:\ecj\Condor\T03_Bob\Experiments\EXP_000001/..\ecj.jar" with flags 00 (No such file or directory)

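The two errors can be reproduced on paper by resolving each executable value against both candidate base directories. The sketch below uses Python's ntpath module to mimic Windows path joining; it does not call Condor, and the hypothetical check helper simply assumes that condor_submit resolves executable once against the submit directory and once against initialdir, as the two messages suggest.

```python
import ntpath

# Directories and jar location taken from the console output above.
SUBMIT_DIR = r"D:\ecj\Condor\T03_Bob\Experiments"
INITIAL_DIR = ntpath.join(SUBMIT_DIR, "EXP_000001/")  # initialdir = EXP_000001/
JAR = r"D:\ecj\Condor\T03_Bob\ecj.jar"               # where ecj.jar really is


def resolve(base, path):
    """Join path onto base and collapse '..' components, Windows-style."""
    return ntpath.normpath(ntpath.join(base, path))


def check(path):
    """Hypothetical helper: (found via submit dir, found via initialdir)."""
    return (resolve(SUBMIT_DIR, path) == JAR, resolve(INITIAL_DIR, path) == JAR)


for value in (r"..\..\ecj.jar", r"..\ecj.jar", JAR):
    print(value, check(value))
```

Running it gives (False, True) for ..\..\ecj.jar, (True, False) for ..\ecj.jar, and (True, True) only for the absolute path: no relative value satisfies both lookups.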
So initially it expects the executable jar file to be in a directory relative to the submit directory, and later it expects it to be in a directory relative to the initial directory. But Condor can't have it both ways; I would consider this a bug. If I put the jar files in both the
T