[Condor-users] Passing condor_dagman args with condor_submit_dag?
- Date: Tue, 28 Feb 2006 18:55:55 -0500
- From: Armen Babikyan <armenb@xxxxxxxxxx>
- Subject: [Condor-users] Passing condor_dagman args with condor_submit_dag?
Hi,
I am attempting to use Condor for a large distributed batch processing
project. I'm using condor_dagman as a meta-scheduler to limit the
number of jobs that run at the same time. I've organized each job to
be an iteration of a loop, and I have two levels of nested loops. To
put some numbers on it: my outer loop iterates 100 times, and my
inner loop iterates 1000 times (each of these loops contains a DAG). I
am implementing the looping by unrolling each logical loop into a
dynamically generated DAG file.
While my solution might prevent Condor's scheduler from getting
overloaded with jobs, I am faced with another problem: organizing the
files on disk so that one directory doesn't contain something on the
order of 100 * 1000 = 100,000 submit files (and a multiple of that for
output and log files). I'm starting with the obvious: make a directory
for each iteration of the outer loop and a subdirectory for each
iteration of the inner loop. However, I am running into a problem.
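To make the layout concrete, here is a minimal Python sketch of a generator that creates one subdirectory per iteration and writes an outer DAG file pointing at each inner submit file. The function name `write_outer_dag`, the file name `outer.dag`, and the three-digit directory naming are assumptions made for illustration; only the `JOB <name> <submit file>` line format is taken from the example in this message.

```python
from pathlib import Path
import tempfile


def write_outer_dag(root, iterations):
    """Create one subdirectory per iteration and an outer DAG file
    with one JOB line per inner *.dag.condor.sub file.

    The names used here (outer.dag, maindag_NNN) are hypothetical.
    """
    root = Path(root)
    lines = []
    for i in iterations:
        name = f"{i:03d}"
        # One directory per iteration keeps each directory small.
        (root / name).mkdir(parents=True, exist_ok=True)
        lines.append(f"JOB MAINDAG_{name} {name}/maindag_{name}.dag.condor.sub")
    outer = root / "outer.dag"
    outer.write_text("\n".join(lines) + "\n")
    return outer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dag = write_outer_dag(tmp, [111, 222])
        print(dag.read_text(), end="")
```

The sketch only lays out directories and the outer DAG file; the inner submit files would still have to be produced separately in each subdirectory.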
condor_submit_dag -no_submit accepts my *.dag file and produces a
*.dag.condor.sub file, but I am having trouble properly referencing this
*.dag.condor.sub file from a *.dag file in the parent directory. I
think this is because condor_submit_dag does not let me configure some
of condor_dagman's arguments in the submit file it generates. For example:
outer *.dag file:
JOB MAINDAG_111 111/maindag_111.dag.condor.sub
JOB MAINDAG_222 222/maindag_222.dag.condor.sub
111/maindag_111.dag.condor.sub:
# Filename: maindag_111.dag.condor.sub
# Generated by condor_submit_dag maindag_111.dag
universe = scheduler
executable = /opt/condor/bin/condor_dagman
getenv = True
output = maindag_111.dag.lib.out
error = maindag_111.dag.lib.out
log = maindag_111.dag.dagman.log
remove_kill_sig = SIGUSR1
on_exit_remove = (ExitBySignal == false || ExitSignal =!= 9)
arguments = -f -l . -Debug 3 -Lockfile maindag_111.dag.lock -Condorlog /tmp/exp6/111/process_a_111.log -Dag maindag_111.dag -Rescue maindag_111.dag.rescue -MaxIdle 5 -MaxJobs 1 -UseDagDir
environment = _CONDOR_DAGMAN_LOG=maindag_111.dag.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0
queue
When the outer condor_dagman reads and tries to execute the inner loop's
condor_dagman, it fails, because it looks in the outer directory for
maindag_111.dag rather than in the directory 111 (where the above submit
file, and anything related to maindag_111*, is).
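The failure described above is consistent with the relative file names in the generated submit file being resolved against the working directory of the outer DAG. A small Python sketch of that path arithmetic follows; the helper `resolve` is hypothetical, and `/tmp/exp6` is assumed to be the outer directory based on the -Condorlog path in the submit file.

```python
from pathlib import PurePosixPath


def resolve(working_dir, filename):
    """Join a relative file name onto a working directory, which is how
    a process resolves a relative path (hypothetical helper)."""
    return str(PurePosixPath(working_dir) / filename)


outer_dir = "/tmp/exp6"      # assumed: where the outer DAG is run
inner_dir = "/tmp/exp6/111"  # where maindag_111.dag actually lives

# Where the inner DAG file is looked for when run from the outer directory:
print(resolve(outer_dir, "maindag_111.dag"))
# Where it would be found if the working directory were 111:
print(resolve(inner_dir, "maindag_111.dag"))
```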
Is there a way I can tell condor_submit_dag to pass particular arguments
(e.g. -Dag, -Rescue, output files) to the submit file it generates?
It would be cool if there were a way to get condor_dagman to chdir() into
a directory before executing. I looked at -UseDagDir, but this will put
output/log files in the parent directory, which is something I am trying
to avoid.
I guess I could write my own condor_submit_dag too, but I'd rather not
go to that extreme. :-)
Any insight would be great. Thanks!
- Armen
--
Armen Babikyan
MIT Lincoln Laboratory
armenb@xxxxxxxxxx . 781-981-1796