[Condor-users] problems using transfer_output_remaps
- Date: Thu, 19 Jan 2006 15:20:11 -0800
- From: Adam Lathers <alathers@xxxxxxxxxxxxxx>
- Subject: [Condor-users] problems using transfer_output_remaps
Hi all,
I'm having some issues using the transfer_output_remaps option in a
submit file. Specifically, I'm submitting a DAG as a proof of
concept to work out the bugs before implementing a similar solution
for our big data processing codes. The layout of our architecture
looks something like this. Our pool manager host (schedd, collector,
negotiator) exists "outside" our trusted realm, so it has no access
to our shared filesystem. All the worker nodes exist inside the
trusted realm, and all share a filesystem. (Yes, I know there are
some security paradigm issues there, but I can't solve those
presently.) What I do need to deal with is that the data we will be
working with is "big": total in and out data is on the order of
100GB at present, and it's not segmented into "small" pieces, so
each worker node, if the input data had to be shipped to it, would
have to grab a 20-50GB dataset before processing started.
My goal in the short term is basically this. I'd like to rely on
the shared file system, and just "mimic" what I need to on the submit
node. Thus far, this works, but to make it happen, I need to
duplicate a directory structure on the submit node to look just like
the worker nodes. What I'd "prefer" to do is leverage the
transfer_output_remaps option, so that when logs, output, and such
get shipped back to the submit machine, they all go into a single
large log directory, with some sort of intelligent naming mechanism.
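To make that concrete, here is a rough sketch of the kind of remap I'm
picturing, not something I've verified. It assumes Condor's file
transfer is switched on for the job, that the remap keys are the file
names as the job writes them in its working directory, and that the job
produces a file called results.out (a made-up name, for illustration
only). The logs/ destination and the $(Cluster).$(Process) naming are
likewise just one possible scheme.

```
# Sketch only; results.out is a hypothetical output file name.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_output_files   = results.out
# Format: "name in the job's working directory = destination on the submit machine"
transfer_output_remaps  = "results.out = /home/alathers/condor_matlab/logs/A.$(Cluster).$(Process).results.out"
```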
An example submit file that I've tried looks something like this.
(Note: for transfer_output_remaps, I've also tried naming just
A.err and so on. Maybe I just missed the proper permutation?)
Universe = vanilla
Executable = /home/alathers/condor_matlab/condor_test/matlab.sh
InitialDir = /home/alathers/condor_matlab/condor_test
Error = /home/alathers/condor_matlab/condor_test_submitdir/A.err
Log = /home/alathers/condor_matlab/condor_test_submitdir/A.log
transfer_output_remaps = "/home/alathers/condor_matlab/condor_test_submitdir/A.err = /home/alathers/condor_matlab/logs/A.err"
GetEnv = true
Arguments = A
Requirements = FileSystemDomain == "ncmir.ucsd.edu"
Notification = Error
Notify_user = alathers@xxxxxxxxxxxxxx
Queue
In the end, when the job finishes, the .log and .err files are sent
back to the submit node and put in
/home/alathers/condor_matlab/condor_test_submitdir/, not in the
logs directory named in the remap.
I'm sure I'm forgetting some vital piece of info, so please feel
free to let me know. Any thoughts or insight would be REALLY
appreciated. As noted, I know there are a LOT of problems with the
present approach, but for various reasons my role is to solve this
step first, before redesigning the process. Thanks, everyone.
_______________________________________________________
Adam Lathers
NCMIR: National Center for Microscopy and Imaging Research
Distributed Systems Engineer
phone: (858) 822-0735
fax: (858) 822-0828
web: http://ncmir.ucsd.edu