
Re: [Condor-users] file transfer problems with vanilla job



On Fri, 12 Nov 2004 10:17:06 +0000  "Dr Ian C. Smith" wrote:

> > just have your job periodically checkpoint itself.
>                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Is there any point in doing this ? If files are only staged back if the
> job runs to completion then only the results need to be saved just before
> completion (if there's sufficient memory).

sorry if my message wasn't clear enough.  i was trying to get the
point across that the files are only copied back into the directory
you *submitted* from once the job runs to completion.  if
when_to_transfer_output is set to "ON_EXIT_OR_EVICT", then any
intermediate files written by the job are still transferred back to
the submit machine when the job is evicted; they're just stored in a
temporary spool location (instead of your initial submit directory).
anything in this temporary spool directory is sent back out with your
job the next time it starts running.

> Saved state information cannot be transferred back if the job is
> killed

yes it can.  if you use ON_EXIT_OR_EVICT, any files created by your
job are transferred back (to the *spool* directory on the submit
machine, not the directory you submitted from), even if the job is
killed.  the only exception is if the job is "hard-killed" (for
example, condor_vacate -fast).  in that case, it really is killed,
nothing is transferred (for that run), and the job will restart with
whatever spooled files are still sitting on the submit machine.
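
for illustration, here is a minimal python sketch of a job that saves
its state when it is asked to leave a machine, so that there is
something fresh to transfer back on eviction.  it assumes the soft
eviction reaches the job as SIGTERM (check how your pool is
configured), and the names save_state, make_evict_handler and
checkpoint.json are made up for this example.  a hard kill gives the
job no chance to run any of this.

```python
import json
import os
import signal
import sys


def save_state(state, path):
    """Write state to path via a temp file so a partial write never replaces a good checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename; no stray .tmp file is left behind


def make_evict_handler(state, path):
    """Return a signal handler that saves state and then exits cleanly."""
    def handler(signum, frame):
        save_state(state, path)
        sys.exit(0)
    return handler


def install(state, path="checkpoint.json"):
    # Assumption: the eviction signal is SIGTERM. Adjust if your setup differs.
    signal.signal(signal.SIGTERM, make_evict_handler(state, path))
```

the job would call install(state) once at startup and keep updating
the same state dictionary as it works.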

> and is of no use once the job has run to completion.

true.  jobs that do their own checkpointing might want to remove their
checkpoint file after they write out their final results to their real
output file(s) as the last step before they complete successfully.
that way, condor won't needlessly transfer that final checkpoint file
back for you.  i'm honestly not sure if you'll still end up with the
last spooled copy (if any) that's already sitting on the submit
machine or not.  if you do, there's no real additional cost for
getting a copy of it, since it's already on the submit machine, and
just needs to be copied out of spool and into your submit directory.
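
the pattern described above (checkpoint periodically, resume from the
checkpoint if one was sent back with the job, and delete it once the
real output is written) might look roughly like this in python.  this
is only a sketch: the file names, the run function and the toy
"work" (summing integers) are all invented for the example.

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical checkpoint file name
OUTPUT = "results.txt"          # hypothetical final output file name


def load_checkpoint(path):
    """Resume from an existing checkpoint, or start from scratch."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(state, path):
    """Write the checkpoint via a temp file and an atomic rename."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(n_steps, every=100, checkpoint=CHECKPOINT, output=OUTPUT):
    state = load_checkpoint(checkpoint)
    while state["step"] < n_steps:
        state["total"] += state["step"]  # stand-in for real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, checkpoint)
    # Write the real results first, then drop the checkpoint so it is
    # not transferred back along with the final output.
    with open(output, "w") as f:
        f.write(f"{state['total']}\n")
    if os.path.exists(checkpoint):
        os.remove(checkpoint)
    return state["total"]
```

because the checkpoint is removed only after the output file has been
written, an eviction at any earlier point still leaves a usable
checkpoint for the next run.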

i hope this (finally) clarifies this feature.  i'll make sure all this
wisdom ends up in the manual in the near future.

-derek