Re: [HTCondor-users] transfer_in/output_files only if they exist
- Date: Mon, 11 Feb 2019 17:50:01 +0000
- From: Duncan Brown <dabrown@xxxxxxx>
Hi Todd,
Ah, very nice, that's what I need!
Cheers,
Duncan.
> On Feb 8, 2019, at 12:36 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
>
> On 2/7/2019 8:00 AM, Duncan Brown wrote:
>> Hi Todd,
>>
>> Is there a way to tell condor that it's OK if a specific file listed in transfer_input_files does not exist (and the same question with an output file)? The use case is using condor file i/o to manage a checkpoint file.
>
> Hi Duncan,
>
> TJ already answered the question above, but I am not certain you need to
> do the above to handle your checkpoint file use case. :)
>
> When your submit file has
>
> when_to_transfer_output = ON_EXIT_OR_EVICT
>
> what happens is when your job is evicted, any output files are
> transferred back to the SPOOL directory for that job on the submit
> machine. When your job is rescheduled to run again, HTCondor first
> sends all the specified transfer_input_files to the execute node, **and
> then subsequently also sends all the files stored in SPOOL**. The
> point being your checkpoint file need not be listed explicitly in
> transfer_input_files at all... it will get transferred on restart
> assuming it was considered output from a previous run.
>
> So imagine you have a job that has input data ('my_input_data'), output
> data ('my_output_data'), and it periodically writes a checkpoint file
> ('ckpt_file'). Your submit file could look like:
>
> executable = foo.exe
> when_to_transfer_output = ON_EXIT_OR_EVICT
> transfer_input_files = my_input_data
> transfer_output_files = my_output_data ckpt_file
>
> With the above, the only issue may be your job going on hold if your job
> is evicted before it ever writes out its initial ckpt_file, because it
> will not exist and yet is explicitly declared in transfer_output_files.
> To prevent this case, you could make a zero-length ckpt_file on
> submission, and add it to transfer_input_files. This way the job will
> never go on hold because all files listed in "transfer_output_files"
> will always exist. Because HTCondor first sends the input files and
> then sends the spool files, on restart after a ckpt HTCondor will first
> send the zero-length ckpt file from transfer_input_files, but then
> immediately overwrite it when the ckpt_file contents from the SPOOL
> directory (i.e. the ckpt_file contents from the last run) are sent.
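>
> A minimal sketch of that workaround, adapting the submit file above.
> The file names are the same placeholders as before; the
> should_transfer_files and queue lines are assumptions added here only
> to make the sketch complete, so adjust them to your setup. Before
> submitting, create the empty placeholder in the submit directory
> (e.g. with "touch ckpt_file").
>
> # ckpt_file is listed as both input and output, so it always exists
> # in the job sandbox, even if the job is evicted before its first
> # checkpoint is written.
> executable = foo.exe
> should_transfer_files = YES
> when_to_transfer_output = ON_EXIT_OR_EVICT
> transfer_input_files = my_input_data, ckpt_file
> transfer_output_files = my_output_data, ckpt_file
> queue
>
> One consequence of this setup: on its first run the job will see a
> zero-length ckpt_file, so it presumably needs to treat an empty file
> as "no checkpoint yet" and start from the beginning.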
>
> Hope the above helps,
> Todd
>
>> The first time the job is run, the checkpoint file does not exist so the job gets stuck in hold state. I want to be able to tell condor that it's OK that this file is not there.
>>
>> Cheers,
>> Duncan.
>>
>
>
>
> --
> Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
> Center for High Throughput Computing Department of Computer Sciences
> HTCondor Technical Lead 1210 W. Dayton St. Rm #4257
> Phone: (608) 263-7132 Madison, WI 53706-1685
--
Duncan Brown Room 263-1, Physics Department
Charles Brightman Professor of Physics Syracuse University, NY 13244
http://dabrown.expressions.syr.edu Phone: 315 443 5993