Re: [HTCondor-users] what if transfer of input files fails ?



On 11/17/2025 11:22 AM, Stefano Belforte via HTCondor-users wrote:

Hi Experts,

I have a situation where many jobs are submitted using a common
file listed in `transfer_input_files`.
Because of how my own workflow is set up, that file can change even while jobs
are being submitted (it is a tarball, and some things may get added for
newer jobs from new subdags). I have never noticed a problem, but am
curious to know if that was luck, lack of attention, or a feature!
What happens if the file is being written while
condor tries to transfer it? Will it simply try again, possibly
evicting and re-queuing the starting job?

Thanks

Stefano


Hi Stefano,

It will simply transfer the file as it currently stands; this could mean sending an incomplete input file if you are in the middle of updating it. If the file does not exist at all, yet is explicitly named in transfer_input_files, the job will go on hold.

On Linux: I suggest you either (a) give each version of the input file a unique name, or (b) update the input file atomically. For (b), write the new version of your file into a .tmp file and then rename it over the old one. POSIX requires rename to be atomic when the source and destination are on the same filesystem, so a reader sees either the complete old file or the complete new one, never a mix. E.g., if you have transfer_input_files = /someFolder/inputData.tar, then update by doing:

    tar cf /someFolder/inputData.tar.tmp ...
    mv -f /someFolder/inputData.tar.tmp /someFolder/inputData.tar
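
Both suggestions can be sketched as small shell functions. This is only a sketch under assumptions: publish_tarball and publish_versioned are hypothetical helper names (not HTCondor commands), the inputData naming is illustrative, and it assumes a POSIX shell with mktemp, tar, and GNU coreutils sha256sum (on macOS, shasum -a 256 is the usual substitute). In both cases the tarball is built under a temporary name inside the destination directory, so the final mv stays on one filesystem and is a rename rather than a copy, and a failed build never replaces the published file.

```shell
#!/bin/sh
# Sketch only: publish_tarball and publish_versioned are hypothetical helpers,
# not HTCondor commands. Assumes POSIX sh, mktemp, tar, and GNU sha256sum.

# Option (b): reuse one name, but replace it atomically.
# Usage: publish_tarball DEST FILE...
publish_tarball() {
    pt_dest=$1
    shift
    # Temp file in DEST's directory => same filesystem => mv is an atomic rename.
    pt_tmp=$(mktemp "$(dirname "$pt_dest")/.$(basename "$pt_dest").XXXXXX") || return 1
    if tar cf "$pt_tmp" "$@"; then
        chmod 0644 "$pt_tmp"          # mktemp creates files with mode 0600
        mv -f "$pt_tmp" "$pt_dest"    # readers see the old or new file, never a partial one
    else
        rm -f "$pt_tmp"               # failed build: discard it, keep the old DEST
        return 1
    fi
}

# Option (a): every version gets its own name (derived from a hash of the
# tarball bytes) and is never rewritten afterwards. Prints the new path.
# Usage: publish_versioned DIR FILE...
publish_versioned() {
    pv_dir=$1
    shift
    pv_tmp=$(mktemp "$pv_dir/.inputData.XXXXXX") || return 1
    if ! tar cf "$pv_tmp" "$@"; then
        rm -f "$pv_tmp"
        return 1
    fi
    chmod 0644 "$pv_tmp"
    pv_hash=$(sha256sum "$pv_tmp" | cut -c1-16)
    # Still rename into place, so the versioned name only ever holds a complete file.
    mv -f "$pv_tmp" "$pv_dir/inputData-$pv_hash.tar" || return 1
    printf '%s\n' "$pv_dir/inputData-$pv_hash.tar"
}

# Demo in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1
echo "payload v1" > data.txt
publish_tarball "$work/inputData.tar" data.txt
# A failing build (missing input) must not clobber the published tarball.
publish_tarball "$work/inputData.tar" missing.txt 2>/dev/null || echo "build failed; previous inputData.tar kept"
v1=$(publish_versioned "$work" data.txt)
echo "new jobs would list $v1 in transfer_input_files"
```

Because mv only swaps the directory entry, a transfer that has already opened the old inputData.tar keeps reading the old contents on a local POSIX filesystem (network filesystems such as NFS can add caching caveats). With the versioned scheme, old tarballs can be deleted once no queued job still lists them, since a job whose named input file is missing goes on hold.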

On Windows: a file that is open for writing is (by default) locked and unavailable for reading, in which case I believe the AP (access point) will retry the transfer after a few moments.

Hope the above helps,
Todd