
Re: [HTCondor-users] Dataflow job skips when executable is updated



Thanks Todd,

On 22/06/2023 19:27, Todd L Miller via HTCondor-users wrote:
>> I just discovered the massively useful "skip_if_dataflow" submit option (how did I miss this before?).
>
> The logic that's actually implemented is that a job marked as
> dataflow is skipped if:
>
> * the oldest output file is newer than the newest input file,
> * or the executable is newer than the newest input file,
> * or the standard input is newer than the newest input file.
>
> This matches the documentation I found.  Where did you find something
> saying what you did above?

You're right, I checked back in the docs[1]. Clearly my confirmation 
bias goggles were filling in what I *assumed* it would say. It indeed 
looks like the flaw was in the initial design.
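
To make the rule quoted above concrete, here is a minimal Python sketch of 
how I read those three conditions. It is not HTCondor's actual code; the 
function and argument names (would_skip, inputs, outputs, executable, 
stdin) are made up for illustration.

```python
import os


def _mtime(path):
    # Modification time in seconds since the epoch.
    return os.stat(path).st_mtime


def would_skip(inputs, outputs, executable, stdin=None):
    """Return True if any of the three quoted conditions holds.

    Hypothetical helper: a reading of the rule as quoted in this thread,
    not the scheduler's real implementation.
    """
    if not inputs:
        return False
    newest_input = max(_mtime(p) for p in inputs)

    # 1. The oldest output file is newer than the newest input file.
    if outputs and all(os.path.exists(p) for p in outputs):
        if min(_mtime(p) for p in outputs) > newest_input:
            return True

    # 2. The executable is newer than the newest input file.
    if _mtime(executable) > newest_input:
        return True

    # 3. The standard input is newer than the newest input file.
    if stdin is not None and _mtime(stdin) > newest_input:
        return True

    return False
```

With this reading, touching the executable after the inputs were last 
written is enough for would_skip() to return True, even when the outputs 
are stale. That is the behaviour in the subject line.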

A point to keep in mind for implementation: input files could be 
symlinks, in which case I'd expect the logic to use the timestamp of 
their targets (like touch and make do), not that of the symlink.
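
As a small illustration of the difference, in Python os.stat() follows 
symlinks while os.lstat() reports on the link itself. The helper names 
below are hypothetical, and I'm only assuming that a check of this kind 
would be written along these lines.

```python
import os


def target_mtime(path):
    # os.stat() follows symlinks, so this is the timestamp of the target.
    return os.stat(path).st_mtime


def link_mtime(path):
    # os.lstat() does not follow symlinks, so for a symlink this is the
    # timestamp of the link itself.
    return os.lstat(path).st_mtime
```

For a regular file the two functions return the same value; they only 
differ when path is a symlink.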

What if an input (or output) is a directory? Recurse over its contents 
(like Docker does, but unlike make) or only look at its timestamp?
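
The two options can give different answers, because changing a file in 
place does not update the timestamp of the directory that contains it. 
Here is a rough Python sketch of both behaviours; newest_mtime and its 
recurse flag are names I made up, not anything that exists in Condor.

```python
import os


def newest_mtime(path, recurse=True):
    """Newest modification time of path, optionally including its contents.

    Hypothetical helper: with recurse=False only the timestamp of the
    path itself is used; with recurse=True a directory is walked and the
    newest timestamp found anywhere beneath it wins.
    """
    newest = os.stat(path).st_mtime
    if recurse and os.path.isdir(path):
        for root, dirs, files in os.walk(path):
            for name in dirs + files:
                try:
                    entry_mtime = os.stat(os.path.join(root, name)).st_mtime
                except OSError:
                    # E.g. a dangling symlink: ignore it in this sketch.
                    continue
                newest = max(newest, entry_mtime)
    return newest
```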

>> for dataflow jobs it would be desirable if Condor explicitly touched the items in transfer_output_files upon return.
> This sounds like a good idea, and I made a ticket for it.

Super, thanks!

Cheers
Marco

[1] https://htcondor.readthedocs.io/en/latest/users-manual/file-transfer.html?highlight=Dataflow#dataflow-jobs