Re: [HTCondor-users] Dataflow job skips when executable is updated
- Date: Thu, 22 Jun 2023 11:27:32 -0500 (CDT)
- From: Todd L Miller <tlmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Dataflow job skips when executable is updated
> I just discovered the massively useful "skip_if_dataflow" submit option (how
> did I miss this before?).
I'm going to guess because it's not in the manual's index or in
the condor_submit man page; the only place I can find it in the manual is
in the file-transfer section, which we should probably fix.
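For anyone else hunting for it, here is a minimal sketch of how the option can
be used from the Python bindings (the file names are hypothetical); the same
key/value pairs work as lines in a plain submit description file:

    import htcondor

    # Hypothetical dataflow-style job: regenerate data.out from data.in.
    job = htcondor.Submit({
        "executable":            "transform.py",
        "arguments":             "data.in data.out",
        "should_transfer_files": "YES",
        "transfer_input_files":  "data.in",
        "transfer_output_files": "data.out",
        "skip_if_dataflow":      "true",   # skip if outputs look up to date
        "log":                   "job.log",
    })

    print(job)  # shows the equivalent submit-file text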
> Its docs say that the job will be skipped only if its outputs are newer
> than either its inputs or executable. This works correctly for the
> inputs, but when I touch the executable the job still skips.
The logic that's actually implemented is that a job marked as
dataflow is skipped if:
* the oldest output file is newer than the newest input file,
* or the executable is newer than the newest input file,
* or the standard input is newer than the newest input file.
This matches the documentation I found. Where did you find something
saying what you did above?
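For concreteness, an illustrative sketch of that test in plain Python (this is
not HTCondor's actual implementation, just the rules above expressed with
file modification times):

    import os

    def would_skip(inputs, outputs, executable, stdin_file=None):
        """True if a dataflow job would be skipped under the rules above."""
        # A job with missing outputs always runs.
        if not outputs or not all(os.path.exists(p) for p in outputs):
            return False
        newest_input = max(os.path.getmtime(p) for p in inputs) if inputs else 0
        oldest_output = min(os.path.getmtime(p) for p in outputs)
        return (
            oldest_output > newest_input                          # outputs newer than inputs
            or os.path.getmtime(executable) > newest_input        # executable newer than inputs
            or (stdin_file is not None
                and os.path.getmtime(stdin_file) > newest_input)  # stdin newer than inputs
        )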
Honestly, I'm not sure what the reasoning for the last two conditions
is; I would have assumed the same thing you did, that updating the
executable (or the standard input file) means you want to run the job again.
This may just have been a mistake in the original design.
> A related one for the wish list: when "transfer_output_files = dir", then if
> directory 'dir' already exists, its timestamp isn't updated when Condor
> transfers it back (at least on my file system), hence the job will never be
> skipped.
> I'm aware that directory timestamp updates depend on file system and transfer
> mechanism, but for dataflow jobs it would be desirable if Condor explicitly
> touched the items in transfer_output_files upon return.
This sounds like a good idea, and I made a ticket for it.
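To sketch what "explicitly touch the items in transfer_output_files" could
look like (a hypothetical helper, not anything HTCondor does today):

    import os
    import time

    def touch_outputs(paths):
        """Bump the mtime of each transferred output item."""
        now = time.time()
        for p in paths:
            # os.utime works on directories too, so an already-existing
            # 'dir' from transfer_output_files = dir would get a fresh
            # timestamp even when the transfer itself doesn't update it.
            os.utime(p, (now, now))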
- ToddM