
Re: [HTCondor-users] Dataflow job skips when executable is updated



I just discovered the massively useful "skip_if_dataflow" submit option (how did I miss this before?).
	I'm going to guess because it's not in the manual's index or in 
the condor_submit man page; the only place I can find it in the manual is 
in the file-transfer section, which we should probably fix.
Its docs say that the job will be skipped only if its outputs are newer than both its inputs and its executable. This works correctly for the inputs, but when I touch the executable the job is still skipped.
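For context, a minimal submit description along the lines of what I'm running looks roughly like this (the file names are just placeholders):

    # toy example; analyze.sh, data.in, data.out stand in for the real names
    executable            = analyze.sh
    transfer_input_files  = data.in
    transfer_output_files = data.out
    should_transfer_files = YES
    skip_if_dataflow      = true
    log                   = job.log
    queue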
	The logic that's actually implemented is that a job marked as 
dataflow is skipped if:
* the oldest output file is newer than the newest input file,
* or the executable is newer than the newest input file,
* or the standard input is newer than the newest input file.
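
	In rough Python-style pseudocode, that check looks something like 
the following (a sketch of the behavior described above, not the actual 
code; missing files and the no-input corner case are ignored here):

    import os

    def dataflow_skip(inputs, outputs, executable, stdin_file=None):
        newest_input  = max(os.path.getmtime(f) for f in inputs)
        oldest_output = min(os.path.getmtime(f) for f in outputs)
        if oldest_output > newest_input:
            return True   # outputs up to date with respect to the inputs
        if os.path.getmtime(executable) > newest_input:
            return True   # a newer executable also causes a skip
        if stdin_file and os.path.getmtime(stdin_file) > newest_input:
            return True   # likewise for standard input
        return False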

This matches the documentation I found. Where did you find the wording you described above?
	Honestly, I'm not sure what the reasoning behind the last two 
points is; I would have assumed the same thing you did, that updating the 
executable (or the standard input file) means you want to run the job again. 
This may just have been a mistake in the original design.
A related item for the wish list: with "transfer_output_files = dir", if the directory 'dir' already exists, its timestamp isn't updated when Condor transfers it back (at least on my file system), so the job is never skipped.
I'm aware that directory timestamp updates depend on the file system and 
transfer mechanism, but for dataflow jobs it would be desirable if Condor 
explicitly touched the items listed in transfer_output_files upon return.
	This sounds like a good idea, and I made a ticket for it.
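	In the meantime, one workaround is to bump the timestamps yourself 
on the submit side after the outputs come back, e.g. from a DAGMan POST 
script or a small wrapper. A minimal sketch, assuming the paths passed on 
the command line are the transfer_output_files entries:

    #!/usr/bin/env python3
    # Touch each output item so a pre-existing directory looks newer than the inputs.
    import os, sys, time

    now = time.time()
    for path in sys.argv[1:]:
        os.utime(path, (now, now))   # assumes the path already exists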

- ToddM