Thank you student.
OK. This is what I expected to hear. So, what is the expected run time of these applications, and what is the expected frequency of progress reports? Also, do you know the file name where these reports are written?
Personally, rather than solving a general problem, I prefer at this stage to focus on your specific problem.
Miron.
From: MIRON LIVNY <miron@xxxxxxxxxxx>
Date: 05/25/2016 02:29 PM
> Michael,
>
> Can you tell us how you plan to use this information. In other words "why
> do you care about when the last write took place?"
>
> Miron
Sure, professor: in some scenarios the only reasonable course of action is
to keep trying until the bitter, bitter end. And so if timing out is not an
option, then one doesn't put a timeout function into the code in the first
place.
I suppose it's in the same realm as Michelle Craft's asymptotic optimization
on slide nine, with its eight-hour deadline:
http://research.cs.wisc.edu/htcondor/HTCondorWeek2016/presentations/WedCraft_NEOS.pdf
The trick is detecting the asymptote as early as possible to minimize
badput time.
And so if a log file is supposed to have data written to it for each
time slice, for example, and nothing has appeared in it for far longer than
you'd expect a single time slice ought to take, then you can conclude that
you're not going to make any further forward progress and some action should
be taken. Since the job won't terminate itself for reasons, it falls to a
periodic_hold or periodic_remove expression, which can compare that last-write
time against CurrentTime in order to trigger, imposing an external timeout.
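
As a rough illustration of the comparison described above, here is a minimal Python sketch of the stall check itself, outside of any HTCondor policy expression. The function names, the `slack_factor` parameter, and the use of the file's modification time as the "last write" are all assumptions made for illustration; nothing here is an HTCondor API.

```python
import os
import time


def seconds_since_last_write(path, now=None):
    """Return how many seconds have passed since `path` was last modified.

    `now` defaults to the current time; passing it explicitly makes the
    check easy to exercise with known values.
    """
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def looks_stalled(path, expected_slice_seconds, slack_factor=5.0, now=None):
    """Return True when the log has been silent far longer than one time slice.

    `slack_factor` (hypothetical) sets how many expected time slices of
    silence are tolerated before the job is considered stuck.
    """
    limit = expected_slice_seconds * slack_factor
    return seconds_since_last_write(path, now) > limit
```

With a log expected to grow roughly once a minute, `looks_stalled("progress.log", 60)` would report a stall after more than five minutes of silence; the file name is only a placeholder.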
-Michael Pelletier