Re: [HTCondor-users] File last modification time or job last write() attribute?
- Date: Wed, 25 May 2016 23:05:34 -0400
- From: Michael V Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx>
From: MIRON LIVNY <miron@xxxxxxxxxxx>
Date: 05/25/2016 02:29 PM
> Michael,
>
> Can you tell us how you plan to use this information. In other words
> "why do you care about when the last write took place?"
>
> Miron

Sure, professor: in some scenarios the only reasonable course of action
is to keep trying until the bitter, bitter end. And so if timing out is
not an option, then one doesn't put a timeout function into the code in
the first place.

I suppose it's in the same realm as Michelle Craft's asymptotic
optimization on slide nine, with its eight-hour deadline:

http://research.cs.wisc.edu/htcondor/HTCondorWeek2016/presentations/WedCraft_NEOS.pdf

The trick is detecting the asymptote as early as possible to minimize
badput time.

And so if a log file is supposed to have data written to it for each
time slice, for example, and nothing has appeared in it for far longer
than you'd expect a single time slice ought to take, then you can
conclude that you're not going to make any further forward progress and
some action should be taken. Since the job won't terminate itself, for
reasons, it falls to a periodic_hold or periodic_remove expression,
which can compare that last-write time to CurrentTime in order to
trigger, imposing an external timeout.
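
As a rough sketch of that external timeout check, written in Python
rather than as a ClassAd expression: compare the log file's modification
time against the current time and flag the job once the gap exceeds what
one time slice should need. The function names and the idea of passing
in a threshold are illustrative assumptions, not anything HTCondor
provides.

```python
import os
import time


def seconds_since_last_write(path, now=None):
    """Return the seconds elapsed since the file at `path` was last modified."""
    if now is None:
        now = time.time()
    # getmtime() reports the last modification time as seconds since the epoch.
    return now - os.path.getmtime(path)


def is_stalled(path, max_idle_seconds, now=None):
    """Return True when `path` has gone unwritten for longer than allowed.

    `max_idle_seconds` is a hypothetical threshold: pick something well
    above the time a single time slice ought to take.
    """
    return seconds_since_last_write(path, now) > max_idle_seconds
```

A watcher outside the job could call is_stalled() on the job's log at
intervals and, once it returns True, hold or remove the job (with
condor_hold or condor_rm, say).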
-Michael Pelletier