Re: [Condor-devel] JobRunCount
- Date: Fri, 10 Aug 2007 08:17:03 -0500
- From: Erik Paulson <epaulson@xxxxxxxxxxx>
- Subject: Re: [Condor-devel] JobRunCount
On Fri, Aug 10, 2007 at 03:08:10AM -0700, Derek Wright wrote:
>
> Therefore, here's my current straw-man proposal:
>
> (b) NumberJobExecuted
> -- incremented (by both shadows) every time the starter sends
> CONDOR_begin_execution
>
begin_execution can fail, and it isn't a quick operation. I can imagine
plenty of scenarios where, without some sort of two-phase update to
NumberJobExecuted, it either goes up without a starter actually being
spawned, or a starter is spawned but the job ad is not updated. Is that
a big deal? An easier counter might be "NumberMinimumAttemptsAtStarter".

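To make the two-phase idea concrete, here is a minimal sketch. It is not Condor code: JobAd, ExecutionPending, begin_execution_two_phase and spawn_starter are hypothetical names used only for illustration, and the real shadow/schedd update path may look quite different.

```python
class JobAd:
    """Hypothetical stand-in for a job ad; not the Condor ClassAd API."""

    def __init__(self):
        self.attrs = {"NumberJobExecuted": 0, "ExecutionPending": False}


def begin_execution_two_phase(ad, spawn_starter):
    """Increment NumberJobExecuted only if the starter really came up.

    spawn_starter is an assumed callable that returns True on success
    and returns False or raises on failure.
    """
    # Phase 1: record the intent before attempting the slow operation.
    ad.attrs["ExecutionPending"] = True
    try:
        spawned = bool(spawn_starter())
    except Exception:
        spawned = False
    # Phase 2: commit the count only on success, then clear the marker.
    if spawned:
        ad.attrs["NumberJobExecuted"] += 1
    ad.attrs["ExecutionPending"] = False
    return spawned
```

If the updater dies between the two phases, ExecutionPending stays set, so whoever reads the ad later can at least tell that the count may be off by one.
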
> (c) NumberShadowSpawned
> -- incremented (by the schedd) every time it spawns a shadow
>
> (d) NumberJobReconnected
> -- incremented (by the new shadow) every time it successfully
> reconnects, regardless of if the starter was killed and restarted in
> the meantime. So, there'd be no counter for "number of reconnects to
> the current starter" in my proposal, though, you could at least tell
> how many starters are in the picture via NumberJobExecuted.
>
> (e) NumberResumedFromCheckpoint and NumberJobRestarted
> -- std universe only: punt for now and leave NumRestarts in there
> as-is. :(
>
I'm fine with using Number instead of Num, and I don't feel there's a
pressing need for a number of reconnects for this starter (though
I think that's a more useful counter from a job policy standpoint than
the number of reconnects for the life of the job).
-Erik