Re: [Condor-devel] [condor-staff] questions about local universe job exit semantics
- Date: Mon, 8 Jan 2007 10:49:21 -0600
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [Condor-devel] [condor-staff] questions about local universe job exit semantics
On Dec 26, 2006, at 7:15 PM, Derek Wright wrote:
> currently, they happen in the above order. so, if you were
> unlucky, and things crashed, you could potentially see the job
> exited event in the userlog, but the job was left marked running,
> so the job could run again. this is the desired behavior, since we
> say 2 exited events is better than none. you'd also see the job's
> output classad written twice (which would probably break something,
> i don't know if anything/anyone can handle this case). however, if
> you were unlucky and crashed between (c) and (d), you could have
> the job exit without any email notification at all.
Sorry for not replying sooner.
Two exit events are better than none, but an exit event followed by
re-execution of the job is bad. I'm surprised Peter hasn't jumped all
over this. Given his current opinion of the reliability of the user
log, he may have just given up in disgust. :-)
Once an exit event appears in the user log, the job must not re-execute.
Otherwise, the user log is worthless as anything other than a
historical archive. The user can't tell from the user log when the
output of the job is available for use.
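
To make the ordering concrete, here is a minimal sketch in Python. It is not Condor code: the queue is a plain dict, the user log is a list, and every name (finish_job, recover, exit_logged, SimulatedCrash) is hypothetical. The idea is to record the terminal state durably before writing the exit event, and to have recovery re-write a possibly missing event instead of re-running the job.

```python
class SimulatedCrash(Exception):
    """Stands in for the daemon dying at a chosen point (illustration only)."""


def new_job():
    # Hypothetical job record; a real job queue would persist this to disk.
    return {"status": "running", "exit_code": None, "exit_logged": False}


def finish_job(queue, user_log, job_id, exit_code, crash_after=None):
    rec = queue[job_id]
    # Step 1: commit the terminal state first, so the job can never be
    # restarted once an exit event might be visible in the user log.
    rec["status"] = "completed"
    rec["exit_code"] = exit_code
    if crash_after == "commit":
        raise SimulatedCrash(job_id)
    # Step 2: only now write the exit event.
    user_log.append(("exited", job_id, exit_code))
    if crash_after == "log":
        raise SimulatedCrash(job_id)
    # Step 3: remember that the event was written.
    rec["exit_logged"] = True


def recover(queue, user_log):
    """Return the job ids that are safe to run again after a crash."""
    to_rerun = []
    for job_id, rec in queue.items():
        if rec["status"] == "running":
            # No exit event can exist for this job, so re-running is safe.
            to_rerun.append(job_id)
        elif not rec["exit_logged"]:
            # The event may or may not have reached the log. Write it again:
            # two exit events are better than none.
            user_log.append(("exited", job_id, rec["exit_code"]))
            rec["exit_logged"] = True
    return to_rerun
```

With this ordering, a crash can still produce a duplicate exit event, but never an exit event followed by another execution of the same job.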
+--------------------------------+-----------------------------------+
| Jaime Frey                     | I used to be a heavy gambler.     |
| jfrey@xxxxxxxxxxx              | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
+--------------------------------+-----------------------------------+