HTCondor Project List Archives




Re: [Condor-devel] [condor-staff] questions about local universe job exit semantics



On Dec 26, 2006, at 7:15 PM, Derek Wright wrote:

currently, they happen in the above order. so, if you were unlucky, and things crashed, you could potentially see the job exited event in the userlog, but the job was left marked running, so the job could run again. this is the desired behavior, since we say 2 exited events is better than none. you'd also see the job's output classad written twice (which would probably break something, i don't know if anything/anyone can handle this case). however, if you were unlucky and crashed between (c) and (d), you could have the job exit without any email notification at all.

Sorry for not replying sooner.

Two exit events are better than none, but an exit event followed by re-execution of the job is bad. I'm surprised Peter hasn't jumped all over this. Given his current opinion of the reliability of the user log, he may have just given up in disgust. :-) Once an exit event appears in the user log, the job must not be allowed to re-execute. Otherwise, the user log is worthless as anything other than a historical archive, because the user can't tell from the user log when the output of the job is available for use.
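
To make the ordering argument concrete, here is a minimal sketch of one way to get "never re-execute after an exit event" while still tolerating duplicate exit events. It is not Condor's actual code: the Job class, its fields, and the finish/recover/simulate functions are all hypothetical names invented for illustration, and the "durable" state is just in-memory attributes. The assumption is that the completed state is committed before the exit event is written, and that recovery replays the event if it cannot prove the event was written.

```python
class Crash(Exception):
    """Simulated crash between two steps of job cleanup."""


class Job:
    """Hypothetical stand-in for durable job state plus a user log."""

    def __init__(self):
        self.state = "running"    # assumed durable job-queue state
        self.exit_logged = False  # assumed durable flag: exit event confirmed written
        self.user_log = []        # stands in for the user log file
        self.runs = 0             # how many times the job has executed


def finish(job, crash_after=None):
    # Step 1: commit the completed state first, so a later crash
    # can never leave the job looking runnable.
    job.state = "completed"
    if crash_after == "commit":
        raise Crash()
    # Step 2: only then write the exit event to the user log.
    job.user_log.append("exited")
    if crash_after == "log":
        raise Crash()
    # Step 3: record that the exit event is known to be in the log.
    job.exit_logged = True


def recover(job):
    if job.state == "running":
        # No commit happened, so no exit event was written: safe to re-run.
        job.runs += 1
        finish(job)
    elif not job.exit_logged:
        # Completed, but the event may be missing: write it (possibly twice).
        job.user_log.append("exited")
        job.exit_logged = True


def simulate(crash_after):
    job = Job()
    job.runs = 1  # the job has executed once and is now exiting
    try:
        finish(job, crash_after)
    except Crash:
        recover(job)
    return job
```

With this ordering, simulate("log") leaves two exit events in the log and simulate("commit") leaves one, but in neither case does the job run a second time. The cost is that the recovery path has to be able to reconstruct and write the exit event on its own.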

+--------------------------------+-----------------------------------+
|           Jaime Frey           | I used to be a heavy gambler.     |
|       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
+--------------------------------+-----------------------------------+