HTCondor Project List Archives




Re: [Condor-devel] Is it possible to change the log file for a running job?



On Mon, Aug 18, 2008 at 04:50:07PM -0700, Alain Roy wrote:
> I would claim that failures are normal operation. In a distributed  
> system, there are frequently causes of error.

I would go so far as to say the mean time to failure for a
distributed system is "right now". Failure-case handling and "dropping
management" (dealing with lost files, or some other resource, left lying
around) should never be an afterthought in grid systems; they are as
important as a successful run. I'd even say more so, since a job can
have many failures in a row, each for a different reason, and then only
one (of course) successful run.

Thank you.

-pete