HTCondor Project List Archives




Re: [Condor-devel] Is it possible to change the log file for a running job?



Alain Roy wrote:
> On Aug 18, 2008, at 3:56 PM, João Abecasis wrote:
>>>
>>> Curious comment: what do you do if the SAGA application exits
>>> abnormally?  is the log file ever cleaned up?
>>
>> Ha, ha! SAGA applications don't exit abnormally... :-p
>>
>> Seriously, it depends on how abnormal the interruption is. Anyway,
>> leaving a temporary log behind in case of a fatal error doesn't sound
>> as bad, but accumulating them in the course of normal operation is
>> another matter.
>
> I would claim that failures are normal operation. In a distributed system,
> errors have many frequent causes.

Agreed.

> What if I submit 10,000 jobs and leave for vacation, hoping for the results
> when I return, but they all exit abnormally? What if I don't understand the
> reasonable limits of the system and I submit 10 million jobs?

Well, for now, the adaptor will try to use one log file per schedd,
although you can only really submit to localhost at this time. A limit
of jobs per log can be added.
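
A minimal sketch of how such a per-schedd log with a job limit could
be handed out. LogAllocator, log_for_next_job and the file naming
scheme are hypothetical names made up for illustration; none of them
come from the SAGA adaptor or from Condor.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: hands out one log file name per schedd and
// rolls over to a new file once a log has reached its job limit.
class LogAllocator {
public:
    // A limit of 0 is treated as 1 so that every log holds at least one job.
    explicit LogAllocator(std::size_t max_jobs_per_log)
        : max_jobs_(max_jobs_per_log == 0 ? 1 : max_jobs_per_log) {}

    // Returns the log file the next job submitted to `schedd` should use.
    std::string log_for_next_job(const std::string& schedd) {
        Slot& slot = slots_[schedd];    // created with zeros on first use
        if (slot.jobs == max_jobs_) {   // current log is full: start a new one
            ++slot.index;
            slot.jobs = 0;
        }
        ++slot.jobs;
        return schedd + "." + std::to_string(slot.index) + ".log";
    }

private:
    struct Slot {
        std::size_t index = 0;  // which log file is currently in use
        std::size_t jobs = 0;   // how many jobs that file already tracks
    };

    std::size_t max_jobs_;
    std::map<std::string, Slot> slots_;
};
```

With a limit of 2, the first two jobs for "localhost" would share
localhost.0.log and the third would move on to localhost.1.log.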

The fact that submitted jobs exit abnormally is not a reason for logs
to be left behind. The controller application crashing might be, but
then again it depends on the particular details of how the application
crashes. As it stands, a try-catch would ensure it cleans up after
itself in class destructors. But I'd still need to get Condor to stop
writing to the log.
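
To illustrate the try-catch point, here is a minimal sketch, assuming
a hypothetical TempLogGuard class that owns the temporary log (the
class and the helper functions are invented for this example and are
not SAGA or Condor code). The catch block matters because, when an
exception is never caught, C++ leaves it implementation-defined
whether the stack is unwound before the program terminates, so the
destructor might not run.

```cpp
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical RAII guard: creates a temporary log file and removes it
// again when the guard goes out of scope.
class TempLogGuard {
public:
    explicit TempLogGuard(std::string path) : path_(std::move(path)) {
        std::ofstream create(path_);  // create an empty log file
        if (!create) throw std::runtime_error("cannot create " + path_);
    }

    // Best-effort removal; std::remove reports failure by return value,
    // so the destructor never throws.
    ~TempLogGuard() { std::remove(path_.c_str()); }

    TempLogGuard(const TempLogGuard&) = delete;
    TempLogGuard& operator=(const TempLogGuard&) = delete;

    const std::string& path() const { return path_; }

private:
    std::string path_;
};

// Simulates a controller run that may fail after the log was created.
// Returns true if the run finished normally, false if it threw.
bool run_controller(const std::string& log_path, bool fail) {
    try {
        TempLogGuard log(log_path);
        if (fail) throw std::runtime_error("simulated failure");
        return true;
    } catch (const std::exception&) {
        // Stack unwinding has already destroyed the guard at this point.
        return false;
    }
}

// True if the file can be opened for reading.
bool file_exists(const std::string& path) {
    return std::ifstream(path).good();
}
```

In both the normal and the failing case the file is gone once
run_controller returns. This does not cover a hard crash (a signal or
abort), and it does nothing about Condor still writing to the file.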

> In Open Science Grid, these sorts of clean up issues are day-to-day problems
> for system administrators, who struggle with them.

All the more reason to try to avoid them.

But going back to the point, if I'm going about this the wrong way,
what would be a better way?

Thanks for your input.


João