Mailing List Archives
Re: [Condor-users] [Filter Test: P272621] Re: [Filter Test:P272621] Startd's crashing with fatal error getting process info for starter and descendants
- Date: Wed, 5 Jul 2006 16:10:55 -0400
- From: "Ian Chesal" <ICHESAL@xxxxxxxxxx>
- Subject: Re: [Condor-users] [Filter Test: P272621] Re: [Filter Test:P272621] Startd's crashing with fatal error getting process info for starter and descendants
On Wed, Jul 05, 2006 at 04:01:12PM -0400, Ian Chesal wrote:
>>
>> Add a third request to this:
>>
>> The startd shouldn't die like this when it can't gather process stats.
>> That doesn't really seem like an I-should-die-and-take-down-my-jobs kind
>> of situation. Warnings are fine.
>
> Really? What if the user job is violating the local policy expressions,
> say because it's driving load up very high and using all of the memory.
>
> If a daemon doesn't know what it's doing, and if there's a chance that
> what it is doing is causing harm, it seems like the safe thing to do is
> to shut down.
If there is a policy to enforce then asserting is a reasonable action,
but in this case there's no such policy. It doesn't bother me if the
daemons are getting starved because a job is running at really high
priority. It's a dedicated compute node. If it were controllable via
policy that'd be great.
- Ian