
Re: [Condor-users] Startd's crashing with fatal error getting process info for starter and descendants



>> 7/5 14:14:45 ERROR "Starter::percentCpuUsage(): Fatal error getting
>> process info for the starter and decendents" at line 859 in file
>> ..\src\condor_startd.V6\Starter.C
>
> I've figured out the problem. One of the jobs run by one of my users is
> using the Windows API call:
>
> SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
> 
> on one of its spawned threads, and the result is that the condor_startd
> gets starved for CPU on the machine (it's a single-processor machine).
>
> The user was trying to reduce the variance in run time of his job from
> run to run by preventing this critical thread from being interrupted.
>
> First thing I've noticed:
>
> All the condor daemons on Windows run at 'Normal' priority. Would it be
> possible to add a config setting that would let me change this? I'd like
> to see all the daemons run at 'High' priority.
>
> Second thing:
>
> I haven't tried out Matt Hope's suggestion of JOB_RENICE_INCREMENT=0, but
> it would be really nice to have an explicit way, given the daemons are
> running at 'High' priority, to spawn the job thread at 'Normal'
> priority. This would ensure far fewer interruptions from system
> processes on the machine.

Add a third request to this:

The startd shouldn't die like this when it can't gather process stats.
That doesn't really seem like an I-should-die-and-take-down-my-jobs kind
of situation. Warnings are fine.
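
A rough sketch of the warn-and-carry-on behaviour, in Python rather than the startd's own code. Everything here is hypothetical: `CpuUsageTracker`, the `probe` callable, and the choice to reuse the last good reading are assumptions for illustration, not how Condor is implemented.

```python
import logging

log = logging.getLogger("startd_sketch")


class CpuUsageTracker:
    """Keeps the last good CPU reading so a failed probe is not fatal."""

    def __init__(self, probe):
        # `probe` is a hypothetical callable that returns the percent CPU
        # usage of the starter and its descendants, or raises OSError.
        self._probe = probe
        self.last_good = 0.0
        self.consecutive_failures = 0

    def percent_cpu_usage(self):
        try:
            value = self._probe()
        except OSError as exc:
            # Warn and reuse the previous reading instead of aborting.
            self.consecutive_failures += 1
            log.warning("could not get process info (%d in a row): %s",
                        self.consecutive_failures, exc)
            return self.last_good
        self.consecutive_failures = 0
        self.last_good = value
        return value
```

Counting consecutive failures leaves room for escalating later (for example, only after many misses in a row) without making a single starved sample fatal.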

- Ian