Re: [condor-users] condor_shadow timeout when loosing contact with startd
- Date: 26 Jan 2004 16:26:16 -0600
- From: Geoff Lovett <geoff.lovett@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [condor-users] condor_shadow timeout when loosing contact with startd
Well, basically I'd like to use Condor in a semi-real-time application.
So I'd like to get the two hours Condor takes to requeue a job onto a
new box after a failure down to maybe 20 minutes. To reproduce
the two-hour timeout behaviour, I'm simply running a job and then turning
off the execute box (to simulate a crash).
Indeed, STARTER_UPDATE_INTERVAL hasn't decreased the timeout.
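
For reference, the change tested here amounts to a one-line entry in the Condor configuration file. This is only a sketch: the value of 60 seconds is an assumption for illustration, since the thread does not say which value was actually used.

```
# Seconds between job-status updates sent by the starter.
# 60 is an assumed example value, not the one used in this test.
STARTER_UPDATE_INTERVAL = 60
```

As the quoted reply below points out, this setting only governs how often job statistics are reported, which fits with the requeue delay staying at two hours.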
--Geoff
On Mon, 2004-01-26 at 16:14, Zachary Miller wrote:
> On Mon, Jan 26, 2004 at 02:55:44PM -0600, Geoff Lovett wrote:
> > Ah, ok :) I'm trying to replicate the problem, and so far, 20 minutes
> > into it, it's still hung. I'll use STARTER_UPDATE_INTERVAL instead of
> > SHADOW_UPDATE_INTERVAL and give it a shot.
>
> i don't think this is going to fix your root problem though. this update
> interval simply controls how often the job stats (memory usage, run time,
> etc.) get updated from the starter to the shadow.
>
> what is actually happening in your case? is the starter getting killed,
> is it hung, or something else?
>
>
> cheers,
> -zach