Mailing List Archives
Re: [Condor-users] SUSPEND/CONTINUE puzzle
- Date: Wed, 13 Jul 2011 07:45:54 -0700
- From: "Ralph&Maria Finch" <ralphmariafinch@xxxxxxxxx>
- Subject: Re: [Condor-users] SUSPEND/CONTINUE puzzle
Here are my SUSPEND/CONTINUE expressions from condor_config.local. I
just upgraded to the latest version, 7.6.1, which did help it detect
the keyboard properly, but the job still cycles between suspend and
continue every 5 seconds. This makes me suspect the problem lies in the
expression that suspends on high non-Condor load.
HighLoad = 0.8
BackgroundLoad = 0.3
# time keyboard must be idle to start job
StartIdleTime = 5 * $(MINUTE)
# max time to allow a job in suspension
MaxSuspendTime = 2 * $(HOUR)
# if keyboard idle for this time, continue suspended job
ContinueIdleTime = 5 * $(MINUTE)
KeyboardBusy = (KeyboardIdle < $(StartIdleTime))
ConsoleBusy = (ConsoleIdle < $(StartIdleTime))
ConsoleNotBusy = ($(ConsoleBusy) == False)
KeyorConBusy = ($(KeyboardBusy) || $(ConsoleBusy))
KeyorConNotBusy = ($(KeyorConBusy) == False)
# Suspend job on Slots 1 or 2 if keyboard is touched
# or the Slot has a high non-condor load;
# but don't suspend if job suspension time exceeds limit
SUSPEND1 = (SlotID <= 2 && $(KeyorConBusy))
SUSPEND2 = ( $(NonCondorLoadAvg) > $(HighLoad) )
SUSPEND3 = ( (TotalJobSuspendTime =!= UNDEFINED) && \
(TotalJobSuspendTime <= $(MaxSuspendTime)) \
|| (TotalJobSuspendTime =?= UNDEFINED) )
SUSPEND = $(SUSPEND3) && ( $(SUSPEND1) || $(SUSPEND2) )
# continue on Slots 1 & 2 if keyboard not used,
# or Slot's non-condor load drops,
# or job has been suspended more than max suspend time
CONTINUE1 = (SlotID <= 2 && $(KeyorConNotBusy))
CONTINUE2 = (SlotID > 2 && $(NonCondorLoadAvg) <= $(BackgroundLoad))
CONTINUE3 = ((TotalJobSuspendTime =!= UNDEFINED) && \
(TotalJobSuspendTime > $(MaxSuspendTime)))
CONTINUE = $(CONTINUE3) || $(CONTINUE1) || $(CONTINUE2)
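One plausible reading of the expressions above: on slots 1 and 2, CONTINUE1 is true whenever the keyboard and console are idle, regardless of load. If a job is suspended because of SUSPEND2 (high non-Condor load) while nobody is at the keyboard, CONTINUE becomes true at the next policy evaluation, the job resumes while the load is still high, and SUSPEND fires again. The startd re-evaluates these expressions every POLLING_INTERVAL (5 seconds by default), which would match the cycle in the log. A minimal sketch of one possible change, assuming the intent stated in the original message (continue only when the keyboard is idle and the load is below the limit), is to require both conditions on slots 1 and 2:

```
# Sketch only: on slots 1 and 2, resume a suspended job only when the
# keyboard and console are idle AND the non-Condor load has dropped.
CONTINUE1 = (SlotID <= 2 && $(KeyorConNotBusy) && $(NonCondorLoadAvg) <= $(BackgroundLoad))
```

CONTINUE2, CONTINUE3 and CONTINUE could stay as they are; CONTINUE3 would still resume a job once MaxSuspendTime is exceeded, and the gap between HighLoad (0.8) and BackgroundLoad (0.3) keeps the load test from flip-flopping on its own.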
On Wed, Jul 13, 2011 at 4:41 AM, Matthew Farrellee <matt@xxxxxxxxxx> wrote:
>
> On 07/12/2011 07:35 PM, Ralph&Maria Finch wrote:
>>
>> condor -version
>> $CondorVersion: 7.5.3 Jun 24 2010 BuildID: 250654 $
>> $CondorPlatform: INTEL-WINNT50 $
>>
>> Given the Windows platform, I use a SUSPEND policy. If the keyboard
>> has been touched in the last 5 minutes, or if the non-Condor load
>> reaches a high value, I want to SUSPEND the job. Then I want to CONTINUE
>> the job when the keyboard has been untouched for 5 minutes and the load
>> is below the limit.
>>
>> Unfortunately, something is wrong, and the jobs SUSPEND/CONTINUE every
>> 5 seconds:
>>
>> 07/12/11 16:32:21 slot1: Sent update to 1 collector(s)
>> 07/12/11 16:32:22 slot1: State change: SUSPEND is TRUE
>> 07/12/11 16:32:22 slot1: Changing activity: Busy -> Suspended
>> 07/12/11 16:32:22 slot1: In Starter::kill() with pid 5372, sig 100 (DC_SIGSUSPEND)
>> 07/12/11 16:32:23 slot1: Received job ClassAd update from starter.
>> 07/12/11 16:32:26 Trying to update collector <123.456.78.910:9618>
>> 07/12/11 16:32:26 Attempting to send update via UDP to collector delta-mod.water.ca.gov <123.456.78.910:9618>
>> 07/12/11 16:32:26 slot1: Sent update to 1 collector(s)
>> 07/12/11 16:32:27 slot1: State change: CONTINUE is TRUE
>> 07/12/11 16:32:27 slot1: In Starter::kill() with pid 5372, sig 101 (DC_SIGCONTINUE)
>> 07/12/11 16:32:27 slot1: Changing activity: Suspended -> Busy
>> 07/12/11 16:32:27 slot1: Received job ClassAd update from starter.
>>
>>
>> Attempting to debug this, I set
>>
>> STARTD_DEBUG = D_FULLDEBUG
>>
>> While this does give more information (see above), it doesn't state why
>> Condor decides to SUSPEND or CONTINUE a job, and I need that information
>> to see what is wrong with my expressions.
>> What can I do to see why Condor is changing the state of a job?
>>
>> Ralph Finch
>> Calif. Dept. of Water Resources
>> Sacramento, CA USA
>
> Please include your SUSPEND/CONTINUE expressions.
>
> You can try debug() around them, but I'm not sure it was available as of 7.5.3.
>
> Best,
>
>
> matt
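Following Matt's suggestion, the top-level expressions could be wrapped in debug() so that each evaluation is logged. This is a sketch that assumes the installed version supports the ClassAd debug() function; the results would be expected in the startd's StartLog alongside the D_FULLDEBUG output already enabled.

```
# Sketch only: log each evaluation of the policy expressions.
# Assumes debug() is supported by the installed Condor version.
SUSPEND = debug($(SUSPEND3) && ( $(SUSPEND1) || $(SUSPEND2) ))
CONTINUE = debug($(CONTINUE3) || $(CONTINUE1) || $(CONTINUE2))
```

Wrapping individual clauses instead, for example SUSPEND2 = debug( $(NonCondorLoadAvg) > $(HighLoad) ), would show which condition is firing, although a clause skipped by short-circuit evaluation may not be logged at all.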