Re: [HTCondor-users] PERIODIC_HOLD is applied extremely infrequently
- Date: Mon, 11 May 2015 15:23:45 -0500
- From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] PERIODIC_HOLD is applied extremely infrequently
> On May 11, 2015, at 11:18 AM, Vladimir Brik <vladimir.brik@xxxxxxxxxxxxxxxx> wrote:
>
> I added D_FULLDEBUG and "Evaluated periodic expressions" lines appear in SchedLog as expected. For example:
> Evaluated periodic expressions in 0.301s, scheduling next run in 60s
>
> My periodic hold expression is defined like this:
> rss_max = 6000
> mem_hold = ((isUndefined(ResidentSetSize_RAW) =?= False && isUndefined(RequestMemory) =?= False && ResidentSetSize_RAW/1000 > RequestMemory \
> && ResidentSetSize_RAW/1000 > 6000) =?= True)
> SYSTEM_PERIODIC_HOLD = ((JobStatus == 2 && JobUniverse == 5 && $(mem_hold) && isUndefined(RemoteHost) =?= False && regex("gzk9000c", RemoteHost) =!= True) =?= True)
>
> For testing, I tried using this:
> SYSTEM_PERIODIC_HOLD = (JobStatus == 2 && JobUniverse == 5 && Owner == "vbrik")
>
Hi Vlad,
Try adding:
SYSTEM_PERIODIC_HOLD = debug( $(SYSTEM_PERIODIC_HOLD) )
This will have HTCondor log the expression evaluation into the SchedLog, perhaps illuminating what is going on here!
Brian
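
While waiting for the debug() output, it may also help to mirror the quoted hold condition offline and check which clause fails for a given job. The Python below is only a rough sketch under several assumptions: the job ad is a plain dict, a missing key stands in for an undefined attribute, ResidentSetSize_RAW/1000 is treated as integer division, and regex() is treated as an unanchored match. The function names are hypothetical and this is not the ClassAd evaluator.

```python
import re


def mem_hold(ad, rss_max=6000):
    """Mirror of the quoted mem_hold macro (assumption: missing key == undefined)."""
    rss = ad.get("ResidentSetSize_RAW")
    req = ad.get("RequestMemory")
    if rss is None or req is None:
        # isUndefined(...) =?= False fails, so the whole clause is False
        return False
    rss_scaled = rss // 1000  # assumption: integer division, as with integer attributes
    # rss_max defaults to the literal 6000 used in the quoted expression
    return rss_scaled > req and rss_scaled > rss_max


def system_periodic_hold(ad):
    """Mirror of the quoted SYSTEM_PERIODIC_HOLD expression for one job ad."""
    host = ad.get("RemoteHost")
    return (
        ad.get("JobStatus") == 2        # running
        and ad.get("JobUniverse") == 5  # vanilla universe
        and mem_hold(ad)
        and host is not None
        # assumption: regex() matches anywhere in the string
        and re.search("gzk9000c", host) is None
    )


# Hypothetical job ad; the values are made up for illustration.
job = {
    "JobStatus": 2,
    "JobUniverse": 5,
    "ResidentSetSize_RAW": 7000000,
    "RequestMemory": 2000,
    "RemoteHost": "slot1@node1",
}
print(system_periodic_hold(job))
```

Feeding it the attributes of a job that was expected to be held (for example, values copied from condor_q -long output) shows whether the expression itself is false for that job or whether it simply is not being re-evaluated.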
> The interesting thing about the expression above is that it puts *some* jobs on hold immediately after they start running (as expected), but jobs that weren't put on hold immediately after starting are never put on hold.
>
> While debugging, I am also using this:
> PERIODIC_EXPR_INTERVAL = 60
> MAX_PERIODIC_EXPR_INTERVAL = 300
> PERIODIC_EXPR_TIMESLICE = .9
>
>
> Vlad
>
>
>
> On 05/08/15 15:51, Ben Cotton wrote:
>> Vlad,
>>
>> You should see lines like:
>>
>> 05/08/15 16:45:51 (pid:2968) Evaluated periodic expressions in 0.000s,
>> scheduling next run in 300s
>>
>> in your sched log (assuming SCHEDD_DEBUG includes D_FULLDEBUG). If you
>> see that at the expected interval (based on your
>> PERIODIC_EXPR_INTERVAL setting) then it's probably a problem in your
>> SYSTEM_PERIODIC_HOLD expression. Could you share that? If it doesn't
>> show up at the expected time, we'll have to try something else.
>>
>>
>> Thanks,
>> BC
>>