
Re: [Condor-users] Dynamic Partitioning Updating Memory Usage



By default, the job ClassAd is sent from the condor_starter daemon (on the worker
machine) to the condor_shadow (on the submit machine) every 5 minutes. I think that
decreasing the value of STARTER_UPDATE_INTERVAL should help you. Details:

http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#19089
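
As a minimal sketch, assuming the setting goes into the local configuration of the execute machines: the value below is only an illustration, not a recommendation. The variable is given in seconds, and 300 corresponds to the 5-minute default mentioned above.

```
# Illustrative value only: send job ClassAd updates every 60 seconds
# instead of the default 300. Shorter intervals mean more update traffic
# between the starter and the shadow.
STARTER_UPDATE_INTERVAL = 60
```

I would expect a condor_reconfig (or a restart) on the execute machines to be needed, and jobs that are already running may keep the old interval.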

You can also find some inspiration for limiting job memory on the Condor wiki page
https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitMemoryUsage
The last section there is about a user job wrapper and "ulimit". A similar approach is
used in the Condor manual:
http://research.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#SECTION0041313000000000000000
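
Roughly, the wrapper approach is wired in through the USER_JOB_WRAPPER configuration variable on the execute machines. The path below is hypothetical (an assumption for illustration); the script it points to would set the limit with "ulimit" and then exec the real job, as described on the pages above.

```
# Hypothetical path, for illustration only. The script should apply the
# desired "ulimit" and then exec the job's command line it receives as arguments.
USER_JOB_WRAPPER = /usr/local/condor/libexec/memory_limit_wrapper.sh
```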

Regards,
Lukas

On Thursday, January 12, 2012 04:11 CET, Jake Adriaens <jtadriaens@xxxxxxxx> wrote:

> My previous configuration may have been wrong because I believe PREEMPTION_REQUIREMENTS is only evaluated when RemoteUserPrio > SubmittorPrio.  I've updated the preemption configuration as reflected below.  Unfortunately, jobs are still not preempted when ImageSize grows.  condor_q reflects the updated image size, but I've discovered that the ImageSize in the local job ad (in the file pointed to by $(_CONDOR_JOB_AD)) is only set when a job starts and is not updated while the job is running. I suspect this may be where the problem lies.  Is this functioning as intended?  Is there a way to force a job to vacate when ImageSize grows larger than the memory allocated to a dynamically partitioned slot?  Is there a more direct way to grow the memory allocated to a slot? Is there a problem with my configuration?
>
>
> Thanks!
> Jake
> New schedd configuration:
> WANT_SUSPEND=False
> WANT_VACATE=True
> START=True
> SUSPEND=False
> RAN_FOR_A_BIT = ($(ActivationTimer) > (10 * $(MINUTE)))
> WORKING_SET_SIZE_MB = (TARGET.ImageSize/1024)
> MEMORY_AVAILABLE_MB = (TARGET.Memory)
> MEMORY_EXCEEDED = ($(WORKING_SET_SIZE_MB) > $(MEMORY_AVAILABLE_MB))
> PREEMPT= ($(RAN_FOR_A_BIT) && $(MEMORY_EXCEEDED))
> CONTINUE= True
> KILL= ($(ActivityTimer) > $(MaxVacateTime))
> PREEMPTION_REQUIREMENTS = ( $(StateTimer) > (10 * $(MINUTE)) && RemoteUserPrio > SubmitterUserPrio * 1.2 )
>
>
>
> On 01/10/12, Jake Adriaens   wrote:
> > I've configured condor to use dynamic slot partitioning, however the memory size of the dynamic slots is never updated.  When my jobs run, each dynamic slot always has a memory size of 1 and never grows with the image size of the job. It is my understanding that once a dynamic slot is allocated the resources assigned to it cannot change.  To work around this I've tried to configure condor to preempt a job if its image size grows larger than the memory allocated to the slot.  However, my jobs are never preempted even though the image size shown by condor_q is considerably larger than the amount of memory available in the dynamic slot as shown by condor_status.  Below are my preemption settings from the condor configuration. Any suggestions would be greatly appreciated.
> >
> > Jake
> > PREEMPT = True
> > RAN_FOR_A_BIT = $(StateTimer) > (10 * $(MINUTE))
> > PRIORITY_EXCEEDED = RemoteUserPrio > SubmittorPrio * 1.2
> > MEMORY_EXCEEDED = (TARGET.ImageSize/1024*0.7) > (Memory*1.0)
> > PREEMPTION_REQUIREMENTS = $(RAN_FOR_A_BIT) && ( $(PRIORITY_EXCEEDED) || $(MEMORY_EXCEEDED) )
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/