Hi Steve,

Take a look at the WANT_HOLD documentation in Condor:

http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#SECTION004310000000000000000

It has a good example of a startd policy for holding jobs. You can easily modify it to just preempt, or to implement whatever policy you would like. I believe the startd evaluates these attributes much more frequently than ClassAd updates are sent to the schedd.

-Derek

On Jun 2, 2011, at 9:15 AM, Steven Timm wrote:

> In my cluster I have been using a schedd-based method of
> killing jobs that are using too much memory.
>
> [root@fcdf1x1 local]# condor_config_val SYSTEM_PERIODIC_REMOVE
> (NumJobStarts > 10) || (ImageSize>=2500000) || (JobRunCount>=1 && JobStatus==1 && ImageSize>=1000000)
>
> But this has two weaknesses
>
> One is that sometimes it can take
> the shadow a long time to send the high memory value back to
> the schedd so the schedd can act, and in the meantime the job grows
> too fast and sucks up all ram on the node and starts killing other
> processes.
>
> The second one is that I have a diverse pool of nodes and
> would like jobs running on the nodes with bigger memory to use it if
> it is there.
>
> So is there a way to evict jobs that use, (ImageSize*2>Memory)?
> would you use the KILL or the PREEMPT function?
>
> Steve Timm
>
> --
> ------------------------------------------------------------------
> Steven C. Timm, Ph.D  (630) 840-8525
> timm@xxxxxxxx  http://home.fnal.gov/~timm/
> Fermilab Computing Division, Scientific Computing Facilities,
> Grid Facilities Department, FermiGrid Services Group, Group Leader.
> Lead of FermiCloud project.
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
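
For illustration, here is a rough sketch of what a startd-side policy along these lines might look like when adapted to preempt rather than hold. It is not the manual's example verbatim. The macro name MEMORY_EXCEEDED and the threshold are assumptions chosen for this sketch, and it assumes that ImageSize is reported in KiB, that Memory is in MiB, and that PREEMPT and WANT_SUSPEND are already defined earlier in the configuration.

```
# Sketch only: the macro name and the threshold are illustrative assumptions.
# ImageSize (job ad) is assumed to be in KiB, Memory (slot ad) in MiB,
# so ImageSize is converted to MiB before the comparison.
MEMORY_EXCEEDED = ((ImageSize / 1024) > Memory)

# Preempt when the existing policy already says so, or when the job
# has grown past the slot's memory.
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))

# Do not merely suspend a job that is over the limit, so that it is
# evicted instead of sitting suspended while still holding the memory.
WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE
```

Because the comparison uses the slot's own Memory value, nodes with more memory would tolerate larger jobs, which is the second point in the original question. The factor in the comparison can be scaled to taste. After editing the configuration, running condor_reconfig on the execute nodes should make the startd pick up the change. As far as I understand it, KILL only governs how long a job that is already being preempted is given to vacate before it is hard-killed, so PREEMPT is the expression that triggers the eviction.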