Todd Tannenbaum wrote:
> Jonathan D. Proulx wrote:
>> Hi,
>>
>> it appears as though a Condor job in my flock exhausted the memory on a
>> user workstation overnight ($CondorVersion: 6.8.2 Oct 12 2006 $
>> $CondorPlatform: X86_64-LINUX_RHEL3 $). This triggered Linux's OOM
>> killer which killed several desktop apps and sshd on the system.
>>
>> this seems a pretty serious violation of the do no harm principle and
>> I'm a bit surprised.
>>
>> is there a config setting I need to tweak on the workstations?
>>
>
> You could add a clause to your preempt expression, i.e. something like
>
> PREEMPT = ( whatever was there before ) || (ImageSize > (Memory-20))
>
> This should work at least in the v6.9 series (and maybe in v6.8 as well?
> cannot recall offhand), since the condor_startd's value for "ImageSize"
> will be updated several times a minute to the total memory usage of the
> job.
Ouch, in my example above, I didn't deal with the fact that Memory is in
megabytes and ImageSize is in kilobytes. So something like the following,
which preempts once the job's image exceeds 80% of the machine's memory,
is what I meant:

PREEMPT = (whatever was there before) || \
          (ImageSize > ((Memory*0.8)*1024))
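
To sanity-check the units, here is a minimal Python sketch of the arithmetic
in both clauses. It is not Condor code: the function names and the 2048 MB
machine are made up for illustration, and it only mirrors the comparison the
expressions above perform.

```python
def corrected_clause(image_size_kb, memory_mb, fraction=0.8):
    """Mirror of (ImageSize > ((Memory*0.8)*1024)).

    image_size_kb: job image size in kilobytes (like ImageSize).
    memory_mb:     machine memory in megabytes (like Memory).
    """
    threshold_kb = memory_mb * fraction * 1024  # convert MB to KB
    return image_size_kb > threshold_kb


def uncorrected_clause(image_size_kb, memory_mb):
    """Mirror of (ImageSize > (Memory-20)), which compares KB against MB."""
    return image_size_kb > (memory_mb - 20)


# Hypothetical workstation with 2048 MB of memory.
memory_mb = 2048
small_job_kb = 100 * 1024    # a 100 MB job
large_job_kb = 1800 * 1024   # a 1800 MB job

print(corrected_clause(small_job_kb, memory_mb))    # small job is left alone
print(corrected_clause(large_job_kb, memory_mb))    # large job trips the clause
print(uncorrected_clause(small_job_kb, memory_mb))  # mixed units trip on the small job too
```

With mixed units, the first version of the clause would be true for nearly
every job, since almost any ImageSize in kilobytes is larger than Memory in
megabytes minus 20.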
regards,
Todd
--
Todd Tannenbaum University of Wisconsin-Madison
Condor Project Research Department of Computer Sciences
tannenba@xxxxxxxxxxx 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132 Madison, WI 53706-1685