Jonathan D. Proulx wrote:
Hi,
It appears as though a Condor job in my flock exhausted the memory on a
user workstation overnight ($CondorVersion: 6.8.2 Oct 12 2006 $
$CondorPlatform: X86_64-LINUX_RHEL3 $). This triggered Linux's OOM
killer, which killed several desktop apps and sshd on the system.
This seems a pretty serious violation of the "do no harm" principle, and
I'm a bit surprised.
Is there a config setting I need to tweak on the workstations?
You could add a clause to your PREEMPT expression, e.g. something like
PREEMPT = ( whatever was there before ) || (ImageSize > (Memory-20))
This should work at least in the v6.9 series (and maybe in v6.8 as well;
I cannot recall offhand), since the condor_startd's value for "ImageSize"
is updated several times a minute to the total memory usage of the
job.
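
One caveat when turning this into a real config entry: as far as I
know, ImageSize is reported in KiB while the machine's Memory attribute
is in MiB, so the comparison probably needs a unit conversion. Below is
a rough sketch for the workstation's local config file, not a tested
policy. It assumes PREEMPT is already defined earlier in your config,
and MEMORY_EXCEEDED is just a macro name I made up for readability; the
20 MB of headroom is the same figure as in the expression above.

```
# Hypothetical helper macro: true once the job's image size (KiB)
# comes within 20 MiB of the memory on this machine (MiB).
MEMORY_EXCEEDED = (ImageSize > ((Memory - 20) * 1024))

# Keep whatever preemption policy was already in place and add the
# memory clause to it. Assumes PREEMPT was defined before this line.
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
```

After editing, run condor_reconfig on the workstation so the startd
picks up the new expression. You may well want more than 20 MB of
headroom on a desktop machine, since the owner's own applications need
room too.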