Robert E. Parrott wrote:
> I'm also/instead looking for a solution to enforce memory limits at
> runtime.
> It looks as if a USER_JOB_WRAPPER with a ulimit line is the solution
> here. Does that jibe with what others have done?

That is one option. Here are two others:
1. Have Condor preempt jobs from the machine when their virtual image
size exceeds some amount. Example:
MEMORY_EXCEEDED = ( ImageSize > 1.5*Memory*1024 )
MEMORY_NOT_EXCEEDED = ($(MEMORY_EXCEEDED) =!= TRUE)
WANT_SUSPEND = ($(WANT_SUSPEND)) && $(MEMORY_NOT_EXCEEDED)
PREEMPT = ($(PREEMPT)) || $(MEMORY_EXCEEDED)
2. Have Condor (on the submit side) put jobs on hold when their virtual
image size exceeds some amount. It is a little more awkward to set the
amount based on the size of the machine's memory in this case, but it
is possible. Example:
# When a job matches, insert the machine memory into the
# job ClassAd so SYSTEM_PERIODIC_HOLD can refer to it.
MachineMemory = "$$(Memory)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineMemory
# ImageSize is in KiB and Memory is in MiB, hence the factor of 1024.
SYSTEM_PERIODIC_HOLD = (MATCH_EXP_MachineMemory =!= UNDEFINED && \
  ImageSize > 1.5*int(MATCH_EXP_MachineMemory)*1024)
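
To make the units in the threshold explicit: ImageSize is reported in
KiB while Memory is in MiB, which is where the factor of 1024 in
option 1 comes from. Here is a small sketch of that arithmetic in
Python. The function names are made up for illustration and are not
part of Condor.

```python
KIB_PER_MIB = 1024  # ImageSize is in KiB, Memory is in MiB


def image_size_limit_kib(memory_mib, factor=1.5):
    """Largest ImageSize (KiB) tolerated on a machine with memory_mib of RAM."""
    return factor * memory_mib * KIB_PER_MIB


def memory_exceeded(image_size_kib, memory_mib, factor=1.5):
    """Mirror of MEMORY_EXCEEDED: true once ImageSize passes the limit."""
    return image_size_kib > image_size_limit_kib(memory_mib, factor)
```

For a machine advertising Memory = 2048, the limit works out to
1.5 * 2048 * 1024 = 3145728 KiB, i.e. 3 GiB of virtual image size.
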
Both of these techniques share one shortcoming: they are based on the
virtual memory size of the job, which may not be an accurate measure of
the job's actual demand on physical memory.
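
For comparison, the wrapper approach from the original question could
look roughly like the following. This is only a sketch, assuming a
Unix execute node with Python installed; the 2 GiB figure and the
function names are placeholders I picked for illustration. Condor runs
the USER_JOB_WRAPPER with the job executable and its arguments as the
wrapper's own arguments, so the wrapper sets the limit and then execs
the job. Like ulimit -v, this limits virtual address space, so it has
the same virtual-versus-physical caveat.

```python
#!/usr/bin/env python3
"""Hypothetical USER_JOB_WRAPPER: cap the address space, then exec the job."""
import os
import resource
import sys

# Assumed limit for illustration only: 2 GiB of virtual address space.
LIMIT_BYTES = 2 * 1024 ** 3


def clamp_to_hard_limit(requested, hard):
    """Return the requested soft limit, never above the existing hard limit."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


def apply_memory_limit(limit_bytes):
    """Set the soft RLIMIT_AS (the limit that ulimit -v controls)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    new_soft = clamp_to_hard_limit(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return new_soft


def main(job_argv):
    apply_memory_limit(LIMIT_BYTES)
    # Replace this process with the job; the limit is inherited across exec.
    os.execvp(job_argv[0], job_argv)


# Only act when a job command line was actually passed in.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

The script would be made executable and named in the execute node's
configuration, e.g. USER_JOB_WRAPPER = /path/to/wrapper (path is a
placeholder). A job that tries to allocate past the limit then gets an
allocation failure instead of being preempted or held.
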
--Dan