Re: [Condor-users] out-of-memory issues in parallel universe
- Date: Wed, 19 Mar 2008 12:44:23 -0500
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] out-of-memory issues in parallel universe
Robert E. Parrott wrote:
> I'm also/instead looking for a solution to enforce memory limits at
> runtime.
>
> It looks as if a USER_JOB_WRAPPER with a ulimit line is the solution
> here. Does that jibe with what others have done?
That is one option; a minimal sketch of it follows.
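A hedged sketch of such a wrapper, assuming a cap of roughly 1.5 GB of
virtual memory (the limit value and the script path below are
assumptions, not pool defaults):

#!/bin/sh
# Hypothetical USER_JOB_WRAPPER script: Condor invokes it with the
# job's own command line as arguments, so set the limit and then
# exec the real job in place of the wrapper.
ulimit -v 1572864    # address-space limit in KB (~1.5 GB)
exec "$@"

with a corresponding line in the execute machines' condor_config:

USER_JOB_WRAPPER = /path/to/memory_limit_wrapper.sh

Here are two others: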
1. Have Condor preempt jobs from the machine when their virtual image
size exceeds some amount. Example:
# Evict a job once its virtual image exceeds 1.5x the machine's RAM.
# ImageSize is reported in KB and Memory in MB, hence the factor of 1024.
MEMORY_EXCEEDED = ( ImageSize > 1.5*Memory*1024 )
MEMORY_NOT_EXCEEDED = ($(MEMORY_EXCEEDED) =!= TRUE)
# Extend the existing policy: do not merely suspend, and do preempt,
# once the memory bound is exceeded.
WANT_SUSPEND = ($(WANT_SUSPEND)) && $(MEMORY_NOT_EXCEEDED)
PREEMPT = ($(PREEMPT)) && $(MEMORY_EXCEEDED)
2. Have Condor (on the submit side) put jobs on hold when their virtual
image size exceeds some amount. Setting the threshold relative to the
machine's memory is a little more awkward in this case, but still
possible. Example:
# When a job matches, insert the machine's Memory (in MB) into the
# job ClassAd so the periodic hold expression can refer to it.
MachineMemory = "$$(Memory)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineMemory
# MATCH_EXP_MachineMemory is a string, hence int(); the factor of
# 1024 converts MB to KB to match the units of ImageSize.
SYSTEM_PERIODIC_HOLD = (MATCH_EXP_MachineMemory =!= UNDEFINED && \
    ImageSize > 1.5*int(MATCH_EXP_MachineMemory)*1024)
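Both of the examples above are plain condor_config changes (the first
on the execute machines, the second on the submit side), so, assuming a
typical deployment, the daemons need to re-read their configuration
afterwards, e.g.:

condor_reconfig -all

A per-job variant of the hold approach is also possible: a user who
knows a job's footprint can put a periodic_hold expression directly in
the submit description file. A minimal sketch, with an illustrative
2 GB ceiling (the threshold is an assumption; ImageSize is in KB):

periodic_hold = (ImageSize > 2000000)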
Both of these techniques share the shortcoming that they are based on
the job's virtual memory size, which may not be an accurate measure of
the job's actual demand on physical memory. A job that mmaps a large
file or allocates address space it never touches, for example, can have
a large ImageSize while consuming little physical RAM.
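To see that gap in practice on an execute machine, compare a process's
virtual size with its resident set (hypothetical PID; vsz corresponds
roughly to what Condor reports as ImageSize):

ps -o vsz,rss,comm -p 12345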
--Dan