Re: [Condor-users] out-of-memory issues in parallel universe
- Date: Mon, 17 Mar 2008 10:36:28 -0500
- From: Greg Thain <gthain@xxxxxxxxxxx>
- Subject: Re: [Condor-users] out-of-memory issues in parallel universe
> Is there some way of specifying the image size, and restricting jobs
> to larger-memory compute nodes, for MPI jobs submitted in the parallel
> universe?
By default, Condor tries to run jobs only on machines that have enough
memory. condor_submit does this by inserting the clause:
((Memory * 1024) >= ImageSize)
into the job's requirements. The problem is that Condor doesn't know a
priori how much memory the job will need (the ImageSize), so it makes
an initial guess based on the size of the executable. This guess is
almost always wrong, and almost always too small. If you have a better
estimate of the image size, you can put it in the submit file:
image_size = some_value_in_kbytes
And Condor will only match the job to machines (or slots) with at least
that amount of memory.
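
For a parallel-universe MPI job, the submit file might look roughly like
the sketch below. The executable name, wrapper script, machine count, and
the 2 GB figure are illustrative placeholders, not values taken from the
answer above; adjust them for your own job:

    universe      = parallel
    executable    = /path/to/mp1script     # placeholder wrapper that launches the MPI program
    arguments     = my_mpi_program
    machine_count = 8
    # Tell Condor how big we expect the job image to be, in KB (2 GB here),
    # instead of letting it guess from the executable size.
    image_size    = 2097152
    # Optionally also require large-memory slots explicitly (Memory is in MB).
    requirements  = (Memory >= 2048)
    log           = mpi.log
    output        = mpi.out.$(NODE)
    error         = mpi.err.$(NODE)
    queue

If setting image_size alone is enough for your case, you can omit the
explicit requirements line and let Condor construct the memory clause
for you, as described above.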
-greg