[HTCondor-users] Reserving RAM dynamically
- Date: Mon, 16 Nov 2015 12:28:42 +0000
- From: Brian Candler <b.candler@xxxxxxxxx>
- Subject: [HTCondor-users] Reserving RAM dynamically
Outline question: is there a way to dynamically reserve some resource
(in particular RAM) without actually running a job?
Here's the actual scenario. We are using partitionable slots. We have a
bunch of jobs, all of which open a very large read-only database via
mmap(). The mapping takes X GB of memory, but because the pages are
shared, only one copy of that X GB is resident regardless of how many
jobs have the database open at once.
I'm trying to work out how best to handle this in the condor world.
If we add X to the request_memory of each job, then we can run only a
fraction of the jobs that ought to be able to run concurrently.
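(Made-up numbers to illustrate: on a 64 GB machine with X = 8 GB and
jobs that otherwise need 1 GB each, every job would request 9 GB, so
only 7 would match, even though about 56 could run alongside a single
shared copy of the mapping.)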
If we don't take account of X at all, and each job just declares its
"other" memory usage, then we end up with out-of-memory conditions when
many instances are running.
If we guess that (say) 20 instances of a job will run at once, then we
could add X/20 to the declared memory usage of each such job. However,
this doesn't actually work: given a mixture of different jobs, there
might be only 1 instance of this job plus 19 instances of other jobs
which don't use the database. Again, we run out of memory.
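(Continuing the made-up numbers: with X = 8000 MB and a guess of 20
concurrent instances, each job is padded by 8000/20 = 400 MB. One
database job plus 19 non-database jobs then reserves only 400 MB
against the full 8000 MB the mapping actually uses, leaving 7600 MB
unaccounted for.)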
The current approach we are using is to statically subtract X from the
total amount of memory available on the server:
MEMORY = ($(DETECTED_MEMORY) - xxxxx)
However this is less than ideal because:
1. When we are running jobs which don't use the database, fewer jobs are
able to run than otherwise could
2. It won't work in future when we start using multiple databases
concurrently, e.g. some jobs open shared database X and some open shared
database Y.
3. The size of databases X and Y will grow over time, and we'd rather
not reconfigure and restart condor.
So I am wondering if it is possible to do something like:
- before we run any jobs which open X, we would reduce the available
memory on that node by X
- and once we no longer have any jobs which use X, we release it
One way to do that would be to run a dummy job which declares that it
uses X amount of memory but does nothing. We could also announce a
machine classAd attribute saying that database X is available. This
would have to be an infinite-running job, and we would have to kill it
when we want to remove the memory reservation for X. This sounds messy.
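For concreteness, a minimal sketch of such a dummy job as a submit
description (names and numbers are made up; "FOO" and the 8000 MB
figure are placeholders, and advertising the machine classAd attribute
would still need separate startd configuration, e.g. STARTD_ATTRS or a
startd cron hook):

universe       = vanilla
executable     = /bin/sleep
arguments      = infinity
request_memory = 8000           # X, in MB: size of the shared mapping
request_cpus   = 1              # note: this also ties up a core, not just RAM
+HoldsDatabase = "FOO"          # hypothetical attribute, for bookkeeping only
queue

Removing the reservation then means condor_rm'ing this job, which is
exactly the messiness described above.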
I wonder if there is a more direct way to make a claim on a resource,
without actually having to run a job?
I suppose our ideal solution would be to have direct support for shared
RAM resources. For example, a job might declare:
request_memory = 1000
request_shared = FOO:8000
If there is no other job which is currently using FOO, then the
available memory pool would be reduced by 8000 before this job starts
(perhaps by creating a dummy partitioned slot taking 8000); this clearly
has an impact on matchmaking. And when all jobs using FOO terminate, the
dummy partitioned slot would be dropped.
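Until something like that exists, the closest approximation with the
dummy-job workaround would be for database-using jobs to match on the
advertised attribute; a hedged sketch, assuming the machine advertises
a hypothetical HasDatabaseFOO attribute while the dummy job runs:

request_memory = 1000
requirements   = (HasDatabaseFOO =?= True)   # only match where FOO is already reserved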
Any clues for how to deal with this?
Thanks,
Brian Candler.