[HTCondor-devel] resource allocation question and proposal


Date: Fri, 10 May 2013 09:30:42 -0400
From: Douglas Thain <dthain@xxxxxx>
Subject: [HTCondor-devel] resource allocation question and proposal
(Great to see everyone again at HTCondorWeek this year.)

We are encountering a growing number of situations where we need to
communicate allocation of resources down a tree of processes on the
same machine.  Unless told otherwise, most programs simply look at the
number of cores/memory/disk installed on a machine, and then attempt
to use everything simultaneously.  Obviously, this doesn't work when
N>1 such programs share the machine.

As an example, we use Condor to deploy a Work Queue as a pilot job
system in order to run some multi-core jobs.  The machine may have 16
cores, of which Condor gives us 8 in a slot, on which Work Queue may
want to run two 4-core jobs simultaneously.  We can set this all up
manually, but it would be better to simply communicate the resource
allocation down the chain.

So, first, a question:

- Does HTCondor communicate the properties of a slot to the job
running in that slot?  e.g. You have been assigned 2 cores and 1GB
RAM, so please behave.

If not, then a modest proposal:

- Could we define a simple and common way of communicating intended
resource allocations from parent to child process?  It might be as
simple as defining a few environment variables: CORES=4; MEMORY=8;
DISK=16
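
A minimal sketch of how such a convention might look in Python. It
assumes the variable names CORES, MEMORY, and DISK from the proposal
above; the units are not specified there, and the helper names
(read_allocation, split_allocation, child_environment) are hypothetical,
not part of any existing HTCondor or Work Queue interface.

```python
import os

# Variable names come from the proposal; units are left unspecified there.
RESOURCE_VARS = ("CORES", "MEMORY", "DISK")


def read_allocation(environ=None):
    """Return the allocation announced by the parent process.

    Variables that are not set are simply absent from the result, so the
    caller can decide how to behave when nothing was communicated.
    """
    environ = os.environ if environ is None else environ
    allocation = {}
    for name in RESOURCE_VARS:
        value = environ.get(name)
        if value is not None:
            allocation[name] = int(value)
    return allocation


def split_allocation(allocation, parts):
    """Divide an allocation evenly among `parts` children (integer division)."""
    return {name: amount // parts for name, amount in allocation.items()}


def child_environment(allocation, base=None):
    """Copy an environment and announce the child's share in it."""
    env = dict(os.environ if base is None else base)
    env.update({name: str(amount) for name, amount in allocation.items()})
    return env
```

For instance, a pilot that was told CORES=8 could compute
split_allocation(read_allocation(), 2) to get a 4-core share and pass
env=child_environment(share) to subprocess.Popen for each of its two
jobs, so the same convention repeats at every level of the tree.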

I am not concerned about enforcement (yet), only with communicating
the expected behavior to a child process.  If we could document a
common way of doing this that even a few projects could sign on to,
it would help immensely with these sorts of problems.

P.S. Yes, yes, I know about VMs/cgroups/etc., but they are not
universally deployed and don't compose hierarchically.

- Doug