(Great to see everyone again at HTCondor Week this year.)

We are encountering a growing number of situations where we need to
communicate an allocation of resources down a tree of processes on the
same machine. Unless told otherwise, most programs simply look at the
number of cores and the amount of memory and disk installed on the
machine, and then attempt to use all of it at once. Obviously, this
doesn't work when N>1 such programs share a machine.

As an example, we use Condor to deploy Work Queue as a pilot job
system in order to run some multi-core jobs. The machine may have 16
cores, of which Condor gives us 8 in a slot, in which Work Queue may
want to run two 4-core jobs simultaneously. We can set all of this up
manually, but it would be better to simply communicate the resource
allocation down the chain.

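To make the nesting concrete, here is a minimal Python sketch of the arithmetic in that example: each level of the tree receives a slice of its parent's allocation and may subdivide it further. The `Allocation` class and its `split` and `fits_within` methods are hypothetical names for illustration only (not part of HTCondor or Work Queue), and the memory and disk figures are made-up placeholders; only the 16/8/2x4 core counts come from the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Allocation:
    """Resources granted to one node in the process tree (hypothetical type)."""
    cores: int
    memory_gb: int
    disk_gb: int

    def fits_within(self, other: "Allocation") -> bool:
        """True if this allocation does not exceed `other` in any resource."""
        return (self.cores <= other.cores
                and self.memory_gb <= other.memory_gb
                and self.disk_gb <= other.disk_gb)

    def split(self, n: int) -> List["Allocation"]:
        """Divide this allocation evenly among n children.

        Uses integer division, so any remainder is left unassigned rather
        than over-committing the parent.
        """
        if n < 1:
            raise ValueError("need at least one child")
        share = Allocation(self.cores // n, self.memory_gb // n, self.disk_gb // n)
        if share.cores == 0:
            raise ValueError(f"cannot give {n} children a core each from {self.cores}")
        return [share] * n


# Core counts from the example: a 16-core machine, an 8-core Condor slot,
# and Work Queue running two 4-core jobs in that slot.
# Memory and disk numbers are illustrative placeholders only.
machine = Allocation(cores=16, memory_gb=64, disk_gb=500)
slot = Allocation(cores=8, memory_gb=32, disk_gb=250)   # what Condor hands us
assert slot.fits_within(machine)
tasks = slot.split(2)                                    # what Work Queue hands out
for i, t in enumerate(tasks):
    print(f"task {i}: {t.cores} cores, {t.memory_gb} GB RAM, {t.disk_gb} GB disk")
```

The point of the sketch is that every level only needs to know its own allocation, not the size of the physical machine, to hand out correct shares to its children.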
So, first, a question:

- Does HTCondor communicate the properties of a slot to the job
running in that slot? e.g., "You have been assigned 2 cores and 1 GB
of RAM, so please behave."

If not, then a modest proposal:

- Could we define a simple and common way of communicating intended
resource allocations from a parent process to its children? It might be
as simple as defining a few environment variables: CORES=4, MEMORY=8,
DISK=16.

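A minimal sketch of how such a convention might look in practice, assuming the variable names CORES, MEMORY and DISK from the proposal and assuming MEMORY and DISK are given in GB (the proposal does not fix units). The parent writes each child's share into that child's environment; a child that honors the convention reads CORES and falls back to the whole machine (`os.cpu_count()`, i.e. today's default behavior) only when the variable is absent. The helper names `child_env` and `my_cores` are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict


def child_env(cores: int, memory_gb: int, disk_gb: int) -> Dict[str, str]:
    """Copy of the current environment with a child's allocation written in.

    Assumes MEMORY and DISK are expressed in GB (an assumption, not a standard).
    """
    env = dict(os.environ)
    env.update(CORES=str(cores), MEMORY=str(memory_gb), DISK=str(disk_gb))
    return env


def my_cores() -> int:
    """Cores this process should use: the advertised allocation if present,
    otherwise everything the machine reports."""
    value = os.environ.get("CORES")
    if value is not None:
        return int(value)
    return os.cpu_count() or 1


if __name__ == "__main__":
    # Illustrative parent holding CORES=8, MEMORY=16, DISK=32 that runs two
    # children and tells each one it has half of every resource.
    child_code = "import os; print(os.environ['CORES'], os.environ['MEMORY'], os.environ['DISK'])"
    for _ in range(2):
        result = subprocess.run(
            [sys.executable, "-c", child_code],
            env=child_env(cores=8 // 2, memory_gb=16 // 2, disk_gb=32 // 2),
            capture_output=True, text=True, check=True,
        )
        print("child was told:", result.stdout.strip())  # prints "child was told: 4 8 16"
```

Environment variables are attractive here because they are inherited across fork/exec by default, work in any language, and require no cooperation from the operating system; a process that ignores them simply behaves as it does today.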
I am not concerned about enforcement (yet), only with communicating
the expected behavior to a child process. If we could document a
common way of doing this that even a few projects signed on to, it
would help immensely with these sorts of problems.

P.S. Yes, yes, I know about VMs, cgroups, etc., but they are not
universally deployed and don't compose hierarchically.

- Doug