Re: [HTCondor-devel] resource allocation question and proposal


Date: Fri, 10 May 2013 09:50:53 -0500
From: "Todd Tannenbaum" <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] resource allocation question and proposal
Hi Doug,

Recall that the starter writes the claimed machine ad out as ASCII text to a file; the location of this file is passed to the job in an environment variable.
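
As a rough sketch of how a job might consume that file: the snippet below assumes the environment variable is named `_CONDOR_MACHINE_AD` (the name is not spelled out above, so check it against your HTCondor version) and that the file holds one `Attribute = value` pair per line. The helper names `read_machine_ad` and `slot_resources` are made up for illustration.

```python
import os


def read_machine_ad(path):
    """Naively parse a text file of 'Name = value' lines into a dict of raw strings."""
    ad = {}
    with open(path) as f:
        for line in f:
            # Split at the first '=' only, so expressions containing '==' stay intact
            name, sep, value = line.partition("=")
            if not sep:
                continue  # skip blank or malformed lines
            ad[name.strip()] = value.strip()
    return ad


def slot_resources(env=os.environ):
    """Return (cpus, memory_mb) for the claimed slot, or None if no ad file is available."""
    path = env.get("_CONDOR_MACHINE_AD")  # assumed variable name
    if not path or not os.path.exists(path):
        return None
    ad = read_machine_ad(path)
    # Cpus and Memory (in MB) are assumed to appear as plain integer literals
    return int(ad["Cpus"]), int(ad["Memory"])
```

A job could call `slot_resources()` at startup and size its thread pool from the result, falling back to its usual behavior when the function returns `None`.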

Of course, you can also explicitly pass CPU, memory, or anything else from the machine ad to the job through its environment by using $$() in the submit file.
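
For illustration, a submit file could contain a line such as `environment = "SLOT_CPUS=$$(Cpus) SLOT_MEMORY_MB=$$(Memory)"`. The variable names `SLOT_CPUS` and `SLOT_MEMORY_MB` are hypothetical, and `Cpus` and `Memory` are assumed to be the machine ad attributes of interest. On the job side, a minimal sketch of reading them back might look like this:

```python
import os


def allocated(env=os.environ):
    """Return (cpus, memory_mb) as communicated through the environment.

    If no core count was passed, fall back to the number of cores on the
    machine; if no memory limit was passed, memory_mb is None.
    """
    # SLOT_CPUS / SLOT_MEMORY_MB are hypothetical names set by the submit file
    cpus = int(env.get("SLOT_CPUS") or os.cpu_count() or 1)
    mem = env.get("SLOT_MEMORY_MB")
    memory_mb = int(mem) if mem else None
    return cpus, memory_mb
```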

Finally, at least for CPU cores and memory, the latest development release of HTCondor has nice mechanisms for enforcing the limits via Linux kernel cgroups and affinity support.


-- Sent from my HP Veer mobile phone


On May 10, 2013 8:33 AM, Douglas Thain <dthain@xxxxxx> wrote:

(Great to see everyone again at HTCondorWeek this year.)

We are encountering a growing number of situations where we need to
communicate allocation of resources down a tree of processes on the
same machine. Unless told otherwise, most programs simply look at the
number of cores/memory/disk installed on a machine, and then attempt
to use everything simultaneously. Obviously, this doesn't work with N>1
programs sharing the machine.

As an example, we use Condor to deploy a Work Queue as a pilot job
system in order to run some multi-core jobs. The machine may have 16
cores, of which Condor gives us 8 in a slot, on which Work Queue may
want to run two 4-core jobs simultaneously. We can set this all up
manually, but it would be better to simply communicate the resource
allocation down the chain.

So, first, a question:

- Does HTCondor communicate the properties of a slot to the job
running in that slot? e.g. You have been assigned 2 cores and 1GB
RAM, so please behave.

If not, then a modest proposal:

- Could we define a simple and common way of communicating intended
resource allocations from parent to child process? It might be as
simple as defining a few environment variables: CORES=4; MEMORY=8;
DISK=16
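
A minimal sketch of what such a convention might look like, using the CORES, MEMORY and DISK names from the example above. The helper names, the integer units, and the even-split policy are assumptions made only for illustration; nothing here is an existing standard.

```python
import os
import subprocess


RESOURCES = ("CORES", "MEMORY", "DISK")


def my_allocation(env=os.environ):
    """Read this process's allocation; a missing variable means 'not told' (None)."""
    return {k: int(env[k]) if k in env else None for k in RESOURCES}


def child_env(parent, n_children, base_env=os.environ):
    """Build the environment for one of n_children, splitting each known resource evenly."""
    env = dict(base_env)
    for key, value in parent.items():
        if value is not None:
            # Integer division, but never hand a child less than 1 unit
            env[key] = str(max(1, value // n_children))
    return env


def run_children(cmd, n_children):
    """Start n_children copies of cmd, each seeing its share of our allocation."""
    env = child_env(my_allocation(), n_children)
    procs = [subprocess.Popen(cmd, env=env) for _ in range(n_children)]
    return [p.wait() for p in procs]
```

Because a child reads the same variables its parent wrote, the scheme composes down the tree: a pilot started with CORES=8 that runs two children hands each of them CORES=4, and each child can subdivide again in the same way.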

I am not concerned about enforcement (yet) but just simply
communicating the expected behavior to a child process. If we could
document a common way of doing this that even a few projects could
sign on to, it would help with these sorts of problems immensely.

P.S. Yes, yes, I know about VMs/cgroups/etc but they are not
universally deployed and don't compose hierarchically.

- Doug