Re: [HTCondor-devel] resource allocation question and proposal


Date: Fri, 10 May 2013 07:06:38 -0700
From: Igor Sfiligoi <sfiligoi@xxxxxxxx>
Subject: Re: [HTCondor-devel] resource allocation question and proposal
Hi Doug et al.

Just FYI:
We have a similar problem in OSG/glideinWMS.

The current working idea is to add a wrapper to the startd that extracts the needed info from the Condor ad on disk
and propagates it into the user environment.
We have not implemented it yet
(even though it does not seem too hard ;) ).
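
A rough sketch of what such a wrapper could look like, in Python. This is not an existing glideinWMS or HTCondor component. It assumes the slot's ad is available as a plain-text file of "Name = value" lines whose path is given in the _CONDOR_MACHINE_AD environment variable, and that the attributes of interest are Cpus, Memory and Disk. The exported names (CORES, MEMORY, DISK) are borrowed from the proposal quoted below, and the values are passed through in whatever units the ad uses; all helper names are hypothetical.

```python
import os
import re

# Assumed mapping from ClassAd attribute names to the environment variable
# names suggested in the quoted proposal. Both sides are illustrative.
AD_TO_ENV = {"Cpus": "CORES", "Memory": "MEMORY", "Disk": "DISK"}

# Matches simple "Name = value" lines in a text dump of an ad.
_AD_LINE = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*$")


def parse_ad(text):
    """Return a dict of attribute name -> raw value string."""
    attrs = {}
    for line in text.splitlines():
        match = _AD_LINE.match(line)
        if match:
            attrs[match.group(1)] = match.group(2).strip('"')
    return attrs


def allocation_env(ad_text, base_env=None):
    """Copy base_env and add the allocation variables found in the ad.

    Only plain non-negative integer values are exported; anything else
    (expressions, missing attributes) is skipped rather than guessed at.
    """
    env = dict(os.environ if base_env is None else base_env)
    attrs = parse_ad(ad_text)
    for ad_name, env_name in AD_TO_ENV.items():
        value = attrs.get(ad_name, "")
        if value.isdigit():
            env[env_name] = value
    return env


def exec_with_allocation(command):
    """Replace the current process with `command`, allocation exported."""
    ad_path = os.environ.get("_CONDOR_MACHINE_AD")  # assumed location
    ad_text = ""
    if ad_path and os.path.exists(ad_path):
        with open(ad_path) as handle:
            ad_text = handle.read()
    os.execvpe(command[0], command, allocation_env(ad_text))
```

The wrapper would then be invoked as exec_with_allocation(["user_job", "arg1"]), where user_job stands in for the real payload. If the ad is missing, the payload simply runs with its environment unchanged.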

My 2c,
  Igor

On 05/10/2013 06:30 AM, Douglas Thain wrote:
(Great to see everyone again at HTCondorWeek this year.)

We are encountering a growing number of situations where we need to
communicate allocation of resources down a tree of processes on the
same machine.  Unless told otherwise, most programs simply look at the
number of cores/memory/disk installed on a machine, and then attempt
to use everything simultaneously.  Obviously, this doesn't work when N>1
such programs share the machine.

As an example, we use Condor to deploy a Work Queue as a pilot job
system in order to run some multi-core jobs.  The machine may have 16
cores, of which Condor gives us 8 in a slot, on which Work Queue may
want to run two 4-core jobs simultaneously.  We can set this all up
manually, but it would be better to simply communicate the resource
allocation down the chain.
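
To make the example concrete, here is a minimal Python sketch of a pilot subdividing its own allocation and passing each share down to a child. It is not how Work Queue does this; the function names are hypothetical, and the CORES variable name is taken from the proposal further down in this message.

```python
import os


def plan_children(allocated_cores, cores_per_job):
    """Return the core count for each job that fits in the allocation."""
    if cores_per_job <= 0:
        raise ValueError("cores_per_job must be positive")
    return [cores_per_job] * (allocated_cores // cores_per_job)


def child_env(cores, base_env=None):
    """Copy the environment and record the child's share of the cores."""
    env = dict(os.environ if base_env is None else base_env)
    env["CORES"] = str(cores)
    return env


# The case described above: a slot of 8 cores, split into 4-core jobs.
shares = plan_children(8, 4)
envs = [child_env(share, base_env={}) for share in shares]
```

Each entry of envs could then be passed as the env argument of subprocess.Popen when the pilot starts the corresponding job, and a child that is itself a pilot could repeat the same step with its own share.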

So, first, a question:

- Does HTCondor communicate the properties of a slot to the job
running in that slot?  e.g. You have been assigned 2 cores and 1GB
RAM, so please behave.

If not, then a modest proposal:

- Could we define a simple and common way of communicating intended
resource allocations from parent to child process?  It might be as
simple as defining a few environment variables: CORES=4; MEMORY=8;
DISK=16
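
On the receiving side, a cooperating program might consult the variable before falling back to what is installed. A minimal Python sketch, assuming the CORES name from above and treating a missing or malformed value as "no limit communicated"; the function name is hypothetical.

```python
import os


def allocated_cores(env=None):
    """Return the number of cores this process should use.

    Honors CORES when it is a positive integer, capped at the number of
    cores actually installed; otherwise falls back to the installed count.
    """
    env = os.environ if env is None else env
    installed = os.cpu_count() or 1
    try:
        requested = int(env.get("CORES", ""))
    except ValueError:
        return installed
    if requested < 1:
        return installed
    return min(requested, installed)
```

A program would size its thread or process pool from allocated_cores() instead of from the installed core count. MEMORY and DISK could be read the same way once their units are agreed on.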

I am not concerned about enforcement (yet) but just simply
communicating the expected behavior to a child process.  If we could
document a common way of doing this that even a few projects could
sign on to, it would help with these sorts of problems immensely.

P.S. Yes, yes, I know about VMs/cgroups/etc but they are not
universally deployed and don't compose hierarchically.

- Doug

