Re: [HTCondor-devel] resource allocation question and proposal


Date: Fri, 10 May 2013 12:34:51 -0500
From: Erik Paulson <epaulson@xxxxxxxxxxxx>
Subject: Re: [HTCondor-devel] resource allocation question and proposal
Is there any existing convention that could or should be followed? It might be nice to use environment variables with the same names and meanings that other tools already use. I could imagine memory having one from some ancient POSIX standard, though I'd be surprised if cores did.

Otherwise, it seems sensible to me, at least for the short term (where "short" invariably winds up meaning forever).

This probably creates more trouble than it's worth, but maybe also provide a version of ulimit that knows how to respect the resource manager's limits? It might make debugging confusing for people, since shell scripts would then report different numbers than library/system calls do.
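The ulimit idea can be approximated without a new ulimit: if the process that receives the allocation installs a matching rlimit, everything it forks inherits it, so `ulimit -v` in a shell script and getrlimit() in a library report the same figure. A minimal sketch in Python, assuming MEMORY is given in megabytes and that RLIMIT_AS (address space) is the limit worth mirroring; both assumptions and the helper names are hypothetical. Unlike the proposal, an rlimit is enforced, not merely advisory.

```python
import os
import resource

MB = 1024 * 1024

def memory_limit_bytes(env, hard_limit):
    """Translate the proposed MEMORY variable (assumed to be in MB) into an
    RLIMIT_AS soft limit that never exceeds the existing hard limit.
    Returns None when no allocation was communicated."""
    raw = env.get("MEMORY")
    if raw is None:
        return None
    wanted = int(raw) * MB
    if hard_limit == resource.RLIM_INFINITY:
        return wanted
    return min(wanted, hard_limit)

def apply_memory_limit(env=os.environ):
    """Install the soft limit on the current process (Unix only).
    Children, including shell scripts, inherit it, so `ulimit -v`
    then agrees with what the resource manager allocated."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    limit = memory_limit_bytes(env, hard)
    if limit is not None:
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
    return limit
```

A wrapper would call apply_memory_limit() once, before exec'ing the real payload.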

Does the generic environment variable for the startd take precedence over the config file? 

-Erik


On Fri, May 10, 2013 at 12:00 PM, Douglas Thain <dthain@xxxxxx> wrote:
OK, Condor providing the machine ad is a good first step, but part of
the goal here is interop between apps and other resource managers that
may not know about ClassAds. We want the pieces to work together
whether it is Condor -> app or SGE -> Glide-In -> Panda -> app.

Suppose that we wrote up a short 1-page spec of how to communicate
resource allocation decisions from parent to child, along the lines
of CORES=x, MEMORY=y, DISK=z, and then encouraged various stakeholders
to produce/consume the spec.

Does anyone have a major objection to modifying the starter to produce
that environment?
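On the producing side, the change amounts to adding three variables to the environment the job inherits. A sketch of what any parent (a starter, a pilot, a workflow engine) would do, written in Python rather than taken from the starter's actual code; the function name and the MB units are assumptions.

```python
import os
import subprocess
import sys

def launch_with_allocation(argv, cores, memory_mb, disk_mb):
    """Run argv with the proposed CORES/MEMORY/DISK variables added to the
    inherited environment, and return the completed process."""
    env = dict(os.environ)
    env.update({
        "CORES": str(cores),
        "MEMORY": str(memory_mb),
        "DISK": str(disk_mb),
    })
    return subprocess.run(argv, env=env, capture_output=True,
                          text=True, check=True)

if __name__ == "__main__":
    # The child just echoes what it was told it may use.
    child = [sys.executable, "-c", "import os; print(os.environ['CORES'])"]
    print(launch_with_allocation(child, 8, 4096, 10240).stdout.strip())
```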

What about modifying the startd to look for that environment and
configure itself accordingly?  (Would be nice for glideins.)
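A startd (or any daemon) that reads the inherited variables also needs an answer to the precedence question raised above. One possible policy, sketched here as an assumption rather than anything HTCondor implements: a configured value replaces hardware detection, and an inherited CORES acts as a ceiling, since a glidein can never use more than its parent granted.

```python
import os
from typing import Mapping, Optional

def effective_cores(detected: int,
                    configured: Optional[int],
                    env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical policy for deciding how many cores to advertise.
    Config overrides hardware detection; an inherited CORES value then
    caps the result instead of replacing it."""
    cores = configured if configured is not None else detected
    inherited = env.get("CORES")
    if inherited is not None:
        cores = min(cores, int(inherited))
    return cores
```

Treating the environment as an upper bound means a misconfigured glidein degrades to its parent's grant rather than oversubscribing the slot.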



On Fri, May 10, 2013 at 10:50 AM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
> Hi Doug,
>
> Recall that the starter does write out the claimed machine ad as ASCII
> text into a file; the location of this file is passed to the job in an
> environment variable.
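A job that wants to read this file directly can get by with a minimal reader when the ad is written one `Attr = Value` per line. A sketch in Python; it keeps values as raw strings and does not evaluate ClassAd expressions. The environment variable holding the path is, to our understanding, _CONDOR_MACHINE_AD, but check the manual for your version.

```python
import os
from typing import Dict, Iterable, Optional

def parse_ad_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'Attr = Value' lines into a dict keyed by lower-cased name,
    since ClassAd attribute names are case-insensitive."""
    ad: Dict[str, str] = {}
    for line in lines:
        name, sep, value = line.partition("=")
        if sep and name.strip():
            ad[name.strip().lower()] = value.strip()
    return ad

def read_machine_ad(path: str) -> Dict[str, str]:
    with open(path) as f:
        return parse_ad_lines(f)

def slot_cores(ad: Dict[str, str]) -> Optional[int]:
    """Cores assigned to the slot, from the Cpus attribute if present."""
    raw = ad.get("cpus")
    return int(raw) if raw is not None else None

if __name__ == "__main__":
    # Variable name is an assumption; see the lead-in.
    path = os.environ.get("_CONDOR_MACHINE_AD")
    if path:
        print("slot cores:", slot_cores(read_machine_ad(path)))
```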
>
> Of course you can also explicitly pass cpu/memory or anything else from the
> machine ad via environment via use of $$() in the submit file.
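With $$(), the submit file can do the translation itself, with something like environment = "CORES=$$(Cpus) MEMORY=$$(Memory)". The toy Python model below shows the idea of the substitution (each $$(Attr) replaced by the matched machine's value for Attr); it is not HTCondor's implementation and ignores any richer forms the real syntax supports.

```python
import re
from typing import Mapping

# Matches $$(AttrName); only plain attribute names are handled.
_DOLLAR_DOLLAR = re.compile(r"\$\$\(([A-Za-z_][A-Za-z0-9_.]*)\)")

def expand_dollar_dollar(template: str, machine_ad: Mapping[str, object]) -> str:
    """Replace each $$(Attr) with the machine ad's value for Attr, looked up
    case-insensitively. Unknown attributes raise KeyError in this toy."""
    lowered = {k.lower(): v for k, v in machine_ad.items()}

    def substitute(match: "re.Match") -> str:
        return str(lowered[match.group(1).lower()])

    return _DOLLAR_DOLLAR.sub(substitute, template)
```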
>
> Finally, at least for cpu cores and memory, the latest dev release of
> HTCondor has nice mechanisms for enforcing the limits via Linux kernel
> cgroups and affinity support.
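When affinity is in use, a job can discover its share directly instead of trusting the installed core count. A sketch in Python: on Linux, os.sched_getaffinity(0) returns the set of CPUs the calling process may run on, which reflects pinning applied by a parent. cgroup CPU shares or quotas do not show up in that mask.

```python
import os

def usable_cores() -> int:
    """Cores this process may actually run on. Uses the scheduler affinity
    mask where the platform exposes it (Linux), else the installed count."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1
```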
>
>
> -- Sent from my HP Veer mobile phone
>
> ________________________________
> On May 10, 2013 8:33 AM, Douglas Thain <dthain@xxxxxx> wrote:
>
> (Great to see everyone again at HTCondorWeek this year.)
>
> We are encountering a growing number of situations where we need to
> communicate allocation of resources down a tree of processes on the
> same machine. Unless told otherwise, most programs simply look at the
> number of cores/memory/disk installed on a machine, and then attempt
> to use everything simultaneously. Obviously, this doesn't work when
> N>1 programs share the same machine.
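The consuming side of the fix is small: consult the communicated allocation before the hardware. A sketch in Python, assuming the proposed CORES variable; the function name is hypothetical.

```python
import os
from typing import Mapping

def worker_count(env: Mapping[str, str] = os.environ) -> int:
    """Size a worker pool from the communicated allocation (CORES) rather
    than the installed core count, which is what most programs use today."""
    allotted = env.get("CORES")
    if allotted is not None:
        return max(1, int(allotted))
    return os.cpu_count() or 1
```

A pool could then be created with concurrent.futures.ProcessPoolExecutor(max_workers=worker_count()), so an 8-core slot on a 16-core machine yields 8 workers rather than 16.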
>
> As an example, we use Condor to deploy a Work Queue as a pilot job
> system in order to run some multi-core jobs. The machine may have 16
> cores, of which Condor gives us 8 in a slot, on which Work Queue may
> want to run two 4-core jobs simultaneously. We can set this all up
> manually, but it would be better to simply communicate the resource
> allocation down the chain.
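Passing the allocation down the chain also means dividing it. In this example, Condor grants 8 of 16 cores and Work Queue runs two 4-core tasks; the sketch below computes how many equal child allocations fit and the environment each child should receive. Python, with a hypothetical helper and resource names following the proposed CORES/MEMORY/DISK spec; every per-task amount must be positive and present in the parent.

```python
from typing import Dict, List

def split_allocation(parent: Dict[str, int],
                     per_task: Dict[str, int]) -> List[Dict[str, str]]:
    """Divide a parent's allocation into as many identical child allocations
    as fit, limited by the scarcest resource, and return each child's
    environment as strings ready to hand to a subprocess."""
    fits = min(parent[name] // need for name, need in per_task.items())
    return [{name: str(need) for name, need in per_task.items()}
            for _ in range(fits)]
```

With parent {"CORES": 8, "MEMORY": 16384} and per-task {"CORES": 4, "MEMORY": 4096}, cores are the binding constraint, so two children are produced.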
>
> So, first, a question:
>
> - Does HTCondor communicate the properties of a slot to the job
> running in that slot? e.g. You have been assigned 2 cores and 1GB
> RAM, so please behave.
>
> If not, then a modest proposal:
>
> - Could we define a simple and common way of communicating intended
> resource allocations from parent to child process? It might be as
> simple as defining a few environment variables: CORES=4; MEMORY=8;
> DISK=16
>
> I am not concerned about enforcement (yet) but just simply
> communicating the expected behavior to a child process. If we could
> document a common way of doing this that even a few projects could
> sign on to, it would help with these sorts of problems immensely.
>
> P.S. Yes, yes, I know about VMs/cgroups/etc but they are not
> universally deployed and don't compose hierarchically.
>
> - Doug
> _______________________________________________
> HTCondor-devel mailing list
> HTCondor-devel@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel