Hi Doug.
While modifying the starter could be an option (longer term),
you don't need to go to such extremes.
A simple wrapper script that reads out the classad file and puts the necessary values in the environment
is almost trivial to write and will work with basically all the Condor versions out there.
You just need to do a little bit of config magic to get there.
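A minimal sketch of such a wrapper, in Python. It assumes the starter exports the machine ad path as _CONDOR_MACHINE_AD, that the ad contains Cpus, Memory and Disk as plain "Attr = value" lines, and that the wrapper is hooked in via something like USER_JOB_WRAPPER. The CORES/MEMORY/DISK names come from Doug's proposal; the function names are made up for illustration, and values are passed through without unit conversion.

```python
import os
import sys

def parse_classad_text(text):
    """Parse simple 'Attr = value' lines into a dict.

    Attribute names are lower-cased because ClassAd names are
    case-insensitive; surrounding double quotes are stripped from values.
    This is not a full ClassAd parser, just enough for flat machine ads.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        ad[name.strip().lower()] = value.strip().strip('"')
    return ad

def allocation_env(ad):
    """Map machine-ad attributes to the proposed environment variables."""
    mapping = {"CORES": "cpus", "MEMORY": "memory", "DISK": "disk"}
    return {var: ad[attr] for var, attr in mapping.items() if attr in ad}

def main(argv):
    """Export the allocation, then replace this process with the real job."""
    env = dict(os.environ)
    ad_path = env.get("_CONDOR_MACHINE_AD")  # assumed variable name
    if ad_path and os.path.exists(ad_path):
        with open(ad_path) as f:
            env.update(allocation_env(parse_classad_text(f.read())))
    os.execvpe(argv[1], argv[1:], env)

# Only act as a wrapper when a command to run was actually given.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

The job itself then only has to read CORES, MEMORY and DISK from its environment and never needs to know about ClassAds.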
Igor
On 05/10/2013 10:00 AM, Douglas Thain wrote:
Ok, Condor providing the machine ad is a good first step, but part of
the goal here is interop between apps and other resource managers that
may not know about ClassAds. We want the pieces to work together
whether it is Condor -> app or SGE -> Glide-In -> Panda -> app.
Suppose that we wrote up a short 1-page spec of how to communicate
resource allocation decisions from parent to child, along the lines
of CORES=x, MEMORY=y, DISK=z, and then encourage various stakeholders
to produce/consume the spec.
Does anyone have a major objection to modifying the starter to produce
that environment?
What about modifying the startd to look for that environment and
configure itself accordingly? (Would be nice for glideins.)
On Fri, May 10, 2013 at 10:50 AM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
Hi Doug,
Recall that the starter does write out the claimed machine ad as ascii text
into a file - the location of this file is inserted as an environment
variable to the job.
Of course you can also explicitly pass cpu/memory or anything else from the
machine ad via environment via use of $$() in the submit file.
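As a rough illustration of the $$() route, the Python sketch below just builds the text of a submit description whose environment line pulls Cpus, Memory and Disk from the matched machine ad. The CORES/MEMORY/DISK variable names and the executable name are placeholders, not anything HTCondor defines, and the function name is hypothetical.

```python
def submit_description(executable):
    """Return submit-file text that passes slot resources via $$().

    $$(Attr) is replaced at match time with the value of Attr from the
    machine ad the job was matched to. The variable names on the left
    are illustrative only.
    """
    env_pairs = ["CORES=$$(Cpus)", "MEMORY=$$(Memory)", "DISK=$$(Disk)"]
    lines = [
        f"executable = {executable}",
        'environment = "' + " ".join(env_pairs) + '"',
        "queue",
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # "my_job.sh" is a placeholder executable name.
    print(submit_description("my_job.sh"), end="")
```

This needs no change on the execute side, but every submit file has to carry the environment line.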
Finally, at least for cpu cores and memory, the latest dev release of
htcondor has nice mechanisms for enforcing the limits via Linux kernel
cgroups and affinity support.
-- Sent from my HP Veer mobile phone
________________________________
On May 10, 2013 8:33 AM, Douglas Thain <dthain@xxxxxx> wrote:
(Great to see everyone again at HTCondorWeek this year.)
We are encountering a growing number of situations where we need to
communicate allocation of resources down a tree of processes on the
same machine. Unless told otherwise, most programs simply look at the
number of cores/memory/disk installed on a machine, and then attempt
to use everything simultaneously. Obviously, this doesn't work with N>1
programs sharing the machine.
As an example, we use Condor to deploy a Work Queue as a pilot job
system in order to run some multi-core jobs. The machine may have 16
cores, of which Condor gives us 8 in a slot, on which Work Queue may
want to run two 4-core jobs simultaneously. We can set this all up
manually, but it would be better to simply communicate the resource
allocation down the chain.
So, first, a question:
- Does HTCondor communicate the properties of a slot to the job
running in that slot? e.g. You have been assigned 2 cores and 1GB
RAM, so please behave.
If not, then a modest proposal:
- Could we define a simple and common way of communicating intended
resource allocations from parent to child process? It might be as
simple as defining a few environment variables: CORES=4; MEMORY=8;
DISK=16
I am not concerned about enforcement (yet) but just simply
communicating the expected behavior to a child process. If we could
document a common way of doing this that even a few projects could
sign on to, it would help with these sorts of problems immensely.
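To make the proposal concrete, here is a minimal Python sketch of both sides of such a convention: a child reading CORES/MEMORY/DISK (falling back to the installed core count when nothing is set), and a parent splitting its own allocation evenly among its children. Units are left unspecified, as in the proposal, and all function names are hypothetical.

```python
import os

def read_allocation(environ=None):
    """Read CORES/MEMORY/DISK from the environment.

    Missing or invalid values fall back to the installed core count for
    CORES and to None (meaning "not told") for MEMORY and DISK.
    """
    environ = os.environ if environ is None else environ

    def positive_int(name, default):
        try:
            value = int(environ.get(name, ""))
        except ValueError:
            return default
        return value if value > 0 else default

    return {
        "CORES": positive_int("CORES", os.cpu_count() or 1),
        "MEMORY": positive_int("MEMORY", None),
        "DISK": positive_int("DISK", None),
    }

def child_environments(allocation, n_children):
    """Split an allocation evenly and return one env dict per child.

    Uses integer division, so any remainder is left unused. Resources
    the parent was not told about are not passed on.
    """
    share = {
        name: str(value // n_children)
        for name, value in allocation.items()
        if value is not None
    }
    return [dict(share) for _ in range(n_children)]

if __name__ == "__main__":
    # The example from above: an 8-core slot shared by two children.
    slot = read_allocation({"CORES": "8", "MEMORY": "8", "DISK": "16"})
    for env in child_environments(slot, 2):
        print(env)
```

Because each level only reads its own environment and writes its children's, the scheme composes down a tree of processes regardless of which resource manager sits at the top.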
P.S. Yes, yes, I know about VMs/cgroups/etc but they are not
universally deployed and don't compose hierarchically.
- Doug
_______________________________________________
HTCondor-devel mailing list
HTCondor-devel@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-devel