[HTCondor-devel] Fwd: Openstack + HTCondor


Date: Mon, 1 Jul 2013 11:20:50 -0400 (EDT)
From: Tim St Clair <tstclair@xxxxxxxxxx>
Subject: [HTCondor-devel] Fwd: Openstack + HTCondor
Hmmmm.... 

It seems silly not to give this some attention, as it is an excellent use case imho. 

Cheers,
Tim


From: "Derrick H. Karimi" <dhkarimi@xxxxxxxxxxx>
To: "Tim St Clair" <tstclair@xxxxxxxxxx>
Cc: "Todd Tannenbaum" <tannenba@xxxxxxxxxxx>, "Todd L Miller" <tlmiller@xxxxxxxxxxx>, "Matthew Farrellee" <matt@xxxxxxxxxx>, "Erik Erlandson" <eje@xxxxxxxxxx>, "Jonathan Chu" <jchu@xxxxxxxxxxx>, "Naomi M. Anderson" <nmanderson@xxxxxxxxxxx>
Sent: Monday, July 1, 2013 9:48:31 AM
Subject: Re: Openstack + HTCondor

Hi,

We did not follow up at UW.  We completed our analysis of the area and found some complexities in our initial idea of a 3rd party getting Condor to decide where to tell Openstack to run workloads.  While there were many ways this could be done, none of them seemed particularly clean.

Architecturally, both condor and nova-scheduler seem to want to own the lifetime management of the workload, and complexity ensues when tying the two systems together and ensuring consistent state.  Perhaps the Condor developers on this email could correct me on this assumption, but through the normal user extensibility mechanisms (API, hooks, jobs, etc.) we didn't see a clean way to accomplish our requirements.

We did look at the Maui scheduler briefly, and it looked as though using it might give us a chance of meeting our goal of quickly adding high-maturity scheduling features to OpenStack.

We also took a look at some scheduler-related items in the OpenStack Roadmap (Ceilometer, Heat, Convection).

Our tasking has since changed, and with it our primary reason for investigating this, but we remain interested in this area and would like to help all we can.  We are current Red Hat, OpenStack, and Condor users.

P.S. Todd might not remember, but we played foosball one night at Condor Week 2010.

On 07/01/2013 09:47 AM, Tim St Clair wrote:
Hi Derrick - 

I figured I would follow up with you to see if anyone from UW has been in contact yet. 

Cheers,
Tim

----- Original Message -----
From: "Derrick H. Karimi" <dhkarimi@xxxxxxxxxxx>
To: "Tim St Clair" <tstclair@xxxxxxxxxx>
Cc: "Matthew Farrellee" <matt@xxxxxxxxxx>, "Erik Erlandson" <eje@xxxxxxxxxx>, "Jonathan Chu" <jchu@xxxxxxxxxxx>, "Naomi M. Anderson" <nmanderson@xxxxxxxxxxx>
Sent: Wednesday, June 5, 2013 7:30:00 AM
Subject: Re: Openstack + HTCondor



On 06/04/2013 11:03 PM, Tim St Clair wrote:
Derrick -

The openstack integration leverages the ec2 compatibility layer, so from
the end user perspective you can use all the same submission parameters
but change the url of where you are submitting to.
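
To illustrate the point about only the URL changing, here is a minimal sketch that renders an EC2 grid-universe submit description for two endpoints. The submit commands are the ones described in the HTCondor manual's EC2 grid type section; the OpenStack endpoint URL, image ids, instance type, and credential file paths are placeholders I made up, not values from this thread.

```python
# Sketch only: render an HTCondor EC2 grid-universe submit description.
# The endpoint URLs, image id, and credential paths are placeholders.

def ec2_submit_description(service_url, ami_id, instance_type,
                           access_key_file, secret_key_file):
    """Return submit-file text for one EC2-style instance request."""
    lines = [
        "universe = grid",
        f"grid_resource = ec2 {service_url}",
        # condor_submit requires an executable; for EC2 jobs nothing is run
        # from it, so it effectively serves as a label in the queue.
        "executable = instance-label",
        f"ec2_access_key_id = {access_key_file}",
        f"ec2_secret_access_key = {secret_key_file}",
        f"ec2_ami_id = {ami_id}",
        f"ec2_instance_type = {instance_type}",
        "log = instance.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


common = dict(ami_id="ami-00000001", instance_type="m1.small",
              access_key_file="/path/to/access_key",
              secret_key_file="/path/to/secret_key")

amazon = ec2_submit_description("https://ec2.amazonaws.com/", **common)
# Hypothetical OpenStack EC2-compatibility endpoint.
openstack = ec2_submit_description(
    "http://openstack.example.org:8773/services/Cloud", **common)

# Collect the lines that differ between the two descriptions.
changed = [(a, b) for a, b in zip(amazon.splitlines(), openstack.splitlines())
           if a != b]
```

Under these assumptions the two descriptions differ only in the grid_resource line; in practice the image id and credentials would of course also be specific to each cloud.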

May also be of interest to you.
Co-Scheduling:
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3622

Feel free to ask any questions or hit condor_users list too.

Cheers,
Tim
MRG-Grid Team

----- Original Message -----
From: "Erik Erlandson"<eje@xxxxxxxxxx>
To: "Derrick H. Karimi"<dhkarimi@xxxxxxxxxxx>
Cc: "Tim St Clair"<tstclair@xxxxxxxxxx>, "Matthew Farrellee"<matt@xxxxxxxxxx>
Sent: Tuesday, June 4, 2013 6:27:13 PM
Subject: Re: Openstack + HTCondor



----- Original Message -----
Hi,

Your name came up when I was researching OpenStack and Condor.  My team
is interested in leveraging the features in Condor's
scheduling/matchmaking mechanisms in OpenStack's mechanisms for placing
virtual machines.

I have seen some scarce references on the web to using Condor's ec2 grid
universe to drive openstack.  But I can't find any concrete information
on it.  Could you provide me with any documentation or share any
information with me?
Hi Derrick,

A good place to start with Condor and EC2 is here:
http://research.cs.wisc.edu/htcondor/manual/v7.8/5_3Grid_Universe.html#47430
http://spinningmatt.wordpress.com/2011/10/31/getting-started-condor-and-ec2-starting-and-managing-instances/

Tim and Matt (copied on this email) will also be able to help you and
point you at resources.

My teammates and I have also discussed the idea of applying condor's
scheduling and matchmaking algorithms to OpenStack.  There are different
possible approaches to that -- can you describe more about what you are
thinking?

Similarly "using Condor's ec2 grid universe to drive openstack" might mean
different things.  Are you thinking in terms of something like "elastic
Condor on OpenStack" or "managing OpenStack via Condor" or something
different?

I believe we are thinking along the same lines.  We want to have the
features of condor scheduling/matchmaking/priorities/preemption/fair
share/... integrated into openstack scheduler, so openstack can make
better decisions about where to place virtual machines.

More generally we wanted to investigate creating some form of scheduling
layer in openstack, such that we could theoretically integrate any
existing opensource schedulers (condor, slurm, torque, maui, etc)

I have several years' experience with Condor, so condor was one of the
first tools we began investigating.

One of our goals is to maintain the use of the nova-api front end to
OpenStack.  We don't think we should try to change the way people submit
work to openstack.  We will likely find some plugin point in nova api
where we can intercept calls and translate these to talk to another
scheduler.
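
A rough sketch of what such a translation layer could look like, assuming a pluggable backend interface. Every name here (SchedulerBackend, FirstFitBackend, handle_boot_request) is hypothetical and is not part of nova or of any existing scheduler; the first-fit logic is only a stand-in for a real backend.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SchedulerBackend(ABC):
    """Hypothetical adapter interface; not part of nova or any scheduler."""

    @abstractmethod
    def place(self, vm_request: dict) -> Optional[str]:
        """Return the name of the host chosen for vm_request, or None."""


class FirstFitBackend(SchedulerBackend):
    """Stand-in backend; a real one would talk to Condor, SLURM, etc."""

    def __init__(self, free_memory_mb: dict):
        self.free_memory_mb = dict(free_memory_mb)

    def place(self, vm_request: dict) -> Optional[str]:
        # Pick the first host with enough free memory and reserve it.
        for host, free in self.free_memory_mb.items():
            if free >= vm_request["memory_mb"]:
                self.free_memory_mb[host] = free - vm_request["memory_mb"]
                return host
        return None


def handle_boot_request(backend: SchedulerBackend, vm_request: dict) -> str:
    """What an intercepting plugin might do with an incoming boot call."""
    host = backend.place(vm_request)
    if host is None:
        raise RuntimeError("no host can satisfy the request")
    return host


backend = FirstFitBackend({"node1": 2048, "node2": 8192})
first = handle_boot_request(backend, {"memory_mb": 4096})
second = handle_boot_request(backend, {"memory_mb": 2048})
```

The front end only ever sees handle_boot_request, so swapping one scheduler for another would mean writing another SchedulerBackend subclass rather than changing how users submit work.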

We are also considering uses of Openstack where the user has some idea
of what host his VM should run on, for example, a host with 10 gig
ethernet, or a host with an FPGA.  I was thinking of using ClassAds on
the compute nodes so hosts could advertise what they had.  I have read
that the Openstack scheduler has a similar concept to ClassAds with the
filter/cost/weight/tag/hint concepts.  But logically I think the
robustness and maturity of existing schedulers should be better than
reinventing a wheel from scratch in python.
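
As a toy illustration of the advertise-and-match idea, here is a minimal sketch in which plain dicts and lambdas stand in for ClassAds. The attribute names (nic_gbps, has_fpga, free_memory_mb) and the host list are invented for the example, and this is not the ClassAd language or Condor's negotiator.

```python
# Toy matchmaking sketch; plain dicts and lambdas stand in for ClassAds.

hosts = [
    {"name": "node1", "nic_gbps": 1, "has_fpga": False, "free_memory_mb": 32768},
    {"name": "node2", "nic_gbps": 10, "has_fpga": False, "free_memory_mb": 16384},
    {"name": "node3", "nic_gbps": 10, "has_fpga": True, "free_memory_mb": 8192},
]


def match(request, hosts):
    """Return the highest-ranked host meeting the requirements, or None."""
    candidates = [h for h in hosts if request["requirements"](h)]
    if not candidates:
        return None
    return max(candidates, key=request["rank"])


# A VM that needs 10 gig ethernet and prefers the host with the most memory.
fast_net_vm = {
    "requirements": lambda h: h["nic_gbps"] >= 10 and h["free_memory_mb"] >= 4096,
    "rank": lambda h: h["free_memory_mb"],
}

# A VM that must land on a host with an FPGA.
fpga_vm = {
    "requirements": lambda h: h["has_fpga"],
    "rank": lambda h: h["free_memory_mb"],
}

fast_net_host = match(fast_net_vm, hosts)
fpga_host = match(fpga_vm, hosts)
```

The split into a hard requirements test and a soft rank mirrors the general shape of matchmaking; a mature scheduler adds priorities, preemption, and fair share on top of this.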

I have looked through the condor manual and some examples on the grid
ec2 universe.  From what I can tell the submit machine is the one that
will actually call through to the ec2 url.  Does this mean that if using
this you won't actually need any condor compute (startd) nodes?  How
does this manage resources, like do we assume in the ec2 grid there are
an infinite number of slots/disk/memory?

Theoretically, if the ec2 grid type did not exist, I could just submit
to condor a vanilla job with "euca-create-instance" matched in the
normal manner, such that slots/memory/disk would be reference counted.
What I am getting at is if condor was managing the openstack cluster we
can better schedule workloads.
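
To make "reference counted" concrete, a minimal sketch of per-host accounting follows. The HostSlots class and its resource names are hypothetical and only show the bookkeeping idea; they do not describe how a startd actually tracks slots.

```python
class HostSlots:
    """Hypothetical per-host accounting of cpus, memory, and disk."""

    def __init__(self, cpus, memory_mb, disk_gb):
        self.free = {"cpus": cpus, "memory_mb": memory_mb, "disk_gb": disk_gb}

    def claim(self, **need):
        """Reserve resources for one VM; return False if they do not fit."""
        if any(self.free[name] < amount for name, amount in need.items()):
            return False
        for name, amount in need.items():
            self.free[name] -= amount
        return True

    def release(self, **used):
        """Give back resources when a VM terminates."""
        for name, amount in used.items():
            self.free[name] += amount


host = HostSlots(cpus=4, memory_mb=8192, disk_gb=100)
vm = {"cpus": 2, "memory_mb": 4096, "disk_gb": 40}

placed_first = host.claim(**vm)    # fits
placed_second = host.claim(**vm)   # fits, host is now out of cpus and memory
placed_third = host.claim(**vm)    # rejected instead of oversubscribing
host.release(**vm)                 # one VM exits
placed_retry = host.claim(**vm)    # fits again
```

With the ec2 grid type the submit side has no such counts for the remote cloud, which is the contrast drawn above.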

But we also are looking at other OSS schedulers, because ideally we will
want our solution to be scheduler agnostic.  Our early research is to
determine if this goal is even possible.

The RFE Tim linked seems roughly similar to what my team has been thinking.

--
--Derrick H. Karimi
--Software Developer, SEI Emerging Technology Center
--Carnegie Mellon University


