Re: [Condor-users] Condor monitoring alternatives
- Date: Sat, 9 Feb 2008 09:40:22 -0600
- From: Nick LeRoy <nleroy@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Condor monitoring alternatives
On Fri February 8 2008, Brent Strong wrote:
> Here at RIT, we've been working on building up a respectable Condor
> pool with some success. We're now running into the issue of
> monitoring our clients. We have enough client machines that it is now
> impossible to visually parse the condor_status output to find
> "stragglers", so I'm looking for an automated solution.
>
> I'm specifically looking for a lightweight alternative to hawkeye,
> possibly something we could integrate into or have as an addition to
> our quick stats look ( http://stats.rc.rit.edu/condor/ ).
>
> Has anyone written a simple script or similar that contains a master
> list of machines that should be up and compares the output of
> condor_status to it? It seems to be something that would be very
> useful and I'm hoping I can reuse someone else's code.
>
> Ideally, we want a lightweight webpage that shows a list of machines
> (by hostname, IP, whatever) that Condor is installed on and their
> corresponding status (up and running condor, up but not running/
> responding to condor, down). Combining the output of a ping test,
> condor_status and a master list of machines, these states should be
> easily determined. My question is: has anyone done this?
Ah, yes. You should look at the Condor Pool Tools
http://www.cs.wisc.edu/condor/tools/PoolTools/ and possibly Hawkeye. The
pool tools are a set of tools for doing just what you describe, or, at
least, the chunk of it that does the heavy lifting: knowing what the list
is, querying the collector, looking for differences, etc. The current
tarball that's out there is version 0.1.2, and it is woefully out of date
(I'm the developer of these tools).
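
To make the comparison concrete, here is a minimal sketch in Python of the
three-state check described in the question (master list vs. collector vs.
ping). It is NOT the Pool Tools code, just an illustration. The function
names, the state strings, and the idea of passing the master list as a plain
list of hostnames are all assumptions; the only Condor-specific piece is
`condor_status -format "%s\n" Machine`, which prints one machine name per slot.

```python
import subprocess


def hosts_in_collector():
    """Return the set of machine names the collector currently reports.

    Assumes condor_status is on PATH. A machine with several slots is
    printed several times, so the result is collapsed into a set.
    """
    out = subprocess.run(
        ["condor_status", "-format", "%s\n", "Machine"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip().lower() for line in out.splitlines() if line.strip()}


def responds_to_ping(host, timeout=5):
    """Return True if a single ping to host succeeds within timeout seconds.

    Assumes a Unix-style ping that accepts "-c 1" (send one packet).
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def classify(master_list, in_collector, ping=responds_to_ping):
    """Map each host in the master list to one of three (made-up) states."""
    states = {}
    for host in master_list:
        name = host.strip().lower()
        if name in in_collector:
            states[name] = "up, running condor"
        elif ping(name):
            # Host answers ping but the collector has no ad for it.
            states[name] = "up, not in condor"
        else:
            states[name] = "down"
    return states
```

Calling `classify(master, hosts_in_collector())` would give a dictionary that
a small script could render as a web page. The master list has to use the
same form of hostname that the collector reports (typically the fully
qualified name), or every machine will look like a straggler.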
Hawkeye can run the pool tools periodically. Indeed, there's a Hawkeye
"module" just for that and for other pool health operations, which we run
on our pool here at UW.
Back to the pool tools, you probably really want to start with the latest
version - let's call it 0.2 - which can't be downloaded at the moment because
I haven't created a tarball of it. The main reason to start with it is that
I made some major changes to the syntax of its configuration files, and it'd
seem foolish to have to rewrite them. Of course, I haven't finished updating
the documentation on it yet, either. :(
So, if you're interested, send me an email, and I'll work with you one-on-one
to get you set up and running with them. :)
-Nick
--
<<< Why, oh, why, didn't I take the blue pill? >>>
/`-_ Nicholas R. LeRoy The Condor Project
{ }/ http://www.cs.wisc.edu/~nleroy http://www.cs.wisc.edu/condor
\ / nleroy@xxxxxxxxxxx The University of Wisconsin
|_*_| 608-265-5761 Department of Computer Sciences