Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Testing systems services

Date: Fri, 20 May 2011 12:21:38 +0100
From: Angel de Vicente <angelv@xxxxxx>
Subject: Re: [Condor-users] Testing systems services

Hi,

On 10/05/11 16:51, Burnett, Ben wrote:

> Hi:
>
> I've been trying to manage a small Condor pool (~200 cores) over the last little while, and I've run into a small, irritating issue. I wondered if others have experienced the same thing, or if they have solutions or ideas.
>
> I have configured the pool to do various helpful things, like accept GPU jobs, provide dynamic slots on some of the more capable machines, etc. What I have found, though, is that once I've set the configuration, I rarely revisit it. This means that if it stops working, I won't know until someone complains. This might contribute to a decreased workload, since if no one complains, then it does not need to be fixed; however, it is more generally the case that I do get complaints, and generally they arrive in my inbox near strict deadlines (not that anyone ever leaves things to the last minute :P).
>
> Does anyone have a relatively simple system to continuously test their pool's services? Ideally, I'd like the test jobs to run with very low priority, so as not to interfere with regular workloads, but I would like them to run at least once a day (or as often as practically possible) and keep track of the results (this could just be an email or a log file). Then, if one job fails, I'd like to be emailed about it.
>
> I can think of a few approaches myself, but I thought I'd ask if anyone has already got something similar up and running.

Sorry to reply so late. Did you have a look at Hawkeye? http://www.cs.wisc.edu/condor/hawkeye/

It's been a while since I last used it, but it was very easy to use, and out of the box you could check a lot of useful things like disk space, logged-on users, etc.
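
Separately from Hawkeye, the test-job idea in the question (low priority, a kept log, email on failure) could be sketched roughly as below. This is only an illustration under assumptions, not a tested setup: the function names, the probe executable, the log directory and the email address are all hypothetical, and it assumes a Condor version whose submit description files accept `nice_user`, `notification` and `notify_user`. Running it once a day would be left to something like cron.

```python
import subprocess
import tempfile
from pathlib import Path


def build_submit_description(executable, log_dir, notify_user):
    """Return submit-description text for one low-priority probe job.

    All arguments are placeholders chosen for this sketch.
    """
    log_dir = Path(log_dir)
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # Keep stdout, stderr and the job event log so results can be reviewed later.
        f"output = {log_dir / 'probe.out'}",
        f"error = {log_dir / 'probe.err'}",
        f"log = {log_dir / 'probe.log'}",
        # Assumption: nice_user is honoured by the installed Condor version,
        # so the probe yields to regular workloads.
        "nice_user = True",
        # Ask Condor to send mail only when the job ends in error.
        "notification = Error",
        f"notify_user = {notify_user}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def submit(description_text):
    """Write the description to a temporary file and hand it to condor_submit."""
    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as handle:
        handle.write(description_text)
    # Requires condor_submit on PATH; the caller inspects returncode and stderr.
    return subprocess.run(
        ["condor_submit", handle.name], capture_output=True, text=True
    )


if __name__ == "__main__":
    # Only prints the description; call submit() on a machine that can submit jobs.
    print(build_submit_description("/bin/true", "/tmp/probe", "admin@example.org"))
```

A probe like this only confirms that a trivial job can be matched and run. Checking GPU slots or dynamic slots would need extra requirements in the description, which are left out here because they depend on how the pool advertises those resources.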


Cheers,
Ángel de Vicente
--
http://www.iac.es/galeria/angelv/

High Performance Computing Support PostDoc
Instituto de Astrofísica de Canarias

