Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Black hole node

Date: Mon, 23 Jan 2006 16:08:33 -0600
From: Alain Roy <roy@xxxxxxxxxxx>
Subject: Re: [Condor-users] Black hole node

> Hmm, the preferable solution would be if the central manager could flag
> nodes that have cycled through, say, 10 jobs in the last 120 seconds and
> mark that node as bad. I was hoping that Condor perhaps had some
> functionality to deal with this situation.


The problem is that it's very hard to do this in general. For instance:

  * Although Condor isn't optimized for short-running jobs,
    it's not unusual for users to submit them.

  * Negotiation cycles are often long enough that even a black hole
    can't receive 10 jobs in 120 seconds, so a rule like the one you
    describe would never fire.

  * There are many kinds of black holes: machines that cause segfaults
    (how do you distinguish them from a user job that just segfaults?),
    machines that cause jobs to run slowly (how do you distinguish them
    from slow jobs?), and machines that cause jobs to exit quickly.
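To make the first point concrete, here is a minimal sketch of the rate rule from the quoted message: keep a per-node sliding window of job-completion times and flag the node once the window holds 10 completions within 120 seconds. This is not Condor functionality; the class, function, and node names and the defaults are hypothetical, chosen only to mirror the numbers in the proposal.

```python
from collections import deque


class BlackHoleDetector:
    """Hypothetical sketch of the proposed rule (not part of Condor):
    flag a node that has completed `max_jobs` or more jobs within the
    last `window` seconds."""

    def __init__(self, max_jobs=10, window=120.0):
        self.max_jobs = max_jobs
        self.window = window
        # node name -> deque of completion timestamps (seconds)
        self._completions = {}

    def record_completion(self, node, timestamp):
        """Record a finished job on `node`; return True if the node now
        exceeds the completion-rate threshold."""
        q = self._completions.setdefault(node, deque())
        q.append(timestamp)
        # Evict completions that have fallen out of the sliding window.
        while q and q[0] <= timestamp - self.window:
            q.popleft()
        return len(q) >= self.max_jobs


def run_back_to_back(detector, node, job_length, count):
    """Simulate `count` jobs of `job_length` seconds run back to back on
    `node`; return True if the detector ever flagged the node."""
    flagged = False
    for i in range(1, count + 1):
        flagged |= detector.record_completion(node, i * job_length)
    return flagged


if __name__ == "__main__":
    d = BlackHoleDetector()
    print(run_back_to_back(d, "broken-node", 1.0, 20))   # True: jobs die after 1 s
    print(run_back_to_back(d, "healthy-node", 10.0, 20)) # True: legitimate 10 s jobs
    print(run_back_to_back(d, "slow-node", 60.0, 20))    # False
```

The rule flags the broken node, but it flags a healthy node running legitimate 10-second jobs just as readily, and it cannot see the segfaulting or slow kinds of black hole at all, since it only counts completions.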

I agree that it would be nice to have a system for detecting black holes, but it's definitely a challenge.


-alain

Follow-Ups:
- Re: [Condor-users] Black hole node
  - From: Horvatth Szabolcs
- Re: [Condor-users] Black hole node
  - From: Terrence Martin

References:
- [Condor-users] Black hole node
  - From: Terrence Martin
- Re: [Condor-users] Black hole node
  - From: Matt Hope
- Re: [Condor-users] Black hole node
  - From: Terrence Martin
