Hmm, the preferable solution would be if the central manager could flag nodes that have cycled through, say, 10 jobs in the last 120 seconds and mark those nodes as bad. I was hoping that Condor perhaps had some functionality to deal with this situation.
The problem is that it's very hard to do this in general. For instance:
* Although Condor isn't optimized for short-running jobs,
it's not unusual for users to submit them.
* Negotiation cycles are often long enough that the pattern
you describe won't show up even if there is a black hole.
* There are lots of kinds of black holes: machines that cause segfaults (how
do you distinguish from a user job that just segfaults?),
machines that cause jobs to run slowly (how do you distinguish
from slow jobs?), and machines that cause jobs to exit quickly.
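
To make the heuristic from the question concrete, here is a minimal sketch of a sliding-window counter, assuming something outside Condor feeds it job completion events. The class name, method name, and thresholds are hypothetical and not part of any Condor API, and timestamps are assumed to arrive in order for each node. It only catches the "exit quickly" case, and it cannot tell a black hole apart from a node that is legitimately running many short jobs.

```python
from collections import defaultdict, deque


class BlackHoleDetector:
    """Hypothetical helper: flags nodes that finish too many jobs too quickly."""

    def __init__(self, max_jobs=10, window_seconds=120.0):
        self.max_jobs = max_jobs              # completions that trigger a flag
        self.window_seconds = window_seconds  # length of the sliding window
        self._completions = defaultdict(deque)  # node name -> recent timestamps

    def record_completion(self, node, timestamp):
        """Record one job completion; return True if the node looks suspect.

        Assumes timestamps for a given node are non-decreasing.
        """
        times = self._completions[node]
        times.append(timestamp)
        # Drop completions that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while times and times[0] < cutoff:
            times.popleft()
        return len(times) >= self.max_jobs


if __name__ == "__main__":
    detector = BlackHoleDetector()
    # Ten completions one second apart on the same (made-up) node name.
    for t in range(10):
        suspect = detector.record_completion("node1", float(t))
    print("node1 suspect:", suspect)
```

Even in this toy form the thresholds have to be tuned against the typical job length and the negotiation interval, which is where the objections above come in.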
I agree that it would be nice to have such a black hole detection system, but it's
definitely a challenge.
-alain