Hmm, the preferable solution would be for the central manager to flag nodes that have cycled through, say, 10 jobs in the last 120 seconds and mark those nodes as bad. I was hoping that Condor had some functionality to deal with this situation.
The problem is that it's very hard to do this in general. For instance:

* Although Condor isn't optimized for short-running jobs, it's not unusual for users to submit them.
* Negotiation cycles are often long enough that the pattern you describe won't occur even if there is a black hole.
* There are lots of kinds of black holes: machines that cause segfaults (how do you distinguish that from a user job that just segfaults?), machines that cause jobs to run slowly (how do you distinguish that from slow jobs?), and machines that cause jobs to exit quickly.

I agree that it would be nice to have such a black hole detection system, but it's definitely a challenge.
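To make the proposed heuristic concrete, here is a minimal Python sketch of the "10 jobs in the last 120 seconds" rule. It is not Condor functionality: the class name `BlackHoleDetector`, its method, and the idea of feeding it job-completion timestamps are all hypothetical, and it assumes timestamps arrive in non-decreasing order per node.

```python
from collections import defaultdict, deque


class BlackHoleDetector:
    """Hypothetical helper: flags a node that finishes too many jobs in a time window."""

    def __init__(self, max_jobs=10, window_seconds=120.0):
        self.max_jobs = max_jobs
        self.window_seconds = window_seconds
        # node name -> timestamps of recent job completions on that node
        self._completions = defaultdict(deque)
        self.bad_nodes = set()

    def record_completion(self, node, timestamp):
        """Record one job finishing on `node`; return True if the node is flagged."""
        times = self._completions[node]
        times.append(timestamp)
        # Drop completions that have fallen out of the sliding window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.max_jobs:
            self.bad_nodes.add(node)
        return node in self.bad_nodes


detector = BlackHoleDetector(max_jobs=10, window_seconds=120.0)
for second in range(10):
    flagged = detector.record_completion("node17", float(second))
print(flagged, sorted(detector.bad_nodes))
```

Note that this rule runs straight into the objections above: it flags a healthy machine that happens to receive ten legitimately short jobs, and it never fires if negotiation hands the machine jobs more slowly than the threshold.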
-alain