HTCondor Project List Archives


Re: [Condor-devel] somewhat evil problem with work fetch and schedd claims. :(

Date: Wed, 6 Feb 2008 00:32:43 -0800
From: Derek Wright <wright@xxxxxxxxxxx>
Subject: Re: [Condor-devel] somewhat evil problem with work fetch and schedd claims. :(


On Feb 6, 2008, at 12:15 AM, Daniel Forrest wrote:

> I won't claim to understand exactly what you're describing here, but
> any problem which is exacerbated by short running jobs is a problem
> that needs to be fixed.  The problem is that short running jobs also
> include ill-behaved jobs (i.e. jobs that fail almost immediately from
> some job related error), and if these jobs are set to retry on error
> then you have an unintentional DoS attack on your pool.


Right, good point.  A few countervailing tendencies:

A) there's an expression you can define for how often the startd will
fetch work. this is a ClassAd expression evaluated in the context of
the slot ad, so you can do quite fancy things if you wanted.

B) this is only an issue if you have a startd that is both trying to
fetch work externally *and* you're expecting regular schedd claims to
land there, too. while it's still early to tell, it seems like the
only viable use-cases for any of this startd work-fetching stuff
involve fetching work exclusively, not a mix of both.
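
To make point A concrete, here is a minimal sketch of the kind of policy such an expression could encode. It is written in Python purely for illustration, not as ClassAd syntax or startd code: the function name fetch_delay_seconds, the dict standing in for the slot ad, and the 0 and 300 second values are all assumptions. Only the idea of a delay computed from slot attributes comes from the discussion above.

```python
def fetch_delay_seconds(slot_ad):
    """Return how many seconds to wait before the next fetch attempt.

    Hypothetical policy: `slot_ad` is a plain dict standing in for the
    slot ad, and the delay values are arbitrary choices for this sketch.
    """
    # A slot that is claimed but idle has nothing to run, so try to
    # fetch again right away.
    if slot_ad.get("State") == "Claimed" and slot_ad.get("Activity") == "Idle":
        return 0
    # Otherwise wait a fixed interval (assumed here to be five minutes).
    return 300


if __name__ == "__main__":
    print(fetch_delay_seconds({"State": "Claimed", "Activity": "Idle"}))
    print(fetch_delay_seconds({"State": "Claimed", "Activity": "Busy"}))
```

Because the delay is computed from the slot's own attributes, the same mechanism could be tuned to slow down fetching in whatever states an admin considers risky.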

> This is a real concern.  We have had several incidents of this type on
> GLOW and it is an incredible PITA to have to first identify what is
> going on and then disable the source of the bad jobs.


Agreed.

> Please do not take option A.

I'd rather not, but it's not entirely up to me to decide. We'll
discuss this tomorrow in a meeting anyway, so we should have an
answer soon enough.


Thanks for your input,
-Derek

