HTCondor Project List Archives




[Condor-devel] schedd not always reusing claims when it should




Just found a bug in the schedd that causes it to release claims when it still has jobs it could run on those claims. It should be easy to fix; I will fix it this morning after consulting w/ the wrangler about whether it should go into the v7.7.5 branch and/or into stable.

I was looking into the above because I was getting some unexpected results while testing my patch to create dynamic slots w/o a negotiation cycle. (The problem is unrelated to my patch, btw.)

The story is:

1. A job with id x completes.

2. The schedd goes through the PrioRec array to try to find another job that matches the claimed resource. Unfortunately, it may happen that (a) the PrioRec array has not yet been rebuilt (i.e. we are using a cached PrioRec array), and (b) the job classad for completed job id x has not yet been destroyed, because it is waiting in the enqueueFinishedJob queue.

3. As a result of (a) and (b), FindRunnableJob will try to match *the very job that just completed* with the now-idle claim.

4. The job that just completed may no longer match the claimed startd ad, especially when jobs are submitted with the defaults and dynamic slots are in use. For instance, machine.Memory < jobad.ImageSize, because ImageSize in the completed job is now bigger than when the resource was initially claimed (now that the job has completed, ImageSize reflects the real size seen by the starter, not just the size of the executable on disk).

5. FindRunnableJob now marks the entire autocluster as not matching this claim.

6. The claim is relinquished, when Todd's test really expected it to be reused :(





--
Todd Tannenbaum                       University of Wisconsin-Madison
Center for High Throughput Computing  Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                 Madison, WI 53706-1685