Mailing List Archives
Re: [Condor-users] Queue problems
- Date: Tue, 18 Apr 2006 14:07:47 -0500
- From: Andy Wettstein <ajw@xxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Queue problems
On Tue, Apr 18, 2006 at 01:12:30PM -0500, Todd Tannenbaum wrote:
> At 10:15 AM 4/18/2006, Andy Wettstein wrote:
> >Hi
> >
> >We're running condor 6.7.18 and have noticed a problem when we add a
> >machine requirement to the submit file. We have a submit file like
> >this:
> >
> >Executable = hello.sh
> >Universe = vanilla
> >Output = hello.out
> >Log = hello.log
> >Requirements = (machine == "xxx1")
> >Queue 10
> >
> >hello.sh just echoes hello and sleeps for 30 seconds. If we submit this
> >job, then change the requirement to machine xxx2 and submit again, no
> >jobs run on xxx2 until all the jobs on xxx1 have completed. From what I
> >can tell, when we submit jobs this way condor stops trying to match
> >jobs in the queue after it rejects a job. Since xxx1 has 4 VMs,
> >condor starts 4 jobs on it, sees that it can't run the next job, and
> >then skips the rest of the queue instead of trying to match the jobs
> >that should be able to run on xxx2. If we take out the machine
> >requirement, condor does run jobs simultaneously on xxx1 and xxx2 as
> >expected.
> >
> >Could this be a configuration error of some sort or is this a bug with
> >condor?
>
> This is an unfortunate bug that has been recently fixed for the next
> Condor release. So with v6.7.19+ you should not have to worry about it.
>
> But with v6.7.18, there is a bug in the code that automatically sets
> SIGNIFICANT_ATTRIBUTES.
> There are a couple of ways you can work around it.
>
> v6.7.18 workaround idea #1
> Use a submit file that adds one level of indirection to the
> Requirements, like so:
> executable = hello.sh
> requirements = wanted
> +wanted = (machine == "xxx1")
> queue 10
>
> v6.7.18 workaround idea #2
> In your condor_config file, add
> SIGNIFICANT_ATTRIBUTES = ClusterId
> and then *restart* the schedd (condor_restart -schedd).
>
> Workaround #1 will result in better negotiation, but requires
> changes to all submit files.
> Workaround #2 requires no changes to submit files, but will result
> in negotiation that performs no better or worse than in Condor v6.6.x.
>
> Again, this has already been fixed in the code for v6.7.19, which
> would normally appear on the web within a week or so (but this may be
> delayed by a few days because of the Condor Week conference in
> Madison, WI next week). Note that v6.7.19 of Condor is the *last*
> developer release before the next v6.8.0 stable release.
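A toy model can show why the queue stalls and why grouping by ClusterId helps. It is a simplified sketch, not Condor's negotiator code. It assumes that idle jobs are grouped into "autoclusters" keyed on the attributes named in SIGNIFICANT_ATTRIBUTES. It also assumes that once one job in an autocluster fails to match, the remaining jobs in that autocluster are skipped for the rest of the cycle. The `Job` type, the three signature functions (in particular `buggy_signature`, a hypothetical stand-in for the v6.7.18 bug), and the assumption that xxx2 also has 4 VMs are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    cluster: int
    proc: int
    wanted_machine: str  # stands in for Requirements = (machine == "...")


def negotiate(jobs, free_slots, signature):
    """Toy model of one negotiation cycle (not Condor's actual code).

    Jobs are visited in queue order. When a job cannot be matched, its
    autocluster key is marked rejected and every later job with the same
    key is skipped for the rest of the cycle.
    """
    slots = dict(free_slots)  # copy so the caller's dict is untouched
    rejected = set()
    matches = []
    for job in jobs:
        key = signature(job)
        if key in rejected:
            continue  # an "identical" job already failed this cycle
        if slots.get(job.wanted_machine, 0) > 0:
            slots[job.wanted_machine] -= 1
            matches.append((job, job.wanted_machine))
        else:
            rejected.add(key)  # reject the whole autocluster
    return matches


def per_machine(matches):
    """Count matched jobs per machine."""
    return dict(Counter(machine for _, machine in matches))


# Hypothetical stand-in for the 6.7.18 bug: the auto-computed key does not
# distinguish the machine requirement, so every job looks identical.
def buggy_signature(job):
    return "same-for-all"


# Workaround #2: SIGNIFICANT_ATTRIBUTES = ClusterId, one autocluster per cluster.
def cluster_signature(job):
    return job.cluster


# Rough stand-in for the intended behaviour: jobs with different
# requirements land in different autoclusters.
def requirement_signature(job):
    return job.wanted_machine


# Two submissions of "queue 10": cluster 1 wants xxx1, cluster 2 wants xxx2.
JOBS = [Job(1, p, "xxx1") for p in range(10)] + [Job(2, p, "xxx2") for p in range(10)]
FREE_SLOTS = {"xxx1": 4, "xxx2": 4}  # assumption: xxx2 also has 4 VMs

if __name__ == "__main__":
    for sig in (buggy_signature, cluster_signature, requirement_signature):
        print(sig.__name__, per_machine(negotiate(JOBS, FREE_SLOTS, sig)))
```

In this model, the buggy key starts only the four xxx1 jobs, and the xxx2 cluster is never tried in that cycle. Keying on ClusterId, or on the requirement itself, lets four jobs start on each machine. This matches the symptom and the two workarounds described above.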
Ok. I tested out workaround #1 and it worked fine. We only have 1
user who noticed this, so I don't think changing the submit files
will be much of a problem.
Thanks
Andy