
Re: [Condor-users] Condor Group Drive me Crazy.......





On Mon, Nov 14, 2011 at 9:14 PM, Joe Boyd <boyd@xxxxxxxx> wrote:
Why can't you just set all the GROUP_ACCEPT_SURPLUS parameters to FALSE?  What doesn't work the way you want then?

Well, if I set it to FALSE then (based on the configuration) the group_vcs.verification_single group will only run up to 5 jobs.
If GROUP_ACCEPT_SURPLUS is set to TRUE and no one is running jobs from the other groups, then it will run 13.

The whole idea is to have some control over jobs that require a FlexLM license.
I know about Concurrency Limits, but I found them not to be a good option in my case.

My users want to be sure that if they specify one of the VCS groups, the job will start running right away. (This assumes that no other jobs from the same group are in the pool. So if 10 jobs from the design_single group already exist in the pool, and 10 more jobs are submitted with the design_single group definition, it is understood that they will be processed in FIFO order.)


Here is the conf again: 

       GROUP_QUOTA_group_vcs = 13
       GROUP_QUOTA_group_vcs.design_single = 4
       GROUP_QUOTA_group_vcs.design_list = 1
       GROUP_QUOTA_group_vcs.verification_single = 5
       GROUP_QUOTA_group_vcs.verification_list = 3
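
The quota arithmetic in the configuration above can be checked with a short Python sketch. This is not HTCondor code: the function name `effective_cap` is hypothetical, and the rule it encodes (a subgroup is held to its own quota with surplus disabled, and may grow up to the parent quota with surplus enabled) is only an assumption taken from the behaviour described in this thread.

```python
# Hypothetical model of the quotas quoted above; not part of HTCondor.
PARENT_QUOTA = 13  # GROUP_QUOTA_group_vcs

SUBGROUP_QUOTAS = {
    "design_single": 4,
    "design_list": 1,
    "verification_single": 5,
    "verification_list": 3,
}


def effective_cap(subgroup: str, accept_surplus: bool) -> int:
    """Return the most jobs a subgroup could run in an otherwise idle pool.

    Assumption: with surplus disabled the subgroup is held to its own
    quota; with surplus enabled it may grow up to the parent quota.
    """
    own = SUBGROUP_QUOTAS[subgroup]
    return PARENT_QUOTA if accept_surplus else own


# The subgroup quotas add up exactly to the parent quota.
print(sum(SUBGROUP_QUOTAS.values()) == PARENT_QUOTA)
print(effective_cap("verification_single", accept_surplus=False))
print(effective_cap("verification_single", accept_surplus=True))
```

Under these assumptions the last two lines print 5 and 13, which matches the two cases described above.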

joe

Sassy Natan wrote:
Thanks For the Help Man!

On Mon, Nov 14, 2011 at 8:01 PM, Joe Boyd <boyd@xxxxxxxx> wrote:

   I may have missed a part of the thread or something as I'm not sure
   what you're trying to have it do in the end.

   If you have any of the GROUP_ACCEPT_SURPLUS_* parameters set to TRUE
   that group will end up running more than the quota set if you have
   free slots.  You said it only gets 13 jobs running when you submit
   that first job.  Is that true even after several negotiation
   cycles???  I'm surprised it wouldn't run more.
Yes, even after several negotiation cycles the number of running jobs does not go above 13.

If I submit the two submit files at the same time (60 jobs in total across the two groups), I also don't get more than 13 jobs.
 
    I see that you have

   GROUP_ACCEPT_SURPLUS_group_vcs = FALSE

   but I don't think that's going to make the top level group_vcs not
   go above its 13 if the subgroups have it TRUE.  I'm not sure that
   parameter is really enforced down the hierarchy (sounds like it's
   not from your experience).  Is that why you're saying it shouldn't
   run more than 13?  Because of the setting I quote above?

Yes this is what I'm saying.....

In that case I don't understand the preemption method.
What do you suggest?

 
   joe


   Sassy Natan wrote:

       Hi Joe,

       Well, yes, this is true. When the GROUP_ACCEPT_SURPLUS_*
       parameters are set to FALSE, jobs don't go beyond the quota limit.

       However, since this configuration uses subgroups, I expect to
       have dynamic allocation inside the group.
       So in my configuration:

       GROUP_QUOTA_group_vcs = 13
       GROUP_QUOTA_group_vcs.design_single = 4
       GROUP_QUOTA_group_vcs.design_list = 1
       GROUP_QUOTA_group_vcs.verification_single = 5
       GROUP_QUOTA_group_vcs.verification_list = 3

       The VCS group has a limit of 13 slots, right?

       So when someone from vcs.verification_single submits a job
       (queue 30) and the pool is empty (no jobs at the moment), the
       number of running jobs should be 13 (with 17 idle).
       This is in fact what happens when I submit the job. But once I
       submit a new job (queue 30) from verification_list, I would expect
       at least 3 of its jobs to run right away, causing 3 jobs from
       the vcs.verification_single group to be preempted or killed.
       However, what actually happens is that the 13 jobs of the
       vcs.verification_single group keep running and 3, sometimes
       even 4, more jobs are added to the running state, leaving me with
       a total of 16-17 running jobs, which is not good.

       Any Guess?

       I have been working on this all day without any luck :-(

       Thanks
       Sassy

       On Mon, Nov 14, 2011 at 5:43 PM, Joe Boyd <boyd@xxxxxxxx> wrote:

          If you want those groups to be limited to only what the quota has
          you don't want to set these to TRUE do you?

          GROUP_ACCEPT_SURPLUS_group_vcs.verification_list = TRUE
          GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE



          That's telling it that those groups can use any "surplus" slots in
          the pool outside of the quota configuration if no one else is using
          them. If you set those to FALSE doesn't it do what you want?

          joe


          Sassy Natan wrote:

              Hi Again....

              I'm kind of lost here.
              I enabled debug mode and checked the logs, but still no luck.

              I have attached the condor.local.conf file.


              Thanks for the help....


              On Sun, Nov 13, 2011 at 6:00 PM, Sassy Natan <sassyn@xxxxxxxxx> wrote:

                 Hi All
                 Here is a cut and paste from my condor configuration file:

                 GROUP_NAMES = GROUP_VCS, GROUP_VCS.DESIGN_SINGLE,
                 GROUP_VCS.DESIGN_LIST, GROUP_VCS.VERIFICATION_SINGLE,
                 GROUP_VCS.VERIFICATION_LIST

                 GROUP_QUOTA_group_vcs = 13
                 GROUP_QUOTA_group_vcs.design_single = 4
                 GROUP_QUOTA_group_vcs.design_list = 1
                 GROUP_QUOTA_group_vcs.verification_single = 5
                 GROUP_QUOTA_group_vcs.verification_list = 3




                 GROUP_AUTOREGROUP = FALSE
                 GROUP_ACCEPT_SURPLUS = FALSE

                 GROUP_AUTOREGROUP_group_vcs = FALSE
                 GROUP_ACCEPT_SURPLUS_group_vcs = FALSE

                 GROUP_AUTOREGROUP_group_vcs.design_single = FALSE
                 GROUP_ACCEPT_SURPLUS_group_vcs.design_single = TRUE

                 GROUP_AUTOREGROUP_group_vcs.design_list = FALSE
                 GROUP_ACCEPT_SURPLUS_group_vcs.design_list = TRUE

                 GROUP_AUTOREGROUP_group_vcs.verification_single = FALSE
                 GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE

                 GROUP_AUTOREGROUP_group_vcs.verification_list = FALSE
                 GROUP_ACCEPT_SURPLUS_group_vcs.verification_list = TRUE



                 I now have 2 submit files, each with 100 jobs.
                 Submitting the first file, verification_single.sub (with the
                 group group_vcs.verification_single specified in the submit
                 file), starts 13 jobs as expected.

                 So far everything is good.
                 After 5 minutes I submit the next file,
                 verification_list.sub (with the group
                 group_vcs.verification_list specified in the submit file).

                 The expected result is that at least 4 jobs from
                 verification_list.sub will start running and a total of 13
                 jobs will run in the cluster. All other 187 jobs should be
                 idle, assuming none of them has finished (each submission
                 includes 100 jobs).

                 However, the actual result is that I get 18 jobs running,
                 which is not good! Why? Why? Why? Why?
                 I just don't understand it.

                 I also enabled NEGOTIATOR_CONSIDER_PREEMPTION since I would
                 like to use preemption.
                 I would expect that, of the 13 running processes from the
                 verification_single.sub submission, 4 jobs will be preempted
                 once I submit verification_list.sub.

                 Thanks for any help.
                 Sassy




              ___________________________________________________
              Condor-users mailing list
              To unsubscribe, send a message to
              condor-users-request@cs.wisc.edu with a
              subject: Unsubscribe
              You can also unsubscribe by visiting
              https://lists.cs.wisc.edu/mailman/listinfo/condor-users

              The archives can be found at:
              https://lists.cs.wisc.edu/archive/condor-users/
