Re: [Condor-users] Condor Group Drive me Crazy.......
- Date: Mon, 14 Nov 2011 20:37:09 +0100
- From: Lukas Slebodnik <slebodnik@xxxxxxxx>
- Subject: Re: [Condor-users] Condor Group Drive me Crazy.......
On Mon, Nov 14, 2011 at 06:43:15PM +0200, Sassy Natan wrote:
> Hi Joe,
>
> Well yes this is true...
> When GROUP_ACCEPT_SURPLUS_* is set to FALSE, jobs don't leap outside
> the quota limit.
>
> However, since this configuration uses subgroups, I expect to have dynamic
> allocation inside the group.
> So in my configuration:
>
> GROUP_QUOTA_group_vcs = 13
> GROUP_QUOTA_group_vcs.design_single = 4
> GROUP_QUOTA_group_vcs.design_list = 1
> GROUP_QUOTA_group_vcs.verification_single = 5
> GROUP_QUOTA_group_vcs.verification_list = 3
>
>
> The VCS group has a limit of 13 slots, right?
>
> So when someone from vcs.verification_single submits a job (queue 30)
> and the pool is clean (no jobs at the moment), the number of running
> jobs should be 13 (with 17 idle).
> This is in fact what happens when I submit the job. But once I submit a new
> job (queue 30) from verification_list, I would expect at least 3 jobs
> to run right away, causing 3 jobs from the vcs.verification_single group
> to be preempted or killed.
Just one note: you have the following three lines in your local configuration
file (from your previous mail):
SUSPEND = FALSE
PREEMPT = FALSE
KILL = FALSE
This is a reason why the jobs will not be preempted or killed.
Take a look at the policy settings:
http://www.cs.wisc.edu/condor/manual/v7.6/3_5Policy_Configuration.html
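As a rough, untested sketch of the kind of thing to look at (my assumption, not something taken from your file): PREEMPT, SUSPEND and KILL are evaluated by the startd on each execute machine, while preemption for priority or quota reasons is requested by the negotiator and filtered through PREEMPTION_REQUIREMENTS. The group attributes used below are the ones I believe the negotiator makes available to that expression; please check them against the manual for your version before relying on this.

```
# Negotiator side (central manager). Assumption: allow a running job to be
# preempted only when the submitting group is below its quota and the group
# that currently holds the slot is above its own quota.
NEGOTIATOR_CONSIDER_PREEMPTION = TRUE
PREEMPTION_REQUIREMENTS = (SubmitterGroupResourcesInUse < SubmitterGroupQuota) && \
                          (RemoteGroupResourcesInUse > RemoteGroupQuota)

# Startd side (execute machines). These are the three lines from the previous
# mail, unchanged; review them together with the negotiator setting above.
SUSPEND = FALSE
PREEMPT = FALSE
# With KILL = FALSE a job that is being vacated is never hard-killed, so the
# slot is handed over only after the job exits on its own.
KILL = FALSE
```

With the quotas from this thread, verification_single running 13 jobs is above its quota of 5, and verification_list with nothing running is below its quota of 3, so the expression would evaluate to TRUE for that pair. Whether the claims are then actually released depends on the rest of the startd policy.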
Lukas
> However, what actually happens is that the 13 jobs of the
> vcs.verification_single group keep running and 3, sometimes even 4, more
> jobs are added to the running state, leaving me with a total of 16-17
> running jobs, which is not good.
>
> Any guess?
>
> I have been working on this all day without any luck :-(
>
> Thanks
> Sassy
>
> On Mon, Nov 14, 2011 at 5:43 PM, Joe Boyd <boyd@xxxxxxxx> wrote:
>
> > If you want those groups to be limited to only what the quota has you
> > don't want to set these to TRUE do you?
> >
> > GROUP_ACCEPT_SURPLUS_group_vcs.verification_list = TRUE
> > GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE
> >
> > That's telling it that those groups can use any "surplus" slots in the
> > pool outside of the quota configuration if no one else is using them. If
> > you set those to FALSE doesn't it do what you want?
> >
> > joe
> >
> >
> > Sassy Natan wrote:
> >
> >> Hi Again....
> >>
> >> I'm kind of lost here.
> >> I enabled debug mode and checked the logs, and still no good.
> >>
> >>
> >> I attach the condor.local.conf file ....
> >>
> >>
> >> Thanks for the help....
> >>
> >>
> >> On Sun, Nov 13, 2011 at 6:00 PM, Sassy Natan <sassyn@xxxxxxxxx> wrote:
> >>
> >> Hi All
> >> Here is a cut and paste from my condor configuration file:
> >>
> >> GROUP_NAMES = GROUP_VCS, GROUP_VCS.DESIGN_SINGLE,
> >> GROUP_VCS.DESIGN_LIST, GROUP_VCS.VERIFICATION_SINGLE,
> >> GROUP_VCS.VERIFICATION_LIST
> >>
> >> GROUP_QUOTA_group_vcs = 13
> >> GROUP_QUOTA_group_vcs.design_single = 4
> >> GROUP_QUOTA_group_vcs.design_list = 1
> >> GROUP_QUOTA_group_vcs.verification_single = 5
> >> GROUP_QUOTA_group_vcs.verification_list = 3
> >>
> >>
> >> GROUP_AUTOREGROUP = FALSE
> >> GROUP_ACCEPT_SURPLUS = FALSE
> >>
> >> GROUP_AUTOREGROUP_group_vcs = FALSE
> >> GROUP_ACCEPT_SURPLUS_group_vcs = FALSE
> >>
> >> GROUP_AUTOREGROUP_group_vcs.design_single = FALSE
> >> GROUP_ACCEPT_SURPLUS_group_vcs.design_single = TRUE
> >>
> >> GROUP_AUTOREGROUP_group_vcs.design_list = FALSE
> >> GROUP_ACCEPT_SURPLUS_group_vcs.design_list = TRUE
> >>
> >> GROUP_AUTOREGROUP_group_vcs.verification_single = FALSE
> >> GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE
> >>
> >> GROUP_AUTOREGROUP_group_vcs.verification_list = FALSE
> >> GROUP_ACCEPT_SURPLUS_group_vcs.verification_list = TRUE
> >>
> >>
> >> I now have 2 submit files, each with 100 jobs.
> >> Submitting the first file, verification_single.sub (with the group
> >> group_vcs.verification_single specified in the submit file), starts
> >> 13 jobs as expected.
> >>
> >> So far everything is good.
> >> After 5 minutes I submit the second file, verification_list.sub (with the
> >> group group_vcs.verification_list specified in the submit file).
> >>
> >> The expected result is that at least 4 jobs from verification_list.sub
> >> will start running and a total of 13 jobs will run in the cluster. All
> >> other 187 jobs should be idle, assuming none of them has finished
> >> (each submission includes 100 jobs).
> >>
> >> However, the real result is that I get 18 jobs running, which is not
> >> good! Why? Why? Why? Why?
> >> I just don't understand it.
> >>
> >> I also enabled NEGOTIATOR_CONSIDER_PREEMPTION since I would like to
> >> use preemption.
> >> I would expect that, of the 13 running jobs from the
> >> verification_single.sub submission, 4 will be preempted once I submit
> >> verification_list.sub.
> >>
> >> Thanks for any help....
> >> Sassy
> >>
> >>
> >>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> _______________________________________________
> >> Condor-users mailing list
> >> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> >> subject: Unsubscribe
> >> You can also unsubscribe by visiting
> >> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >>
> >> The archives can be found at:
> >> https://lists.cs.wisc.edu/archive/condor-users/