Re: [HTCondor-users] manipulate ranking/priority of very-short-jobs-users
- Date: Fri, 18 Aug 2023 09:51:24 +0200
- From: Jeff Templon <templon@xxxxxxxxx>
- Subject: Re: [HTCondor-users] manipulate ranking/priority of very-short-jobs-users
Hi,
Fair share can indeed be used for this kind of thing, *when the system is full*. It works like this (in some cases) on our grid cluster.
Our interactive cluster is rarely completely full. So reducing start priority does not help in this case - if my start priority is rock bottom but there is nobody else waiting, you get the picture. You need something that says "hang on, the stop/start rate for this user is absurd - throttle back new job starts" (regardless of how full the cluster is or whether other users are waiting).
JT
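To make the distinction concrete, here is a minimal Python sketch of the kind of per-user start throttle described above. It is not an HTCondor feature: the class `PerUserStartThrottle`, the helper `schedule`, and the limits `max_starts` and `window_s` are hypothetical. The idea is to allow at most `max_starts` job starts per user in any sliding window of `window_s` seconds, independent of how many slots are free or whether anyone else is waiting. That is the situation where a fair-share priority has no effect.

```python
from collections import defaultdict, deque


class PerUserStartThrottle:
    """Hypothetical per-user job-start limiter (not an HTCondor feature).

    Allows at most `max_starts` starts per user in any sliding window of
    `window_s` seconds, no matter how many slots are idle.
    """

    def __init__(self, max_starts: int, window_s: float) -> None:
        self.max_starts = max_starts
        self.window_s = window_s
        self._starts = defaultdict(deque)  # user -> recent start times, oldest first

    def try_start(self, user: str, now: float) -> bool:
        """Record a start for `user` at time `now` if allowed; else refuse it."""
        recent = self._starts[user]
        # Forget starts that have slid out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_starts:
            return False  # throttled: this user started too many jobs recently
        recent.append(now)
        return True


def schedule(throttle, idle_jobs, free_slots, now):
    """Start idle (user, job_id) pairs in order until slots run out.

    Throttled jobs are skipped and stay idle; other users' jobs queued
    behind them can still start.
    """
    started = []
    for user, job_id in idle_jobs:
        if len(started) >= free_slots:
            break
        if throttle.try_start(user, now):
            started.append((user, job_id))
    return started


if __name__ == "__main__":
    throttle = PerUserStartThrottle(max_starts=10, window_s=60.0)
    jobs = [("alice", i) for i in range(50)] + [("bob", 0)]
    started = schedule(throttle, jobs, free_slots=100, now=0.0)
    print(len(started))  # 11: ten of alice's jobs plus bob's one
```

With 100 free slots, a user who drops 50 very short jobs gets only 10 of them started in the first minute, while another user's single job still starts immediately. A sliding window is used here only for brevity; a token bucket would express "throttle back new job starts" equally well.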
> On 17 Aug 2023, at 17:29, Luehring, Frederick C <luehring@xxxxxxxxxxx> wrote:
>
> Hey Y'all,
>
> Is there a built-in method for condor to apply fair-share scheduling:
>
> https://en.wikipedia.org/wiki/Fair-share_scheduling
>
> The ATLAS Panda implementation does something along the lines of a fair-share
> algorithm using numbers of jobs submitted instead of CPU. When a user who has
> not submitted a job in over a week starts submitting new jobs, his/her jobs get
> the highest user priority of 10000. As the user submits additional jobs they are
> assigned lower and lower priority and I have seen users who submit gazillions of
> jobs get down to negative priority below -5000. Eventually Panda will move the
> user's jobs into a throttled state which is a sort of circuit breaker that
> temporarily prevents the user's new jobs from starting. The user's submission
> priority recovers because the incremental priority reduction caused by
> previously submitted jobs is removed 7 days after the job submission. This sort
> of approach seems like what is needed. The system could increase the priority of
> a limited number of short jobs to allow users who are not abusing the queuing
> system to quickly run a limited number of short test jobs when developing code.
>
> Fred
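Fred's description of the Panda scheme can be sketched in Python as follows, assuming a fixed priority penalty per submitted job. Only the starting priority of 10000 and the 7-day expiry come from the post; the class name `DecayingSubmitPriority`, the penalty `penalty_per_job`, and the throttling threshold `throttle_below` are hypothetical, since the actual Panda values are not given. Times are in seconds and assumed never to go backwards.

```python
from collections import deque

WEEK_S = 7 * 24 * 3600  # per the post: a submission's penalty lasts 7 days
BASE_PRIORITY = 10000   # per the post: priority of a user idle for over a week


class DecayingSubmitPriority:
    """Hypothetical Panda-style priority based on recent submission counts.

    `penalty_per_job` and `throttle_below` are illustrative assumptions.
    Assumes `now` (seconds) never decreases between calls.
    """

    def __init__(self, penalty_per_job: int, throttle_below: int) -> None:
        self.penalty_per_job = penalty_per_job
        self.throttle_below = throttle_below
        self._submits = {}  # user -> deque of submission times, oldest first

    def _recent(self, user: str, now: float) -> deque:
        """Drop submissions at least 7 days old and return the rest."""
        times = self._submits.setdefault(user, deque())
        while times and now - times[0] >= WEEK_S:
            times.popleft()
        return times

    def submit(self, user: str, now: float) -> None:
        self._recent(user, now).append(now)

    def priority(self, user: str, now: float) -> int:
        # Each submission in the last 7 days lowers the priority by a fixed step.
        return BASE_PRIORITY - self.penalty_per_job * len(self._recent(user, now))

    def throttled(self, user: str, now: float) -> bool:
        # "Circuit breaker": new jobs from this user should not start.
        return self.priority(user, now) < self.throttle_below


if __name__ == "__main__":
    p = DecayingSubmitPriority(penalty_per_job=1, throttle_below=-8000)
    for _ in range(15000):
        p.submit("fred", now=0)
    for _ in range(5000):
        p.submit("fred", now=3600)
    print(p.priority("fred", now=3600), p.throttled("fred", now=3600))  # -10000 True
    print(p.priority("fred", now=WEEK_S))  # 5000
```

With `penalty_per_job=1` and `throttle_below=-8000`, a user who submits 20000 jobs within an hour drops to -10000 and is throttled; once the first 15000 submissions are 7 days old, their penalties expire and the priority recovers to 5000.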
>
> On 8/17/23 2:42 AM, Jeff Templon wrote:
>> Thanks! I didn't know about this stuff.
>>
>>> On 16 Aug 2023, at 17:28, Todd L Miller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>>>
>>>> Another issue to take into account is that a high start rate can put pressure on other systems, like shared file systems.
>>>
>>> We already have a few throttles for high overall start rates.
>>
>> Usually the problem is not so much high overall start rates, here it's usually one user who generates 90% of the high start rate. I really don't like making everyone suffer because of one clumsy user. OTOH the other users might let this user know how clumsy he/she is - peer communication tends to be effective.
>>
>> JT
>>
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
>
> --
> Frederick Luehring Indiana U luehring@xxxxxx +1 812 855 1025 IU
> http://cern.ch/Fred.Luehring Fred.Luehring@xxxxxxx +41 22 767 11 66 CERN