Re: [HTCondor-users] manipulate ranking/priority of very-short-jobs-users
- Date: Thu, 17 Aug 2023 19:28:56 +0200 (CEST)
- From: "Beyer, Christoph" <christoph.beyer@xxxxxxx>
- Subject: Re: [HTCondor-users] manipulate ranking/priority of very-short-jobs-users
Hi,
Fair share is built into Condor, and you can tweak it to a certain extent using the slot weight etc. I just miss an explicit way to punish very short job runtimes by making them more costly for the user ...
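To make the idea concrete, here is a rough sketch of what such a penalty could look like. This is not an existing HTCondor knob: the function name, the `min_charge_seconds` parameter and its default of 600 seconds are all made up for illustration. Every job is charged at least a minimum amount of usage, so very short jobs cost more per second of actual runtime.

```python
def charged_usage(runtime_seconds, slot_weight=1.0, min_charge_seconds=600):
    """Usage charged to a user for one job (hypothetical accounting rule).

    The job is billed for its real runtime, but never for less than
    min_charge_seconds, and the result is scaled by the slot weight.
    """
    return slot_weight * max(runtime_seconds, min_charge_seconds)


# A 30-second job is charged like a 10-minute job; a 1-hour job is unaffected.
short_job = charged_usage(30)
long_job = charged_usage(3600)
```

With a rule like this, a user running thousands of 30-second jobs would accumulate usage twenty times faster than the raw runtime suggests, while users with normal job lengths would see no change.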
Best
christoph
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
----- Original Message -----
From: "Luehring, Frederick C" <luehring@xxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, 17 August 2023 17:29:49
Subject: Re: [HTCondor-users] manipulate ranking/priority of very-short-jobs-users
Hey Y'all,
Is there a built-in method for condor to apply fair-share scheduling:
https://en.wikipedia.org/wiki/Fair-share_scheduling
The ATLAS Panda implementation does something along the lines of a fair-share
algorithm using numbers of jobs submitted instead of CPU. When a user who has
not submitted a job in over a week starts submitting new jobs, his/her jobs get
the highest user priority of 10000. As the user submits additional jobs they are
assigned lower and lower priority and I have seen users who submit gazillions of
jobs get down to negative priority below -5000. Eventually Panda will move the
user's jobs into a throttled state which is a sort of circuit breaker that
temporarily prevents the user's new jobs from starting. The user's submission
priority recovers because the incremental priority reduction caused by
previously submitted jobs is removed 7 days after the job submission. This sort
of approach seems like what is needed. The system could increase the priority of
a limited number of short jobs to allow users who are not abusing the queuing
system to quickly run a limited number of short test jobs when developing code.
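
The scheme described above can be sketched roughly as follows. This is only an illustration, not Panda's actual code: the starting priority of 10000 and the 7-day expiry come from the description, while the penalty of 1 per job, the throttle threshold of -5000, and all class and method names are assumptions.

```python
from collections import deque

WINDOW_SECONDS = 7 * 24 * 3600  # a job's penalty is removed 7 days after submission
START_PRIORITY = 10000          # priority of a user with no recent submissions
PENALTY_PER_JOB = 1             # assumption: the real decrement is not stated
THROTTLE_BELOW = -5000          # assumption: the real threshold is not stated


class UserPriority:
    """Tracks one user's submission priority over a sliding 7-day window.

    Timestamps are in seconds and must be passed in non-decreasing order.
    """

    def __init__(self):
        self._submit_times = deque()  # submission timestamps, oldest first

    def _expire(self, now):
        # Forget submissions older than the window so their penalty is removed.
        while self._submit_times and now - self._submit_times[0] >= WINDOW_SECONDS:
            self._submit_times.popleft()

    def priority(self, now):
        """Current priority: start value minus one penalty per recent job."""
        self._expire(now)
        return START_PRIORITY - PENALTY_PER_JOB * len(self._submit_times)

    def submit(self, now):
        """Record one submission and return the priority assigned to that job."""
        assigned = self.priority(now)
        self._submit_times.append(now)
        return assigned

    def throttled(self, now):
        """True while the user's new jobs should be held back."""
        return self.priority(now) < THROTTLE_BELOW
```

A user's first job gets priority 10000, each further job within the window gets a slightly lower one, and the priority recovers on its own once the old submissions age out of the 7-day window.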
Fred
On 8/17/23 2:42 AM, Jeff Templon wrote:
> Thanks! I didn't know about this stuff.
>
>> On 16 Aug 2023, at 17:28, Todd L Miller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>>
>>> Another issue to take into account is that a high start rate can put pressure on other systems, like shared file systems.
>>
>> We already have a few throttles for high overall start rates.
>
> Usually the problem is not so much high overall start rates, here it's usually one user who generates 90% of the high start rate. I really don't like making everyone suffer because of one clumsy user. OTOH the other users might let this user know how clumsy he/she is - peer communication tends to be effective.
>
> JT
>
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
--
Frederick Luehring Indiana U luehring@xxxxxx +1 812 855 1025 IU
http://cern.ch/Fred.Luehring Fred.Luehring@xxxxxxx +41 22 767 11 66 CERN