[HTCondor-users] [Issue][v8.6.11] - Setting the NiceUser parameter to "TRUE" breaks group quotas.
- Date: Fri, 05 Feb 2021 13:25:54 +0100
- From: Maël Lefeuvre <mael.lefeuvre@xxxxxxxxxxxxxx>
- Subject: [HTCondor-users] [Issue][v8.6.11] - Setting the NiceUser parameter to "TRUE" breaks group quotas.
Greetings,
We're currently struggling to make Group-Quotas and the "nice-user"
feature of HTCondor coexist within our pool (v8.6.11). I've heard this
is a bug that has just recently been fixed in v8.9.9 & v8.9.10, but I'm
nonetheless posting the issue to see if anyone can help us in designing
a workaround.
CONTEXT:
We're trying to create a "dynamic" queuing system on our computer
cluster, where the entire pool limits the runtime of submitted jobs by
default, while still allowing a limited number of "unrestricted jobs"
(unlimited lifetime) to exist.
The solution we designed makes use of accounting groups and group quotas:
- The accounting group "LongJobs" is accessible to all users.
- A dynamic group quota then sets a hard limit on these LongJobs of ~75%
of our pool (no surplus allowed).
- Jobs get held if they exceed a runtime of 1 hour UNLESS the user
is a member of the "LongJobs" group.
-----------------------
GROUP_NAMES = LongJobs
GROUP_QUOTA_DYNAMIC_LongJobs = 0.75
GROUP_ACCEPT_SURPLUS_LongJobs = false
RUNTIME_EXCEEDED = (TARGET.AcctGroup=!="LongJobs" && (JobStatus==2) &&
(time() - EnteredCurrentStatus) >(1*3600))
PREEMPT = [...]
WANT_SUSPEND = [...]
WANT_HOLD = [...]
-----------------------
This ensures at least 25% of our pool stays available to run short jobs,
while still giving users the ability to submit (very) long jobs.
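For readers less familiar with ClassAd syntax, the RUNTIME_EXCEEDED
condition above can be paraphrased in Python roughly as follows. This is
only a sketch: the function name and its arguments are illustrative, None
stands in for an undefined attribute, and it does not reflect how HTCondor
evaluates the expression internally.

```python
import time

RUNNING = 2              # JobStatus value for a running job
MAX_RUNTIME = 1 * 3600   # one hour, in seconds


def runtime_exceeded(acct_group, job_status, entered_current_status, now=None):
    """Rough Python paraphrase of the RUNTIME_EXCEEDED expression.

    acct_group may be None to model a job with no accounting group set.
    """
    if now is None:
        now = time.time()
    # Mirrors `=!=`: an undefined (None) group also counts as "not LongJobs".
    not_long_job = acct_group != "LongJobs"
    running_too_long = (now - entered_current_status) > MAX_RUNTIME
    return not_long_job and job_status == RUNNING and running_too_long
```

With this reading, a running job outside "LongJobs" trips the condition
once it has been in the running state for more than 3600 seconds, and a
"LongJobs" job never does.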
PROBLEM:
Setting...
accounting_group = "LongJobs"
nice_user = True
...within a submit description file will override the group quota: the
user becomes "nice-user.LongJobs.<user>@<domain>", which is not recognized
as a valid accounting group when looking at "condor_userprio".
Thus, "nice-user.LongJobs.<user>" jobs are able to completely fill our
pool, while retaining the privileged policies that are attached to the
"LongJobs" group...
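Purely as an illustration of the symptom, here is a toy Python model. It
assumes (this is a guess, not something checked against the negotiator
code) that the group is derived from the leading dotted component of the
submitter name. The function name and the example.org addresses are
hypothetical.

```python
GROUP_NAMES = {"LongJobs"}  # mirrors GROUP_NAMES in the config above


def resolve_group(submitter):
    """Hypothetical lookup: treat the text before the first '.' as the group."""
    name = submitter.split("@", 1)[0]   # drop the @<domain> part
    prefix = name.split(".", 1)[0]      # leading dotted component
    return prefix if prefix in GROUP_NAMES else None


# A regular LongJobs submitter maps to the group, so the quota applies.
regular = resolve_group("LongJobs.alice@example.org")
# With the nice-user prefix the leading component is "nice-user",
# which is not a configured group, so no quota is found.
nice = resolve_group("nice-user.LongJobs.alice@example.org")
```

Under that assumption, the prefixed name falls outside every configured
group, which matches what we see in "condor_userprio".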
An obvious and straightforward solution would be to disable the nice-user
setting entirely, but this feature is very popular with our users, so
keeping it intact remains a priority.
My questions are therefore:
[1] - Does anyone know of a way to make group quotas and nice-user
policies coexist within a v8.6.11 HTCondor pool?
[2] - If the answer to the first question is "No", are there viable
alternatives to implement our desired policy while keeping the nice-user
parameter intact?
Any help or suggestions would be greatly appreciated. Thanks in advance
to anyone willing to take a closer look at this issue (or who has kept
reading until this point).
Cheers,
Maël