I did some more digging yesterday, running these commands
as different users, on different machines, with different
options, and under strace, trying to understand what is going on.
There are a few aspects I think I understand - I hope
someone who knows for certain is reading and willing to
respond.
- I can only get condor_userprio to give me information about
users who have currently running jobs - options like -activefrom
appear to be ignored
- I get the same information from the command if I specify
-collector
- if I specify -negotiator, I get "Number of users: 0"
- running the same command as root instead of as myself:
  [root@stbc-i3 ~]# condor_userprio -negotiator -debug
  07/18/24 08:35:29 read_password_from_filename(): read_secure_file(/etc/condor/passwords.d/POOL) failed!
  07/18/24 08:35:29 SECMAN: required authentication with negotiator stbc-020.nikhef.nl failed, so aborting command GET_PRIORITY.
  07/18/24 08:35:29 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using FS|AUTHENTICATE:1004:Failed to authenticate using PASSWORD|AUTHENTICATE:1004:Failed to authenticate using MUNGE
  failed to send GET_PRIORITY command to negotiator
There are a couple of configuration things that I think are
contributing:
First, we have SEC_NEGOTIATOR_AUTHENTICATION_METHODS
= IDTOKENS, which I think means that I'd have to have
an idtoken to talk to the negotiator, and that would explain
the "Number of users: 0". That is my informed guess.
Secondly, we have two negotiator machines - a main
negotiator, and a negotiator for a single "express" node, set up
this way so that high usage (and hence low scheduling rank) on
the main pool does not mean a user would have to wait ages for a
slot on the express node. However, nowhere in our configuration
do we specify which machine is "the" negotiator, and it seems
that sometimes commands are talking to the one, and sometimes to
the other. As far as I can tell, some commands are inferring
the negotiator location by assuming it's the same as CONDOR_HOST.
For example, see the difference between a bare
"condor_userprio" and "condor_userprio -allusers" in the cited
message below. The key is the "time since last usage" column: I
know that neither kchemina nor mbeekvel had used the express
node, but I had, and the "time since last usage" values
reflect this.
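For anyone who wants to repeat the comparison, here is a minimal sketch (not something from our setup) of a shell helper that pulls the count out of the "Number of users:" summary line, so the answers from the different invocations can be put side by side. The function name userprio_count is made up for illustration, and the options in the commented loop are only the ones mentioned in this message.

```shell
# Read condor_userprio output on stdin and print the number that follows
# "Number of users:" on the summary line. Prints nothing if no such line
# is present (for example when the command failed to authenticate).
userprio_count() {
    awk '/Number of users:/ {
        sub(/.*Number of users:[ \t]*/, "")   # drop everything up to the count
        print $1                              # first remaining field is the count
        exit
    }'
}

# Example use (needs a working HTCondor client, so it is left commented out):
#   for opt in "" -collector -negotiator -allusers; do
#       printf '%-12s %s\n' "${opt:-(none)}" "$(condor_userprio $opt | userprio_count)"
#   done
```

If the counts differ between invocations run on the same machine, that would at least be consistent with the commands reaching different daemons, though it does not prove it.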
My question to all of you is: how do we configure all this
correctly, so that the commands return useful information?
Thanks,
JT
Hi,
We're still having problems here. I welcome
suggestions on how to debug this - things seem to be
rather broken. See below: condor_userprio gives
vastly different answers depending on the arguments.
$ condor_userprio
Last Priority Update:  6/18 12:17
                       Effective  Priority Wghted  Total Usage Time Since Submitter
User Name               Priority    Factor In Use (wghted-hrs) Last Usage   Ceiling
------------------- ------------ --------- ------ ------------ ---------- ---------
------------------- ------------ --------- ------ ------------ ---------- ---------
Number of users: 6                           1413    255758.62    0+23:59
$ condor_userprio -allusers | awk 'NR<=4 || /templon|mbeek|kche/'
Last Priority Update:  6/18 12:17
                          Effective  Priority Wghted  Total Usage Time Since Submitter
User Name                  Priority    Factor In Use (wghted-hrs) Last Usage   Ceiling
---------------------- ------------ --------- ------ ------------ ---------- ---------
JT
Hi Christoph,
Setting a userprio to -1 doesn't seem to
change the output or the behaviour on the
cluster. We're running version 23.6.1---not
sure if that makes a difference?
Previously we used caps for our local user
submitters because there were many "oopsies"
submissions which would destroy the torque
headnode. If it's not needed, we can always
remove it. We're not using any accounting
quotas at the moment, though, and would likely
only set those for two groups who have
purchased designated resources. So we went for
the userprio floor and ceiling method. Maybe
we should handle this a different way...
Mary
On 10/06/2024 16:44, Beyer, Christoph wrote:
Hi Mary,
for resetting the ceiling you need to set it
to '-1' e.g.
condor_userprio -setceil maryh@xxxxxxxxx -1
If you have nested accounting groups you need
to use <acctgrp>.mary@xxxxxxxxx
In general the ceiling is meant more as an
absolute boundary, e.g. for naughty users, and
not as a dynamic tool to control the usage
of the pool - at least that's how I
understand it.
The regular usage should rather be regulated by
accounting quotas, including surplus
usage and priorities ...
Best
christoph
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to
htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/