Hi,
It's been awfully still on this question - I'm curious as to the lack of responses.
I did some more digging yesterday, running these commands as different users, on different machines, with different options, and under strace, trying to understand what is going on.
There are a few aspects I think I understand - I hope there is someone reading and willing to respond who knows for certain.
- I can only get condor_userprio to give me information about users who have currently running jobs
- options like -activefrom appear to be ignored
- I get the same information from the command if I specify -collector
- if I specify -negotiator, I get "Number of users: 0"
- running the same command as root instead of as myself gives:
[root@stbc-i3 ~]# condor_userprio -negotiator -debug
07/18/24 08:35:29 read_password_from_filename(): read_secure_file(/etc/condor/passwords.d/POOL) failed!
07/18/24 08:35:29 SECMAN: required authentication with negotiator stbc-020.nikhef.nl failed, so aborting command GET_PRIORITY.
07/18/24 08:35:29 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using FS|AUTHENTICATE:1004:Failed to authenticate using PASSWORD|AUTHENTICATE:1004:Failed to authenticate using MUNGE
failed to send GET_PRIORITY command to negotiator
There are a couple of configuration things that I think are contributing:
First, we have SEC_NEGOTIATOR_AUTHENTICATION_METHODS = IDTOKENS, which I think means that I'd need an IDTOKEN to talk to the negotiator; that would explain the "Number of users: 0". This is my informed guess.
Secondly, we have two negotiator machines - a main negotiator, and a negotiator for a single "express" node, set up this way so that high usage (and hence low scheduling rank) on the main pool does not mean a user would have to wait ages for a slot on the express node. However, nowhere in our configuration do we specify which machine is "The" negotiator, and it seems that sometimes commands are talking to the one, and sometimes to the other. As far as I can tell, some commands are inferring the negotiator location by assuming it's the same as CONDOR_HOST.
For example, see the difference between a bare "condor_userprio" and "condor_userprio -allusers" in the cited message below. The key is the "time since last usage" column - I know that neither kchemina nor mbeekvel had used the express node, but I had, and this is reflected in the "time since last usage".
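One possible reading of the 19892+10:1 values in the -allusers output of the cited message below: if that field is days+hours:minutes, then 19892 days before the report date is the Unix epoch, which would suggest "no usage ever recorded" rather than a real timestamp. A minimal Python sketch of that arithmetic follows. It assumes the report year is 2024 (the output itself only prints 6/18) and the days+HH:MM format; the function names are hypothetical helpers, not part of HTCondor.

```python
from datetime import datetime, timedelta


def days_since_last_usage(field: str) -> int:
    """Return the whole-day part of a 'Time Since Last Usage' field."""
    if field == "<now>":
        return 0
    # Assumed format: DAYS+HH:MM; only the day count is used here,
    # so a truncated time part (as in the quoted output) does not matter.
    return int(field.split("+", 1)[0])


def last_usage_date(report_time: datetime, field: str):
    """Date on which the last usage would have occurred."""
    return (report_time - timedelta(days=days_since_last_usage(field))).date()


# "Last Priority Update: 6/18 12:17"; the year 2024 is an assumption.
report_time = datetime(2024, 6, 18, 12, 17)

for user, field in [
    ("kchemina", "19892+10:1"),
    ("mbeekvel", "19892+10:1"),
    ("templon", "9+15:30"),
]:
    print(user, last_usage_date(report_time, field).isoformat())
```

Under those assumptions the first two users map to 1970-01-01 and templon to 2024-06-09, which fits the picture that only templon has any recorded usage in the accounting being queried.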
My question to all of you is: how do we configure all this correctly, so that the commands return useful information?
Thanks,
JT
On 18 Jun 2024, at 12:19, Jeff Templon <templon@xxxxxxxxx> wrote:
Hi,
We're still having problems here. I welcome suggestions on how to debug this - things seem to be rather broken. See below: condor_userprio gives vastly different answers depending on the arguments.
$ condor_userprio
Last Priority Update: 6/18 12:17
                       Effective  Priority Wghted  Total Usage Time Since Submitter
User Name               Priority    Factor In Use (wghted-hrs) Last Usage   Ceiling
------------------- ------------ --------- ------ ------------ ---------- ---------
tsaracco@xxxxxxxxx       7666.88   1000.00     28       467.24      <now>    500.00
abudhraj@xxxxxxxxx      29364.14   1000.00     28      1327.53      <now>    500.00
zwolffs@xxxxxxxxx      117123.00   1000.00    320      5044.26      <now>    500.00
mbeekvel@xxxxxxxxx     196949.33   1000.00     59     25151.39      <now>    500.00
templon@xxxxxxxxx      309795.04   1000.00    478    191650.92      <now>    500.00
kchemina@xxxxxxxxx     421669.90   1000.00    500     32117.30      <now>    500.00
------------------- ------------ --------- ------ ------------ ---------- ---------
Number of users: 6                           1413    255758.62    0+23:59
$ condor_userprio -allusers | awk 'NR<=4 || /templon|mbeek|kche/'
Last Priority Update: 6/18 12:17
                          Effective  Priority Wghted  Total Usage Time Since Submitter
User Name                  Priority    Factor In Use (wghted-hrs) Last Usage   Ceiling
---------------------- ------------ --------- ------ ------------ ---------- ---------
kchemina@xxxxxxxxx           500.00   1000.00      0         0.00 19892+10:1     10.00
mbeekvel@xxxxxxxxx           500.00   1000.00      0         0.00 19892+10:1     10.00
templon@xxxxxxxxx            500.00   1000.00      0         0.00    9+15:30     10.00
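To make the disagreement between the two outputs above explicit, here is a small Python sketch that parses the rows for the three users appearing in both and prints every column that differs. The parsing rule (seven whitespace-separated fields, with an "@" in the first) is an assumption based only on the output quoted here, and parse_rows is a hypothetical helper.

```python
def parse_rows(text):
    """Map user name -> selected columns from condor_userprio-style rows."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumed row shape: 7 fields, the first one being user@domain.
        if len(parts) == 7 and "@" in parts[0]:
            rows[parts[0]] = {
                "effective_priority": float(parts[1]),
                "in_use": int(parts[3]),
                "total_usage": float(parts[4]),
                "ceiling": float(parts[6]),
            }
    return rows


# Rows copied from the two outputs quoted above.
default_view = """
mbeekvel@xxxxxxxxx 196949.33 1000.00 59 25151.39 <now> 500.00
templon@xxxxxxxxx 309795.04 1000.00 478 191650.92 <now> 500.00
kchemina@xxxxxxxxx 421669.90 1000.00 500 32117.30 <now> 500.00
"""
allusers_view = """
kchemina@xxxxxxxxx 500.00 1000.00 0 0.00 19892+10:1 10.00
mbeekvel@xxxxxxxxx 500.00 1000.00 0 0.00 19892+10:1 10.00
templon@xxxxxxxxx 500.00 1000.00 0 0.00 9+15:30 10.00
"""

default_rows = parse_rows(default_view)
allusers_rows = parse_rows(allusers_view)

for user in sorted(default_rows.keys() & allusers_rows.keys()):
    for column, value in default_rows[user].items():
        other = allusers_rows[user][column]
        if value != other:
            print(f"{user} {column}: {value} (default) vs {other} (-allusers)")
```

Every parsed column differs for all three users, including the ceiling (500.00 versus 10.00), which would be consistent with the two invocations reading two different sets of accounting records rather than two views of the same one.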
JT
On 10 Jun 2024, at 18:44, Mary Hester <maryh@xxxxxxxxx> wrote:
Hi Christoph,
Setting a userprio to -1 doesn't seem to change the output or the behaviour on the cluster. We're running version 23.6.1 - not sure if that makes a difference?
Previously we used caps for our local user submitters because there were many "oopsies" submissions which would destroy the torque headnode. If it's not needed, we can always remove it, but we're not using any accounting quotas at the moment and would likely only set this for two groups who have designated resources they have purchased. So we went for the userprio floor and ceiling method. Maybe we should handle this a different way...
Mary
On 10/06/2024 16:44, Beyer, Christoph wrote:
Hi Mary,
for resetting the ceiling you need to set it to '-1' e.g.
condor_userprio -setceil maryh@xxxxxxxxx -1
If you have nested accounting groups you need to use <acctgrp>.mary@xxxxxxxxx
In general the ceiling is meant more as an absolute boundary, e.g. for naughty users, and not as a dynamic tool to control the usage of the pool - at least that's how I understand it.
Regular usage should rather be regulated by accounting quotas, including surplus usage and priorities ...
Best christoph
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at: https://lists.cs.wisc.edu/archive/htcondor-users/