
Re: [HTCondor-users] condor_userprio -allusers and condor_userprio differ in output and unable to change -setceil



Hi,

It's been awfully quiet on this question - I'm curious as to the lack of responses.

I did some more digging yesterday, running these commands as different users, on different machines, with different options, and under strace, trying to understand what is going on.

There are a few aspects I think I understand - I hope someone who knows for certain is reading and willing to respond.

- I can only get condor_userprio to give me information about users who have currently running jobs - options like -activefrom appear to be ignored
- I get the same information from the command if I specify -collector
- if I specify -negotiator, I get "Number of users: 0"
- running the same command as root instead of as myself fails with an authentication error:

[root@stbc-i3 ~]# condor_userprio -negotiator -debug
07/18/24 08:35:29 read_password_from_filename(): read_secure_file(/etc/condor/passwords.d/POOL) failed!
07/18/24 08:35:29 SECMAN: required authentication with negotiator stbc-020.nikhef.nl failed, so aborting command GET_PRIORITY.
07/18/24 08:35:29 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using FS|AUTHENTICATE:1004:Failed to authenticate using PASSWORD|AUTHENTICATE:1004:Failed to authenticate using MUNGE
failed to send GET_PRIORITY command to negotiator

There are a couple of configuration things that I think are contributing:

First, we have SEC_NEGOTIATOR_AUTHENTICATION_METHODS = IDTOKENS, which I think means that I'd need an IDTOKEN to talk to the negotiator; that would explain the "Number of users: 0".  This is my informed guess.

Secondly, we have two negotiator machines: a main negotiator, and a negotiator for a single "express" node. It is set up this way so that high usage (and hence low scheduling rank) on the main pool does not mean a user has to wait ages for a slot on the express node.  However, nowhere in our configuration do we specify which machine is "the" negotiator, and it seems that commands sometimes talk to one and sometimes to the other.  As far as I can tell, some commands infer the negotiator location by assuming it's the same as CONDOR_HOST.

For example, see the difference between a bare "condor_userprio" and "condor_userprio -allusers" in the cited message below.  The key is the "Time Since Last Usage" column: I know that neither kchemina nor mbeekvel had used the express node, but I had, and this is reflected in that column.
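
A quick arithmetic check supports that reading of the "19892+10:1" entries in the -allusers output quoted below. Assuming the column format is days+hours:minutes, 19892 days before 18 June 2024 lands on 1 January 1970, the Unix epoch. That would presumably mean a last-usage timestamp of 0, i.e. no usage ever recorded by whichever negotiator answered. The interpretation is an assumption; only the date arithmetic is certain. A minimal Python sketch:

```python
from datetime import date, timedelta

# Date of the "Last Priority Update" line in the quoted output (6/18, year 2024).
report_day = date(2024, 6, 18)

# Day counts read from the "Time Since Last Usage" column of the -allusers output.
# Assumption: the column format is days+hours:minutes.
never_used = report_day - timedelta(days=19892)  # kchemina, mbeekvel
recent_use = report_day - timedelta(days=9)      # templon

print(never_used)  # 1970-01-01
print(recent_use)  # 2024-06-09
```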

My question to all of you is: how do we configure all this correctly, so that the commands return useful information?

Thanks,

JT


On 18 Jun 2024, at 12:19, Jeff Templon <templon@xxxxxxxxx> wrote:

Hi,

We're still having problems here.  I welcome suggestions on how to debug this - things seem to be rather broken.  See below: condor_userprio gives vastly different answers depending on the arguments.

$ condor_userprio  
Last Priority Update:  6/18 12:17
                     Effective   Priority  Wghted Total Usage  Time Since Submitter
User Name             Priority    Factor   In Use (wghted-hrs) Last Usage  Ceiling
------------------- ------------ --------- ------ ------------ ---------- ---------
tsaracco@xxxxxxxxx       7666.88   1000.00     28       467.24      <now>    500.00
abudhraj@xxxxxxxxx      29364.14   1000.00     28      1327.53      <now>    500.00
zwolffs@xxxxxxxxx      117123.00   1000.00    320      5044.26      <now>    500.00
mbeekvel@xxxxxxxxx     196949.33   1000.00     59     25151.39      <now>    500.00
templon@xxxxxxxxx      309795.04   1000.00    478    191650.92      <now>    500.00
kchemina@xxxxxxxxx     421669.90   1000.00    500     32117.30      <now>    500.00
------------------- ------------ --------- ------ ------------ ---------- ---------
Number of users: 6                           1413    255758.62    0+23:59

$ condor_userprio -allusers | awk 'NR<=4 || /templon|mbeek|kche/'
Last Priority Update:  6/18 12:17
                        Effective   Priority  Wghted Total Usage  Time Since Submitter
User Name                Priority    Factor   In Use (wghted-hrs) Last Usage  Ceiling
---------------------- ------------ --------- ------ ------------ ---------- ---------
kchemina@xxxxxxxxx           500.00   1000.00      0         0.00 19892+10:1     10.00
mbeekvel@xxxxxxxxx           500.00   1000.00      0         0.00 19892+10:1     10.00
templon@xxxxxxxxx            500.00   1000.00      0         0.00    9+15:30     10.00
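
To make the discrepancy explicit, here is a small Python sketch that parses rows copied from the two outputs above and lists, per user, which columns differ. The column names and the helper functions parse_rows and differing_columns are made up for illustration; they are not part of HTCondor. It assumes whitespace-separated rows with exactly seven fields.

```python
# Rows copied from the two outputs above (bare command vs. -allusers),
# restricted to the three users that appear in both.
BARE = """\
mbeekvel@xxxxxxxxx     196949.33   1000.00     59     25151.39      <now>    500.00
templon@xxxxxxxxx      309795.04   1000.00    478    191650.92      <now>    500.00
kchemina@xxxxxxxxx     421669.90   1000.00    500     32117.30      <now>    500.00
"""
ALLUSERS = """\
kchemina@xxxxxxxxx           500.00   1000.00      0         0.00 19892+10:1     10.00
mbeekvel@xxxxxxxxx           500.00   1000.00      0         0.00 19892+10:1     10.00
templon@xxxxxxxxx            500.00   1000.00      0         0.00    9+15:30     10.00
"""

# Hypothetical short names for the six value columns, in table order.
COLUMNS = ("priority", "factor", "in_use", "usage_hrs", "last_usage", "ceiling")


def parse_rows(text):
    """Map user name -> {column: raw string value} for each data row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1 or "@" not in fields[0]:
            continue  # skip headers, separators and summary lines
        rows[fields[0]] = dict(zip(COLUMNS, fields[1:]))
    return rows


def differing_columns(a, b):
    """For users present in both tables, list the columns whose values differ."""
    return {
        user: [c for c in COLUMNS if a[user][c] != b[user][c]]
        for user in sorted(a.keys() & b.keys())
    }


diff = differing_columns(parse_rows(BARE), parse_rows(ALLUSERS))
for user, cols in diff.items():
    print(user, cols)
```

For all three users every column except the priority factor differs, including the submitter ceiling (500.00 vs 10.00). That would be consistent with the two invocations reporting data from different sources, though this is only a hypothesis.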


JT

On 10 Jun 2024, at 18:44, Mary Hester <maryh@xxxxxxxxx> wrote:

Hi Christoph,

Setting a userprio to -1 doesn't seem to change the output or the behaviour on the cluster. We're running version 23.6.1---not sure if that makes a difference?

Previously we used caps for our local user submitters because there were many "oopsies" submissions which would destroy the torque headnode. If it's not needed, we can always remove it, but we're not using any accounting quotas at the moment and would likely only set them for two groups who have designated resources they have purchased. So we went for the userprio floor and ceiling method. Maybe we should handle this a different way...

Mary

On 10/06/2024 16:44, Beyer, Christoph wrote:
Hi Mary,

for resetting the ceiling you need to set it to '-1' e.g.

condor_userprio -setceil maryh@xxxxxxxxx -1

If you have nested accounting groups you need to use <acctgrp>.mary@xxxxxxxxx

In general the ceiling is meant more as an absolute boundary, e.g. for a naughty user, and not as a dynamic tool to control the usage of the pool - at least that's how I understand it.

Regular usage should rather be regulated by accounting quotas, including surplus usage and priorities ...

Best
christoph
