
[HTCondor-users] Adding Windows machines to pool - IDTOKENS issue



Hi all,

I've been happily running Condor 8.x for some years, to run jobs on a pool of approximately 1k Windows 10 machines, and it's been great.  Many thanks to all involved in this project!

I've needed to replace my RHEL7 master with RHEL9 lately though, and while my test Linux execute machine works fine, I'm having a little trouble getting Condor 23.x to work on my Windows execute machines.

I've followed the install instructions for the various roles at https://htcondor.readthedocs.io/en/latest/getting-htcondor/admin-quick-start.html#admin-quick-start-guide , with the only variation being that I'm using the Central Manager machine as the Submit machine as well, by appending 'use role:get_htcondor_submit' to the 01-central-manager.config file.  My (RHEL9) test "Execute" machine works as expected, and I can see it with condor_status etc.
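
In config terms, that change amounts to one extra line at the end of the file. A rough sketch, assuming the file sits in the default /etc/condor/config.d directory (the path is an assumption) and that the lines the installer wrote above it are left untouched:

```
# /etc/condor/config.d/01-central-manager.config  (directory assumed)
# ... lines generated by the quick-start installer, unchanged ...

# Appended so the Central Manager also acts as the Submit machine
use role:get_htcondor_submit
```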

For the Windows execute machine(s), I've done a standard install of the .msi file following the instructions at https://htcondor.readthedocs.io/en/latest/getting-htcondor/install-windows-as-administrator.html#setting-up-a-whole-pool-with-windows , and the install goes fine.  I reboot the machine anyway (because Windows), and the Condor service starts and runs, but the machine is unable to send updates or info to the Central Manager, and its Master log shows authentication failures with IDTOKENS (excerpt pasted below).  This log stanza repeats every 5 minutes or so.

I really have tried to resolve this with information I've found on the web and via this group, but I've found the info fairly scattered and at times contradictory, and I have to admit this one has defeated me :(.  I feel like I'm close, and I'd really like to get this new 23.x pool up and running for our users, but I'm struggling at the moment.

Can anyone help with this issue please?  If you need any more info or details, please let me know!

Many thanks, Craig


Craig Parker
Digital Solutions Client Technology Manager
Ph: +64 4 463 6052
Mob: 027 564 6052
Rankine Brown level 8
Victoria University of Wellington,
PO Box 600, Wellington 6140, New Zealand


06/19/24 10:49:14 Using config source: C:\condor\condor_config
06/19/24 10:49:14 Using local config sources: 
06/19/24 10:49:14    C:\condor\condor_config.local
06/19/24 10:49:14 config Macros = 50, Sorted = 50, StringBytes = 1077, TablesBytes = 1848
06/19/24 10:49:14 CLASSAD_CACHING is OFF
06/19/24 10:49:14 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
06/19/24 10:49:14 SharedPortEndpoint: failed to open C:\condor\log/shared_port_ad: No such file or directory
06/19/24 10:49:14 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
06/19/24 10:49:14 DaemonCore: private command socket at <10.xx.4.32:0?alias=CO-142-105-C.xxx.vuw.ac.nz&sock=master_644_e388>
06/19/24 10:49:14 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
06/19/24 10:49:14 Master restart (GRACEFUL) is watching C:\condor\bin\condor_master.exe (mtime:1715819966)
06/19/24 10:49:14 Adding/Checking Windows firewall exceptions for all daemons
06/19/24 10:49:14 Starting shared port with port: 9618
06/19/24 10:49:15 Started DaemonCore process "C:\condor\bin\condor_shared_port.exe", pid and pgroup = 2772
06/19/24 10:49:15 Waiting for C:\condor\log/shared_port_ad to appear.
06/19/24 10:49:15 Found C:\condor\log/shared_port_ad.
06/19/24 10:49:16 Started DaemonCore process "C:\condor\bin\condor_startd.exe", pid and pgroup = 3696
06/19/24 10:49:16 Daemons::StartAllDaemons all daemons were started
06/19/24 10:49:20 SECMAN: required authentication with collector vuwunicocondor4.ods.vuw.ac.nz failed, so aborting command UPDATE_MASTER_AD.
06/19/24 10:49:20 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS
06/19/24 10:49:20 Collector update failed; will try to get a token request for trust domain vuwxxxx.xxx.vuw.ac.nz, identity (default).
06/19/24 10:49:20 Failed to start non-blocking update to <10.xx.18.13:9618>.
06/19/24 10:49:20 SECMAN: required authentication with collector vuwxxxx.xxx.vuw.ac.nz failed, so aborting command DC_START_TOKEN_REQUEST.
06/19/24 10:49:20 Failed to request a new token: DAEMON:1:failed to start command for token request with remote daemon at '<10.40.18.13:9618?alias=vuwxxxx.xxx.vuw.ac.nz>'.|AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS