
Re: [HTCondor-users] Adding Windows machines to pool - IDTOKENS issue



Ah - that works perfectly - thanks so much for the clear instructions, Jaime.  I'll keep this info on file for future iterations as well.

Cheers, Craig


On 25/06/2024, at 7:59 AM, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:

Try these steps to get an IDToken for the Windows machines:

* On the central manager, generate an IDToken like so:

condor_token_create -identity condor@$(condor_config_val CONDOR_HOST)

The output of the command will be the IDToken.

* On each Windows machine, create the directory <RELEASE_DIR>\tokens.d
In that directory, create a file that contains the IDToken from the previous step. The name of the file doesn't matter.

* Restart the HTCondor daemons on the Windows machines.

You can use the same IDToken for all of the machines.
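
The Windows-side step above could be scripted along these lines. This is only a sketch, not something taken from the HTCondor documentation: the function name install_idtoken, the default file name pool_token, and the C:\condor path in the usage comment are assumptions for illustration. The thread only says that the directory is <RELEASE_DIR>\tokens.d and that the file name doesn't matter.

```python
from pathlib import Path


def install_idtoken(release_dir, token, filename="pool_token"):
    """Write an IDToken into <release_dir>/tokens.d and return the file path.

    'filename' is a hypothetical default; per the instructions above,
    the name of the file does not matter.
    """
    tokens_dir = Path(release_dir) / "tokens.d"
    # Create tokens.d if it does not exist yet.
    tokens_dir.mkdir(parents=True, exist_ok=True)
    token_file = tokens_dir / filename
    # Strip stray whitespace from copy/paste and write one token per line.
    token_file.write_bytes(token.strip().encode("ascii") + b"\n")
    return token_file


# Example usage (path and token are placeholders, not real values):
# install_idtoken(r"C:\condor", "<output of condor_token_create>")
```

The HTCondor daemons on the machine still need to be restarted afterwards, as described above.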

 - Jaime

On Jun 19, 2024, at 6:50 PM, Craig Parker via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:

Hi all,

I've been happily running Condor 8.x for some years, to run jobs on a pool of approximately 1k Windows 10 machines and it's been great.  Many thanks to all involved in this project!

I've needed to replace my RHEL7 master with RHEL9 lately though, and while my test Linux execute machine works fine, I'm having a little trouble getting Condor 23.x to work on my Windows execute machines.

I've followed the install instructions for the various roles at https://htcondor.readthedocs.io/en/latest/getting-htcondor/admin-quick-start.html#admin-quick-start-guide , with the only variation being that I'm using the Central Manager machine as the Submit machine as well, by appending 'use role:get_htcondor_submit' to the 01-central-manager.config file.  My (RHEL9) test "Execute" machine works as expected, and I can see it with condor_status etc.

For the Windows execute machine(s), I've done a standard install of the .msi file following the instructions at https://htcondor.readthedocs.io/en/latest/getting-htcondor/install-windows-as-administrator.html#setting-up-a-whole-pool-with-windows , and the install goes fine.  I reboot the machine anyway (because Windows), and the Condor service starts and runs, but I'm unable to send updates or info to the Central Manager, and my Master log states that I have trouble with IDTOKENS (excerpt pasted below).  This log stanza repeats every 5 minutes or so.

I really have tried to resolve this with information I've found on the web and via this group, but I've found the info fairly scattered and at times contradictory, and I have to admit this one has defeated me :(.  I feel like I'm close, and I'd really like to get this new 23.x pool up and running for our users, but I'm struggling at the moment.

Can anyone help with this issue please?  If you need any more info or details, please let me know!

Many thanks, Craig


Craig Parker
Digital Solutions Client Technology Manager
Ph: +64 4 463 6052
Mob: 027 564 6052
Rankine Brown level 8
Victoria University of Wellington,
PO Box 600, Wellington 6140, New Zealand

--

06/19/24 10:49:14 Using config source: C:\condor\condor_config
06/19/24 10:49:14 Using local config sources: 
06/19/24 10:49:14    C:\condor\condor_config.local
06/19/24 10:49:14 config Macros = 50, Sorted = 50, StringBytes = 1077, TablesBytes = 1848
06/19/24 10:49:14 CLASSAD_CACHING is OFF
06/19/24 10:49:14 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
06/19/24 10:49:14 SharedPortEndpoint: failed to open C:\condor\log/shared_port_ad: No such file or directory
06/19/24 10:49:14 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
06/19/24 10:49:14 DaemonCore: private command socket at <10.xx.4.32:0?alias=CO-142-105-C.xxx.vuw.ac.nz&sock=master_644_e388>
06/19/24 10:49:14 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
06/19/24 10:49:14 Master restart (GRACEFUL) is watching C:\condor\bin\condor_master.exe (mtime:1715819966)
06/19/24 10:49:14 Adding/Checking Windows firewall exceptions for all daemons
06/19/24 10:49:14 Starting shared port with port: 9618
06/19/24 10:49:15 Started DaemonCore process "C:\condor\bin\condor_shared_port.exe", pid and pgroup = 2772
06/19/24 10:49:15 Waiting for C:\condor\log/shared_port_ad to appear.
06/19/24 10:49:15 Found C:\condor\log/shared_port_ad.
06/19/24 10:49:16 Started DaemonCore process "C:\condor\bin\condor_startd.exe", pid and pgroup = 3696
06/19/24 10:49:16 Daemons::StartAllDaemons all daemons were started
06/19/24 10:49:20 SECMAN: required authentication with collector vuwunicocondor4.ods.vuw.ac.nz failed, so aborting command UPDATE_MASTER_AD.
06/19/24 10:49:20 ERROR: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS
06/19/24 10:49:20 Collector update failed; will try to get a token request for trust domain vuwxxxx.xxx.vuw.ac.nz, identity (default).
06/19/24 10:49:20 Failed to start non-blocking update to <10.xx.18.13:9618>.
06/19/24 10:49:20 SECMAN: required authentication with collector vuwxxxx.xxx.vuw.ac.nz failed, so aborting command DC_START_TOKEN_REQUEST.
06/19/24 10:49:20 Failed to request a new token: DAEMON:1:failed to start command for token request with remote daemon at '<10.40.18.13:9618?alias=vuwxxxx.xxx.vuw.ac.nz>'.|AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS
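
For anyone checking many machines for the same symptom, a small script could pick the relevant lines out of a Master log. This is a rough sketch only: the function name find_idtoken_failures is made up for illustration, and it simply matches the error string seen in the excerpt above rather than using any HTCondor API.

```python
def find_idtoken_failures(log_text):
    """Return the log lines that report a failed IDTOKENS authentication.

    Matches the literal message from the excerpt above; other HTCondor
    versions may word the error differently (an assumption to verify).
    """
    marker = "Failed to authenticate using IDTOKENS"
    return [line for line in log_text.splitlines() if marker in line]


# Example usage with two lines in the style of the excerpt above:
sample = (
    "06/19/24 10:49:16 Daemons::StartAllDaemons all daemons were started\n"
    "06/19/24 10:49:20 ERROR: AUTHENTICATE:1003:Failed to authenticate "
    "with any method|AUTHENTICATE:1004:Failed to authenticate using IDTOKENS\n"
)
failures = find_idtoken_failures(sample)
```

An empty result after installing the token and restarting the daemons would suggest that the authentication problem is gone, although confirming with condor_status on the central manager is the more direct check.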

