
Re: [HTCondor-users] Condor upgrade from 9.0x to 10.0x



Alternatively, you can set TRUST_DOMAIN=$(COLLECTOR_HOST) in the config file(s) on all machines that have IDToken signing keys. Only do this if COLLECTOR_HOST is a simple hostname or IP address. TRUST_DOMAIN controls the issuer of the tokens. If your COLLECTOR_HOST is a list of hostnames (i.e. you're using HAD), then you should keep the new default setting for TRUST_DOMAIN (or pick a new hostname-like value) and reissue tokens.
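
As a rough sketch of the first option, the setting could go in a local config file on each machine that holds a signing key. The file name below is only a placeholder, and this assumes COLLECTOR_HOST is a single hostname or IP address:

```
# Hypothetical file: /etc/condor/config.d/99-trust-domain.conf
# Assumption: COLLECTOR_HOST is a single hostname or IP address, not a list.
# Sets the token issuer to the collector's name.
TRUST_DOMAIN = $(COLLECTOR_HOST)
```

Running condor_config_val TRUST_DOMAIN on the machine afterwards should show the expanded value, which is a quick way to check that it is not a list.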

 - Jaime

On Oct 12, 2023, at 6:23 AM, Andreas Haupt <andreas.haupt@xxxxxxx> wrote:

Hi Thomas,

we did this update recently. One thing to note from our side: all idtokens
issued with HTC-9.0 were no longer valid with HTC-10.0. We had to recreate
them after the update.

Just in case you use idtokens for any authentication, you'll be warned ;-)

Cheers,
Andreas

On Wed, 2023-10-04 at 08:16 +0000, Thomas Birkett - STFC UKRI wrote:
Hi all,
 
We're currently looking at upgrading our Condor pool from Condor 9.0.15
to Condor 10.0.9. I plan on upgrading the schedds first. In testing, this
works as expected: the daemons get restarted, jobs in the queue are
picked up again, and the schedd carries on where it left off. I then plan
on upgrading the startds, and this also goes smoothly. We have the config
set up so that a graceful restart of the daemons happens: the jobs drain
out, Condor is restarted, and jobs start to run on the startd once again.
However, when we upgrade the Central Managers, the startds lose
communication with the Central Managers, and it is only re-established
after a restart of the Condor daemons on the startd host, which would
kill any running jobs on the node.
 
Looking at the changes between the two versions, I believe this may be
to do with the following:
 
- https://opensciencegrid.atlassian.net/browse/HTCONDOR-283
- https://opensciencegrid.atlassian.net/browse/HTCONDOR-287
- https://opensciencegrid.atlassian.net/browse/HTCONDOR-1057
 
I'm guessing that, because of this change in Condor 10, the startds need to
re-negotiate the security between the daemons, requiring this restart.
My question to the community is whether there is a way to upgrade the Condor
pool without requiring the startd restart once the Central Managers are
upgraded. Interestingly, this does not affect the schedds, which continue
to communicate with the Central Managers.
 
Many thanks,
 
Thomas Birkett
Senior Systems Administrator
Scientific Computing Department  
Science and Technology Facilities Council (STFC)
Rutherford Appleton Laboratory, Chilton, Didcot 
OX11 0QX
 
 
 
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to
htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

-- 
| Andreas Haupt            | E-Mail: andreas.haupt@xxxxxxx
|  DESY Zeuthen            | WWW:    http://www.zeuthen.desy.de/~ahaupt
|  Platanenallee 6         | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen         | Fax:    +49/33762/7-7216

