
Re: [HTCondor-users] Upgrading the cluster piecemeal instead of big bang



Hello Dennis,

Yes, you can upgrade a cluster bit by bit. In fact, there were two recent talks about doing just that.

Here is my talk at HTC 25: https://agenda.hep.wisc.edu/event/2297/contributions/34373/attachments/10386/13355/Theisen-Slides-HTC25.pdf

Greg Thain further refined my talk: https://indico.cern.ch/event/1459943/contributions/6661387/attachments/3134878/5565280/rolling.pdf

You can find these at https://htcondor.org/past_condor_weeks.html. A recording of my talk is in Day 4 of HTC 25.

The only thing missing is that we generally recommend upgrading the Central Manager first.

Be sure to read the upgrade documentation: https://htcondor.readthedocs.io/en/latest/version-history/upgrading-from-24-0-to-25-0-versions.html

Also, run the condor_upgrade_check script to see whether anything needs to be changed before you upgrade.

Feel free to ask any questions. We do rolling upgrades at the CHTC all the time, and we also check interoperability between versions, so version skew is not a problem.
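During a rolling upgrade it can help to see how far along the pool is. Here is a minimal sketch, not an official tool, that tallies ads by daemon type and version. It assumes each ad carries the MyType and CondorVersion attributes, and that CondorVersion looks like "$CondorVersion: X.Y.Z <date> BuildID: <id> $". The function name tally_versions and the sample ads (including their dates and build IDs) are made up for illustration.

```python
import re
from collections import Counter

# Pulls the X.Y.Z version number out of a CondorVersion string such as
# "$CondorVersion: 24.0.5 2025-02-20 BuildID: 000000 $" (format assumed).
_VERSION_RE = re.compile(r"\$CondorVersion:\s+(\d+\.\d+\.\d+)")


def tally_versions(ads):
    """Count ads per (MyType, version) from an iterable of dict-like ads."""
    counts = Counter()
    for ad in ads:
        match = _VERSION_RE.search(str(ad.get("CondorVersion", "")))
        version = match.group(1) if match else "unknown"
        counts[(ad.get("MyType", "unknown"), version)] += 1
    return counts


# Hypothetical ads standing in for a pool that is partway through an upgrade.
sample_ads = [
    {"MyType": "Collector", "CondorVersion": "$CondorVersion: 25.0.1 2025-09-01 BuildID: 000000 $"},
    {"MyType": "Scheduler", "CondorVersion": "$CondorVersion: 24.0.5 2025-02-20 BuildID: 000000 $"},
    {"MyType": "Machine", "CondorVersion": "$CondorVersion: 24.0.5 2025-02-20 BuildID: 000000 $"},
    {"MyType": "Machine", "CondorVersion": "$CondorVersion: 24.0.5 2025-02-20 BuildID: 000000 $"},
    {"MyType": "Machine"},  # an ad without a version is counted as "unknown"
]

for (daemon_type, version), n in sorted(tally_versions(sample_ads).items()):
    print(f"{daemon_type:12} {version:10} {n}")
```

Against a live pool, the ads could presumably come from the HTCondor Python bindings' collector query instead of the sample list. Keep in mind that Machine ads are typically per slot, so those counts reflect slots rather than hosts.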

...Tim

On 3/3/26 05:38, Dennis van Dok wrote:
Hi,

we are still running 24.x on our Grid cluster, planning to upgrade to 25.0.

Is it feasible to upgrade the cluster bit by bit instead of doing a Big Bang upgrade?

My idea is to upgrade the worker nodes first. If there is no change in communication protocol, these should still be able to accept work from the schedds and communicate classads with the central manager.

The four schedds could be done one by one, so we do not have to drain the entire system.

Only the central manager is a single machine, but it can go down for a few minutes without affecting running jobs, so impact should be minimal (no new jobs will be negotiated during that upgrade).

Is this a bad idea? Is there a better order for doing the upgrade? How do people generally approach these upgrades?

I did look for some guidance in the documentation but found nothing about running in a mode where versions got mixed.



--
Tim Theisen (he, him, his)
Release Manager
Center for High Throughput Computing
University of Wisconsin - Madison
3695 Morgridge Hall
1205 University Ave
Madison, WI 53706