On 8/2/06, David McNabb <mcnabb@xxxxxxx> wrote:
Dear Condor users,

Has anyone had any experience with mixed-version Condor pools, i.e. pools with one group of machines at some older Condor release (e.g. 6.7.18) and other machines at a newer release? In particular, should I have *any* expectation of being able to upgrade a portion of my pool (including my controller) to release 6.8, yet have those machines which are not yet upgraded continue to service jobs?

Comments like "You must be crazy" are fine, but please elaborate on possible problems as much as possible. As many of you know, coordinating the timing of upgrades across multiple labs/depts/units is rarely possible.

If mixing versions *is* possible, what might "mix" with a controller (and subset of nodes) at Condor v6.8? How about 6.7.18? 6.6.11?

Initial testing with v6.8 execute nodes on a v6.7.18 controller fails, so I'm considering upgrading the controller. Of course, there may be some other problem going on too.

Thanks in advance for your insight.
I'm in the same boat, except going from 6.6 to 6.8 :)

To start with, I'm going to try a separate pool (collector/negotiator on 6.8 plus a few execute machines on the same version), then repoint my own schedd (6.6.11) to that one and see what happens. If this seems to work, I'll get some of my more 'interesting' users who use vanilla checkpointing to start having a play (likely by giving them access to a machine with the 6.8 schedd on it and asking them to run a few jobs). If that works, then I* will move the real pool en masse (well, actually in bunches of ten, but you get the idea) to 6.8 and steadily migrate users as they need/want new functionality. If not, then it's a big-bang switchover once my key users are happy with the 6.8 mini pool.

I personally wasn't even going to bother having 6.6 submitters pointing to the 6.8 collector/negotiator/startds. If any Condor guys think this is a little presumptive, I'd love to know.

Matt

* I lie here for brevity - a nice guy called Michael will do all that for me - bless him :)