
[Condor-users] Minor problems with upgrading 6.6.10 to 6.7.14



When we upgraded our pool from 6.6.10 to 6.7.14, our machines didn't automatically update.  The central master reported the new version, but the other machines didn't.

 

Our process:

 

Following the instructions in the FAQ (a bit disconcerting, since it references 6.4 and 6.6), we grabbed the newest release, untarred the release.tar file, and began swapping out the bin, include, sbin, and other directories. That went smoothly, and we sat back to watch the pool upgrade itself, using the following command:

 

watch -n5 'condor_status -master -format "%s\t\t" Machine -format " %s\n" CondorVersion'
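A one-shot variant of the same query can list only the machines that have not yet picked up the new version. This is a rough sketch: the function name stale_machines is made up for illustration, and it assumes the CondorVersion string contains the version number surrounded by spaces.

```shell
# Hypothetical helper: read "Machine <tabs> CondorVersion" lines on stdin and
# print the name of every machine whose version string lacks the wanted version.
# Assumption: the version appears in the string delimited by spaces.
stale_machines() {
    want=$1
    awk -v want="$want" 'NF > 0 && index($0, " " want " ") == 0 { print $1 }'
}

# Possible usage against the live pool (same query as the watch command above):
#   condor_status -master -format "%s\t\t" Machine -format " %s\n" CondorVersion \
#       | stale_machines 6.7.14
```

An empty result would mean every master that answered is reporting the wanted version; machines that are not responding at all simply do not appear in the input.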

 

After a few minutes, all four of our machines stopped responding to condor_status.  A few minutes later, the Central Master machine came back up and reported the newer version.  Over the next several minutes the other three machines came back, but they still reported the old version.  We tried restarting them with condor_restart machineName, but that just brought them back with the old version.  Eventually we manually stopped and restarted the condor daemon, and that fixed the problem.

 

Some details:

 

The four machines share an install on an NFS mounted partition.

We were not especially fast when updating the directories: we made backups of all the directories first and then moved the new ones into place. The full swap took perhaps 60 to 90 seconds.
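The swap described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact commands that were run: the function name, the paths, and the backup suffix are placeholders, only the three directories named earlier are listed, and it assumes the backups were made by renaming the old directories.

```shell
# Sketch only: function name, paths, and suffix are placeholders.
swap_release_dirs() {
    release_dir=$1   # shared (NFS-mounted) install directory
    new_dir=$2       # directory where release.tar was unpacked
    suffix=$3        # backup suffix, for example the old version number

    # Step 1: set every old directory aside as a backup.
    for d in bin include sbin; do
        if [ -d "$release_dir/$d" ]; then
            mv "$release_dir/$d" "$release_dir/$d.$suffix" || return 1
        fi
    done

    # Step 2: move the new directories into place.
    for d in bin include sbin; do
        if [ -d "$new_dir/$d" ]; then
            mv "$new_dir/$d" "$release_dir/$d" || return 1
        fi
    done
}
```

Done this way, the shared install has no bin or sbin between the two steps, which may or may not be related to what we saw.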

 

Did we do something wrong?  I wasn't expecting all the machines to go down, and I have no idea how the machines could report different versions since they share an install.

 

Any insight would be great.  Thanks a lot!

-Colin