
RE: [Condor-users] Quick question about rebooting central manager



> I've a setup of Condor on Win(2000|XP) here. Some 100 machines with a
> single central manager.
> From the documentation, I understood that rebooting the central manager
> machine is not going to have any impact on jobs that are running, only
> that I won't be able to submit new jobs.
> Currently I need to reboot the machine for technical reasons, but before
> doing that I would like to ask if there's going to be ANY impact on jobs
> that are on hold.
> To detail the scenario, I submitted 500 jobs in hold status, and released
> only 50, so 450 are still on hold. Are those going to survive the
> reboot???

When you say "Central Manager", I'm assuming you mean a single machine
running both your condor_negotiator and condor_collector daemons.
Correct? I'm also assuming that the machine you submitted these held and
running jobs from, the one running the condor_schedd daemon, is not the
same physical machine. If these assumptions are correct, you can safely
reboot the "Central Manager" machine without any adverse impact on
running or queued jobs.
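
If you want to double-check those assumptions before rebooting, one
option is to look at the DAEMON_LIST setting on the machine in question
(for example, the value printed by "condor_config_val DAEMON_LIST"). The
snippet below is only a rough sketch of that check: the function names
and the sample values are made up for illustration and are not part of
Condor, and it assumes the list you paste in reflects what is actually
running on that host.

```python
# Sketch: decide from a DAEMON_LIST value whether a host looks like a
# dedicated central manager (collector + negotiator, no schedd).
# Function names and sample values are hypothetical, for illustration only.

def daemon_roles(daemon_list):
    """Split a comma- or space-separated DAEMON_LIST into upper-case names."""
    return {name.upper() for name in daemon_list.replace(",", " ").split()}


def is_dedicated_central_manager(daemon_list):
    """True if the list has COLLECTOR and NEGOTIATOR but no SCHEDD."""
    roles = daemon_roles(daemon_list)
    return {"COLLECTOR", "NEGOTIATOR"} <= roles and "SCHEDD" not in roles


if __name__ == "__main__":
    # Assumed example values; substitute the output from your own machines.
    for value in ("MASTER, COLLECTOR, NEGOTIATOR",
                  "MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD"):
        print(value, "->", is_dedicated_central_manager(value))
```

If the check comes back false because SCHEDD is in the list, the second
assumption above does not hold for that machine, and the advice about a
safe reboot should not be taken as-is.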

> Alternatively, but not the preferred option, I can setup a 2nd central
> manager. I rather prefer to avoid that.

You may wish to consider taking advantage of Condor's high availability
features in the 6.7.x development branch to run a mirror central
manager. This would give you complete redundancy and the ability to
reboot a central manager without halting the assignment of new jobs to
remote execution nodes.

- Ian