[HTCondor-users] A heads-up about HTCondor on Windows and daylight savings time.
- Date: Mon, 27 Oct 2014 09:53:32 -0500
- From: johnkn <johnkn@xxxxxxxxxxx>
- Subject: [HTCondor-users] A heads-up about HTCondor on Windows and daylight savings time.
At 2 a.m. on November 2 here in the USA, daylight savings time will end.
This will trigger a bug in the C runtime on Windows that will cause
HTCondor to think that the timestamp on condor_master.exe has
changed. We thought that we had code in HTCondor to work around this
bug, but it turns out to be only a partial fix, and all versions of
HTCondor will react to the time change this coming weekend.
At this point, what happens depends on what your
MASTER_NEW_BINARY_RESTART configuration variable is set to. The choices
are:
MASTER_NEW_BINARY_RESTART = GRACEFUL
The condor_master restarts all of the child daemons and itself
gracefully. This means that jobs get a signal to checkpoint and are then
killed 2 minutes later, put back in the queue, and restarted on some
other node. SCHEDDs will shut down, then restart and try to
reconnect to running jobs. This is the default behavior, and a
reasonable choice for SCHEDDs and your central manager.
MASTER_NEW_BINARY_RESTART = PEACEFUL
The condor_master tells child daemons to restart when they are done
with current work. This means that STARTDs will finish current jobs but
not accept any new ones until they get a chance to restart. SCHEDDs
will finish current jobs but will not start any new ones. This is a
reasonable configuration for STARTDs, but not the best choice for your
SCHEDDs or central manager.
MASTER_NEW_BINARY_RESTART = FAST
The condor_master kills all child daemons then restarts them. This
is a reasonable configuration for your central manager.
MASTER_NEW_BINARY_RESTART = NO
MASTER_NEW_BINARY_RESTART = NEVER
MASTER_NEW_BINARY_RESTART = NONE
The condor_master notices the change but does nothing. This is a
reasonable choice for all daemons, but it disables the ability to
upgrade the HTCondor binaries without explicitly restarting them.
I recommend that you set MASTER_NEW_BINARY_RESTART to either PEACEFUL or
NEVER on your execute nodes before next weekend if you expect to have
jobs running over the weekend.
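As a minimal sketch of that recommendation, the setting on an execute
node could look like the fragment below. Which file it goes in is an
assumption (a local configuration file is typical, but the location
depends on your installation); the variable name and values are the
ones listed above.

```
# Execute-node configuration (file location is an assumption; use
# whatever local config file your installation reads).
# PEACEFUL lets STARTDs finish running jobs before restarting.
MASTER_NEW_BINARY_RESTART = PEACEFUL

# Alternative: ignore the apparent binary change entirely. Note that
# this also disables automatic restarts after a real upgrade.
# MASTER_NEW_BINARY_RESTART = NEVER
```

Running condor_config_val MASTER_NEW_BINARY_RESTART on the node should
show the value the configuration files now define. The running
condor_master most likely needs a condor_reconfig (or a restart at a
convenient time) before it uses the new value.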
-tj