Hi all,

I don't know if this is a bug (I think it is), but there is a problem when you try to do a

    condor_off -peaceful -daemon master node

from a central management machine.

When the condor master gets the peaceful shutdown command, it gets it from a machine that is authorized as ADMINISTRATOR. However, when it propagates this command to its child daemons, it does so as the local machine, which is not in the HOSTALLOW_ADMINISTRATOR list.

We can see it in the logs (172.16.4.103 is our management node, and 172.16.6.2 is our test node).

MasterLog (trimmed, only relevant lines):

    06/13/13 13:14:08 Received TCP command 60015 (DC_OFF_PEACEFUL) from unauthenticated@unmapped <172.16.4.103:46020>, access level ADMINISTRATOR

Here we see that the request comes from an authorized source. However, what the startd sees is subtly different: the command appears to come from the local machine, which is not authorized.

StartLog:

    06/13/13 13:14:08 Calling Handler <DaemonCommandProtocol::WaitForSocketData> (2)

It later gets the SIGTERM:

    06/13/13 13:14:14 Got SIGTERM. Performing graceful shutdown.

The end result is that we get a graceful shutdown instead of the peaceful one we asked for.

An obvious workaround is to change:

    HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST)

to:

    HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST), $(FULL_HOSTNAME)

But since this is not the default policy, nor is there a clear reason why it should be needed, I think it's more of a bug. condor_master should somehow authenticate as DAEMON, or pass the original credentials on to the startd.

When we do a condor_off -peaceful -daemon startd, however, everything works as expected, since the shutdown command comes directly from the management machine.

Regards,

Joan

--
--------------------------------------------------------------------------
Joan Josep Piles Contreras - Analista de sistemas
I3A - Instituto de Investigación en Ingeniería de Aragón
Tel: 876 55 51 47 (ext. 845147)
http://i3a.unizar.es -- jpiles@xxxxxxxxx
--------------------------------------------------------------------------
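
P.S. To make the mismatch concrete, here is a toy model in Python of the behaviour described above. It is only a sketch of my understanding, not HTCondor code: the function name shutdown_mode and the set ADMIN_ALLOW are made up for illustration, and I am assuming that the relayed command appears to the startd as coming from the test node's own address (172.16.6.2).

```python
# Toy model only: this is NOT HTCondor source code, just an illustration
# of the host-based check described in this message.

MANAGEMENT = "172.16.4.103"  # management node (from the logs above)
TEST_NODE = "172.16.6.2"     # test node (from the logs above)

# Stands in for: HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST)
ADMIN_ALLOW = {MANAGEMENT}


def shutdown_mode(peer_ip, allow):
    """Return the shutdown mode a daemon ends up with in this toy model.

    Assumption: a peaceful request is honoured only when the peer is on
    the allow list; otherwise the daemon only sees the later SIGTERM,
    which results in a graceful shutdown.
    """
    return "peaceful" if peer_ip in allow else "graceful"


# Command sent straight to the startd from the management node.
direct = shutdown_mode(MANAGEMENT, ADMIN_ALLOW)

# Command relayed by the master, seen as coming from the local machine.
relayed = shutdown_mode(TEST_NODE, ADMIN_ALLOW)

# Same relay with the workaround (local host added to the allow list).
relayed_with_workaround = shutdown_mode(TEST_NODE, ADMIN_ALLOW | {TEST_NODE})

print(direct, relayed, relayed_with_workaround)
```

In this model the direct command and the relayed command with the workaround both end up peaceful, while the relayed command under the default policy falls back to graceful, which matches what we observe.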