Hi all,
I don't know whether this is a bug (I think it is), but
there is a problem when you run condor_off -peaceful
-daemon master node from a central management machine.
When the condor_master receives the peaceful shutdown
command, it comes from a machine authorized as
ADMINISTRATOR. However, when the master propagates this
command to its child daemons, it does so as the local
machine, which is not in the HOSTALLOW_ADMINISTRATOR list.
We can see this in the logs (172.16.4.103 is our management
node, and 172.16.6.2 is our test node):
MasterLog (trimmed, only relevant lines):
06/13/13 13:14:08 Received TCP command 60015 (DC_OFF_PEACEFUL) from unauthenticated@unmapped <172.16.4.103:46020>, access level ADMINISTRATOR
06/13/13 13:14:08 Calling HandleReq <handle_off_peaceful()> (0) for command 60015 (DC_OFF_PEACEFUL) from unauthenticated@unmapped <172.16.4.103:46020>
06/13/13 13:14:08 Got SIGTERM. Performing graceful shutdown.
06/13/13 13:14:08 Completed DC_SET_PEACEFUL_SHUTDOWN to local startd
06/13/13 13:14:14 Sent SIGTERM to STARTD (pid 31817)
06/13/13 13:14:14 The STARTD (pid 31817) exited with status 0
06/13/13 13:14:15 All daemons are gone. Exiting.
Here we see that the request comes from an authorized
source. However, the startd sees something subtly
different. To it, the command appears to come from the
local machine, which is not authorized:
StartLog:
06/13/13 13:14:08 Calling Handler <DaemonCommandProtocol::WaitForSocketData> (2)
06/13/13 13:14:08 PERMISSION DENIED to unauthenticated@unmapped from host 172.16.6.2 for command 60016 (DC_SET_PEACEFUL_SHUTDOWN), access level ADMINISTRATOR: reason: ADMINISTRATOR authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 172.16.6.2,her06-02.hermes.cps.unizar.es,her06-02, hostname size = 2, original ip address = 172.16.6.2
Later, the startd receives the SIGTERM:
06/13/13 13:14:14 Got SIGTERM. Performing graceful shutdown.
06/13/13 13:14:14 shutdown graceful
06/13/13 13:14:14 All resources are free, exiting.
The end result is that we get a graceful shutdown instead
of the peaceful one we asked for.
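In case it helps anyone check whether they are hitting the same thing, here is a rough Python sketch that scans StartLog text for denied commands. The regular expression is only modelled on the StartLog excerpt above, so the pattern (and the helper name find_denied) are my own assumptions and may need adjusting for other HTCondor versions or log settings.

```python
import re

# Pattern modelled on the PERMISSION DENIED entry quoted above.
# Assumption: other HTCondor versions may word this entry differently.
DENIED = re.compile(
    r"PERMISSION DENIED to (?P<user>\S+) from host (?P<host>\S+) "
    r"for command (?P<num>\d+) \((?P<name>\w+)\)"
)


def find_denied(log_text):
    """Return (host, command name) for every PERMISSION DENIED entry."""
    # Entries pasted from mail are often wrapped, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    return [(m.group("host"), m.group("name")) for m in DENIED.finditer(flat)]


# Beginning of the StartLog entry from this message, wrapped as in the mail.
sample = """06/13/13 13:14:08 PERMISSION DENIED to
unauthenticated@unmapped from host 172.16.6.2 for
command 60016 (DC_SET_PEACEFUL_SHUTDOWN), access level
ADMINISTRATOR"""

print(find_denied(sample))
# prints [('172.16.6.2', 'DC_SET_PEACEFUL_SHUTDOWN')]
```

If the denied host turns out to be the node's own address, as it is here, the peaceful request was most likely dropped between the master and the startd.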
An obvious workaround is to change:
HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST)
to:
HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST), $(FULL_HOSTNAME)
But since this is not the default policy, and there is no
clear reason why it should be needed, I think this is more
of a bug. condor_master should somehow authenticate as
DAEMON, or pass the credentials on to the startd.
When we do a condor_off -peaceful -daemon startd, however,
everything works as expected, since the shutdown command
comes directly from the management machine.
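To make the three cases concrete, here is a toy Python model of a host-based allow list. It is not HTCondor's actual authorization code: the is_allowed helper is hypothetical, the glob matching is a simplification, and I am assuming that $(CONDOR_HOST) expands to our management node and $(FULL_HOSTNAME) to the test node's fully qualified name as it appears in the StartLog.

```python
from fnmatch import fnmatchcase


def is_allowed(identifiers, allow_list):
    """Toy check: True if any peer identifier matches any allow entry."""
    return any(
        fnmatchcase(ident, pattern)
        for ident in identifiers
        for pattern in allow_list
    )


# Identifiers the startd listed for the local peer (from the StartLog).
local_peer = ["172.16.6.2", "her06-02.hermes.cps.unizar.es", "her06-02"]
# The management node, as seen in the MasterLog.
management_peer = ["172.16.4.103"]

# Assumed expansions of the two configurations discussed above.
before = ["172.16.4.103"]                                   # $(CONDOR_HOST)
after = ["172.16.4.103", "her06-02.hermes.cps.unizar.es"]   # + $(FULL_HOSTNAME)

print(is_allowed(management_peer, before))  # True: direct command to startd
print(is_allowed(local_peer, before))       # False: command relayed by master
print(is_allowed(local_peer, after))        # True: with the workaround
```

The middle case is the one we are hitting: the relayed command is judged by the identity of the local machine, not by that of the machine that originally sent it.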
Regards,
Joan
--
--------------------------------------------------------------------------
Joan Josep Piles Contreras - Analista de sistemas
I3A - Instituto de Investigación en Ingeniería de Aragón
Tel: 876 55 51 47 (ext. 845147)
http://i3a.unizar.es -- jpiles@xxxxxxxxx
--------------------------------------------------------------------------