Hi,

I’ve set COLLECTOR_DEBUG to D_SECURITY, but /var/log/condor/CollectorLog doesn’t contain any PERMISSION DENIED lines.

There is an entry in the collector log, though, saying that my external/invisible node is granted READ level access (which explains why I can see my Condor pool on this node even though the node itself is not being listed):

09/18/14 10:47:57 PERMISSION GRANTED to unauthenticated@unmapped from host <IP> for command 5 (QUERY_STARTD_ADS), access level READ: reason: READ authorization policy allows IP address <IP>; identifiers used for this remote host <IP>, <HOSTNAME>

But I can’t find a corresponding line for WRITE level access. Actually, there is no such line even for the other nodes, which do show up in the condor_status list…

So how do I verify that a given node has been given WRITE access by the collector? What could be the reason for this behavior?

I don’t know if this is important, but I’m using the old Condor security concept by defining HOSTALLOW_WRITE.

The IT department has assured me that the required ports (9614/9618 = shared/collector) are open. “nc -zv IP 9614” returns “… open” (collector -> invisible node).

Best regards,
Lukas

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On behalf of Brian Bockelman

On Sep 9, 2014, at 7:49 AM, Lukas Koschmieder <Lukas.Koschmieder@xxxxxxxxxxxxxxxxxxx> wrote:
Hi,

Some of my STARTD/SCHEDD nodes don’t show up in the condor_status list. This probably has something to do with the fact that these nodes belong to a different network.

1) Do I have to use the flocking mechanism in order to add such an “external node” (see setup below)?
2) If I do not have to use the flocking mechanism, then how do I track down the error? I’ve already checked all the logs (on both the invisible nodes and the collector) but I can’t find any clue.

This is how my pool is set up:

FOO network:
  Collector/Negotiator: condor.FOO.my.com (Debian 6)
  “Internal” Startd/Schedd nodes:
    start1.FOO.my.com (CentOS 6) <- LISTED
    start2.FOO.my.com (Windows 7) <- LISTED

BAR network:
  “External” Startd/Schedd nodes:
    start3.BAR.my.com (OpenSuse 13) <- NOT LISTED
    start4.BAR.my.com (Windows 7) <- NOT LISTED

Collector/Negotiator condor_config.local:

CONDOR_HOST = condor.FOO.my.com
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
ALLOW_WRITE = *.FOO.my.com, *.BAR.my.com
SGE_GAHP = $(GLITE_LOCATION)/bin/batch_gahp
GLIDEIN_SITES = *.FOO.my.com
HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(GLIDEIN_SITES)

Since HOSTALLOW_WRITE, if used, overrides ALLOW_WRITE, this is the relevant line to look at. I think you want:

HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(GLIDEIN_SITES), *.BAR.my.com

?

If it's a simple copy/paste error, a few other ideas:
- Look at /var/log/condor/CollectorLog and look for PERMISSION DENIED lines. They often give a good hint as to what went wrong.
- Restart the collector with D_SECURITY set for COLLECTOR_DEBUG. This will greatly increase the verbosity but also give more hints as to what has gone wrong.

Finally, this is not a particularly secure setup - host-based security may not be as secure as you would want, especially as you start to involve multiple networks. I can't find any good links at the moment, but maybe others could chime in?

Brian
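
The effect of the HOSTALLOW_WRITE line discussed above can be illustrated with a small Python sketch. This is only an approximation under stated assumptions: Python's fnmatch stands in for HTCondor's own wildcard matching (which may differ in detail), the helper name host_allowed is made up for this example, and the inherited $(HOSTALLOW_WRITE) value, which is not shown in this thread, is assumed to contribute no BAR entry.

```python
from fnmatch import fnmatch


def host_allowed(hostname, patterns):
    """Return True if hostname matches any wildcard pattern (case-insensitive).

    Hypothetical helper: fnmatch is a stand-in for HTCondor's matching rules.
    """
    return any(fnmatch(hostname.lower(), p.strip().lower()) for p in patterns)


# Patterns contributed by the posted line via $(GLIDEIN_SITES).
# Assumption: the inherited $(HOSTALLOW_WRITE) adds nothing for BAR hosts.
posted = ["*.FOO.my.com"]

# Patterns after appending *.BAR.my.com as suggested in the reply.
suggested = posted + ["*.BAR.my.com"]

for host in ["start1.FOO.my.com", "start3.BAR.my.com"]:
    print(host, host_allowed(host, posted), host_allowed(host, suggested))
```

Under these assumptions, start3.BAR.my.com matches only the suggested list, which is consistent with the BAR nodes not being listed.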
USE_SHARED_PORT = TRUE
SHARED_PORT_ARGS = -p 9614
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT

Startd/Schedd condor_config.local:

CONDOR_HOST = condor.FOO.my.com
DAEMON_LIST = MASTER, STARTD, SCHEDD
ALLOW_WRITE = *.FOO.my.com, *.BAR.my.com
SGE_GAHP = $(GLITE_LOCATION)/bin/batch_gahp
GLIDEIN_SITES = *.FOO.my.com
HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(GLIDEIN_SITES)
USE_SHARED_PORT = TRUE
SHARED_PORT_ARGS = -p 9614
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
START = TRUE

Best regards,
Lukas
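
One way to see which hosts the collector granted or denied at each access level is to tally the PERMISSION lines in the CollectorLog. The Python sketch below does that. It assumes that PERMISSION DENIED lines have the same layout as the PERMISSION GRANTED line quoted in this thread, which has not been verified here. The function name summarize_permissions is made up for this example and is not part of HTCondor.

```python
import re
from collections import Counter

# Pattern modelled on the GRANTED line quoted in this thread.
# Assumption: DENIED lines follow the same layout.
PERMISSION_RE = re.compile(
    r"PERMISSION (?P<decision>GRANTED|DENIED) to (?P<user>\S+) "
    r"from host (?P<host>\S+) for command (?P<cmd>\d+) "
    r"\((?P<name>[^)]*)\), access level (?P<level>\w+)"
)


def summarize_permissions(lines):
    """Count PERMISSION lines per (host, access level, decision)."""
    counts = Counter()
    for line in lines:
        match = PERMISSION_RE.search(line)
        if match:
            key = (match["host"], match["level"], match["decision"])
            counts[key] += 1
    return counts


def print_summary(path="/var/log/condor/CollectorLog"):
    """Print one row per (host, access level, decision) found in the log."""
    with open(path, errors="replace") as log:
        for (host, level, decision), n in sorted(summarize_permissions(log).items()):
            print(f"{host:<40} {level:<20} {decision:<8} {n}")
```

Calling print_summary() on the collector host, with D_SECURITY enabled, lists every access level that actually appears in the log per host. A host with no rows other than READ would match the situation described at the top of this thread.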