
Re: [HTCondor-users] Nodes missing in condor_status list




On Sep 9, 2014, at 7:49 AM, Lukas Koschmieder <Lukas.Koschmieder@xxxxxxxxxxxxxxxxxxx> wrote:

Hi,
 
Some of my STARTD/SCHEDD nodes don’t show up in the condor_status list.
This probably has something to do with the fact that these nodes belong to a different network.
1) Do I have to use the flocking mechanism in order to add such an “external node” (see setup below)?
2) If I do not have to use the flocking mechanism, how do I track down the error? I’ve already checked all the logs (on both the invisible nodes and the collector) but I can’t find any clue.
 
 
This is how my pool is set up:
 
FOO network:
 
Collector/Negotiator:
condor.FOO.my.com (Debian 6)
 
“Internal” Startd/Schedd nodes:
start1.FOO.my.com (CentOS 6)  <- LISTED
start2.FOO.my.com (Windows 7) <- LISTED
 
BAR network:
 
“External” Startd/Schedd nodes:
start3.BAR.my.com (OpenSuse 13) <- NOT LISTED
start4.BAR.my.com (Windows 7)   <- NOT LISTED
 
 
 
Collector/Negotiator condor_config.local:
 
CONDOR_HOST = condor.FOO.my.com
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
ALLOW_WRITE = *.FOO.my.com, *.BAR.my.com
 
SGE_GAHP      = $(GLITE_LOCATION)/bin/batch_gahp
GLIDEIN_SITES = *.FOO.my.com
 
HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(GLIDEIN_SITES)

Since HOSTALLOW_WRITE, if used, overrides ALLOW_WRITE, this is the relevant line to look at.

I think you want:

HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(GLIDEIN_SITES), *.BAR.my.com

?
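A minimal sketch of the same fix using the newer setting name. It assumes you would rather drop HOSTALLOW_WRITE (the older, deprecated spelling) so that ALLOW_WRITE alone controls who may write to the collector, and that HOSTALLOW_WRITE is not also set in some other config file. Only the relevant lines of the collector's condor_config.local are shown:

```
CONDOR_HOST   = condor.FOO.my.com
GLIDEIN_SITES = *.FOO.my.com

# Every host that must advertise itself to this collector needs WRITE access,
# including the startd/schedd nodes in the BAR network.
ALLOW_WRITE   = *.FOO.my.com, *.BAR.my.com, $(GLIDEIN_SITES)

# The old line is removed, so it can no longer override ALLOW_WRITE:
# HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(GLIDEIN_SITES)
```

Running condor_config_val ALLOW_WRITE on the collector host should print the combined list; condor_reconfig (or a restart) should make the running collector pick it up.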

If the missing *.BAR.my.com entry was just a copy/paste error in your message, here are a few other ideas:

- Look in /var/log/condor/CollectorLog for PERMISSION DENIED lines. They often give a good hint as to what went wrong.
- Restart the collector with D_SECURITY set in COLLECTOR_DEBUG. This will greatly increase the verbosity, but it will also give more hints as to what has gone wrong.
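For the D_SECURITY suggestion, the change is a single line in the collector's condor_config.local. This is a sketch; the commented-out STARTD_DEBUG line is an optional assumption that the BAR nodes' own view of the exchange is also useful:

```
# Log authentication and authorization details in CollectorLog.
COLLECTOR_DEBUG = D_SECURITY

# Optionally, on start3/start4 as well, to see the startd's side of the exchange:
# STARTD_DEBUG = D_SECURITY
```

Then restart the collector (for example with condor_restart) and watch CollectorLog while the BAR nodes try to advertise themselves. Setting COLLECTOR_DEBUG back afterwards keeps the log size manageable.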

Finally, this is not a particularly secure setup: host-based security may not be as secure as you would want, especially once you start to involve multiple networks. I can't find any good links at the moment; maybe others could chime in?

Brian

 
USE_SHARED_PORT  = TRUE
SHARED_PORT_ARGS = -p 9614
DAEMON_LIST      = $(DAEMON_LIST), SHARED_PORT
 
 
 
Startd/Schedd condor_config.local:
 
CONDOR_HOST = condor.FOO.my.com
DAEMON_LIST = MASTER, STARTD, SCHEDD
ALLOW_WRITE = *.FOO.my.com, *.BAR.my.com
 
SGE_GAHP      = $(GLITE_LOCATION)/bin/batch_gahp
GLIDEIN_SITES = *.FOO.my.com
 
HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(GLIDEIN_SITES)
 
USE_SHARED_PORT  = TRUE
SHARED_PORT_ARGS = -p 9614
DAEMON_LIST      = $(DAEMON_LIST), SHARED_PORT
 
START = TRUE
 
 
 
Best regards,
Lukas
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/