Mailing List Archives
[Condor-users] trying to get HDFS working
- Date: Fri, 30 Apr 2010 12:26:01 -0500
- From: "David A. Kotz" <dkotz@xxxxxxxxxxxxx>
- Subject: [Condor-users] trying to get HDFS working
I'm running Condor 7.4.2 on Linux, and I'm trying unsuccessfully to get
HDFS running. The HDFS daemon seems to load, but then immediately exits.
I've set up one machine as the namenode and a number of our cluster
nodes as data nodes. On these machines I created the HDFS_NAMENODE_DIR
and HDFS_DATANODE_DIR directories and chowned them to the Condor user.
I left HDFS_DATANODE_ADDRESS = 0.0.0.0:0 because the docs seem to
indicate that it's okay to do so.
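The two setup steps in the previous paragraph can be sanity-checked outside Condor with a short script. This is a minimal sketch, not a Condor tool: the directory paths and the `condor` user name in it are hypothetical placeholders for your own HDFS_NAMENODE_DIR, HDFS_DATANODE_DIR and Condor account. The last function shows what an address of 0.0.0.0:0 means at the socket level: listen on every IPv4 interface and let the kernel pick a free port.

```python
import os
import pwd
import socket
import stat


def owner_name(uid: int) -> str:
    """Map a numeric uid to a user name, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def check_dir(path: str, user: str) -> list[str]:
    """Return problems with an HDFS storage directory; an empty list means OK."""
    if not os.path.isdir(path):
        return [f"{path}: missing or not a directory"]
    st = os.stat(path)
    problems = []
    owner = owner_name(st.st_uid)
    if owner != user:
        problems.append(f"{path}: owned by {owner}, expected {user}")
    # Check the owner's write bit rather than os.access(), which would
    # answer for whoever runs this script, not for the Condor user.
    if not st.st_mode & stat.S_IWUSR:
        problems.append(f"{path}: owner has no write permission")
    return problems


def bind_wildcard() -> tuple[str, int]:
    """Bind a TCP socket to 0.0.0.0:0 (all IPv4 interfaces, kernel-chosen port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("0.0.0.0", 0))
        host, port = s.getsockname()
        return host, port


if __name__ == "__main__":
    # Hypothetical values: substitute your own HDFS_NAMENODE_DIR,
    # HDFS_DATANODE_DIR and Condor user name.
    condor_user = "condor"
    for d in ("/scratch/hdfs/name", "/scratch/hdfs/data"):
        for line in check_dir(d, condor_user) or [f"{d}: OK"]:
            print(line)
    host, port = bind_wildcard()
    print(f"0.0.0.0:0 was bound as {host}:{port}")
```

Run it on the namenode and on each data node; a port number other than 0 in the last line confirms that the wildcard address itself is bindable on that machine.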
Hadoop is version 0.20.2.
When I start the HDFS daemon, it loads and exits normally, according to
the MasterLog:
04/30 12:08:42 Started process "/lusr/condor/sbin/condor_hdfs", pid and pgroup = 28706
04/30 12:08:42 The HDFS (pid 28706) exited with status 0
04/30 12:08:42 restarting /lusr/condor/sbin/condor_hdfs in 3600 seconds
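The MasterLog excerpt above is the main symptom: condor_hdfs starts, exits with status 0 in the same second, and condor_master schedules the next attempt 3600 seconds later. When comparing several restart cycles, those three events can be pulled out mechanically. A minimal sketch, assuming unwrapped log lines in exactly the form quoted above; the regular expressions were written against these three lines only, not against a documented MasterLog format.

```python
import re

# Patterns for the three MasterLog lines quoted above (assumed format).
STARTED = re.compile(r'^(\S+ \S+) Started process "([^"]+)", pid and pgroup = (\d+)')
EXITED = re.compile(r"^(\S+ \S+) The (\S+) \(pid (\d+)\) exited with status (\d+)")
RESTART = re.compile(r"^(\S+ \S+) restarting (\S+) in (\d+) seconds")


def summarize(lines):
    """Yield (timestamp, event, details) for daemon start, exit and restart lines."""
    for line in lines:
        if m := STARTED.match(line):
            yield m[1], "started", {"path": m[2], "pid": int(m[3])}
        elif m := EXITED.match(line):
            yield m[1], "exited", {"daemon": m[2], "pid": int(m[3]), "status": int(m[4])}
        elif m := RESTART.match(line):
            yield m[1], "restart", {"path": m[2], "delay_s": int(m[3])}


sample = """\
04/30 12:08:42 Started process "/lusr/condor/sbin/condor_hdfs", pid and pgroup = 28706
04/30 12:08:42 The HDFS (pid 28706) exited with status 0
04/30 12:08:42 restarting /lusr/condor/sbin/condor_hdfs in 3600 seconds
""".splitlines()

for ts, event, details in summarize(sample):
    print(ts, event, details)
```

An exit status of 0 reads as an ordinary exit rather than a crash, so the HDFS daemon's own log is the more promising place to look for the reason it quits.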
With HDFS_LOG4J = DEBUG and HDFS_DEBUG = D_ALL set, the namenode log shows:
04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) LOGS_USE_TIMESTAMP is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) config: using subsystem 'HDFS', local ''
04/30 12:08:42 (fd:3) (pid:28706) Reading from /proc/cpuinfo
04/30 12:08:42 (fd:3) (pid:28706) Found: Physical-IDs:True; Core-IDs:True
04/30 12:08:42 (fd:3) (pid:28706) Analyzing 2 processors using IDs...
04/30 12:08:42 (fd:3) (pid:28706) Looking at processor #0 (PID:0, CID:0):
04/30 12:08:42 (fd:3) (pid:28706) Comparing P#0 and P#1 : pid:0!=0 or cid:0!=1 (match=No)
04/30 12:08:42 (fd:3) (pid:28706) ncpus = 1
04/30 12:08:42 (fd:3) (pid:28706) P0: match->1
04/30 12:08:42 (fd:3) (pid:28706) Looking at processor #1 (PID:0, CID:1):
04/30 12:08:42 (fd:3) (pid:28706) ncpus = 2
04/30 12:08:42 (fd:3) (pid:28706) P1: match->1
04/30 12:08:42 (fd:3) (pid:28706) Using IDs: 2 processors, 2 CPUs, 0 HTs
04/30 12:08:42 (fd:3) (pid:28706) Reading condor configuration from '/lusr/condor/etc/condor_config'
04/30 12:08:42 (fd:3) (pid:28706) Finding local host information, calling gethostname()
04/30 12:08:42 (fd:3) (pid:28706) gethostname() returned fully qualified name "carrion.cs.utexas.edu"
04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) PASSWD_CACHE_REFRESH is undefined, using default value of 319
04/30 12:08:42 (fd:3) (pid:28706) Trying to initialize local IP address (config file not read)
04/30 12:08:42 (fd:3) (pid:28706) Have not found an IP yet, calling gethostbyname()
04/30 12:08:42 (fd:3) (pid:28706) Trying to find IP addr for "carrion.cs.utexas.edu"
04/30 12:08:42 (fd:3) (pid:28706) Calling gethostbyname(carrion.cs.utexas.edu)
04/30 12:08:42 (fd:3) (pid:28706) Found IP addr in hostent: 128.83.120.7
04/30 12:08:42 (fd:3) (pid:28706) ENABLE_RUNTIME_CONFIG is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) ENABLE_PERSISTENT_CONFIG is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) Trying to initialize local IP address (after reading config)
04/30 12:08:42 (fd:3) (pid:28706) NETWORK_INTERFACE not in config file, using existing value
04/30 12:08:42 (fd:3) (pid:28706) ABORT_ON_EXCEPTION is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) Config 'HDFS_LOG': no prefix ==> '$(LOG)/HDFSLog'
04/30 12:08:42 (fd:3) (pid:28706) Config 'MAX_HDFS_LOG': no prefix ==> '1000000'
04/30 12:08:42 (fd:3) (pid:28706) PRIV_UNKNOWN --> PRIV_CONDOR at daemon_core_main.cpp:1835
04/30 12:08:42 (fd:3) (pid:28707) KEYCACHE: created: 0x84bf5b0
04/30 12:08:42 (fd:3) (pid:28707) WANT_UDP_COMMAND_SOCKET is undefined, using default value of True
04/30 12:08:42 (fd:3) (pid:28707) HDFS_MAX_FILE_DESCRIPTORS is undefined, using default value of 0
04/30 12:08:42 (fd:3) (pid:28707) MAX_FILE_DESCRIPTORS is undefined, using default value of 0
04/30 12:08:42 (fd:3) (pid:28707) ******************************************************
04/30 12:08:42 (fd:3) (pid:28707) ** condor_hdfs (CONDOR_HDFS) STARTING UP
04/30 12:08:42 (fd:3) (pid:28707) ** /lusr/opt/condor-7.4.2/sbin/condor_hdfs
04/30 12:08:42 (fd:3) (pid:28707) ** SubsystemInfo: name=HDFS type=DAEMON(11) class=DAEMON(1)
04/30 12:08:42 (fd:3) (pid:28707) ** Configuration: subsystem:HDFS local:<NONE> class:DAEMON
04/30 12:08:42 (fd:3) (pid:28707) ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
04/30 12:08:42 (fd:3) (pid:28707) ** $CondorPlatform: I386-LINUX_RHEL5 $
04/30 12:08:42 (fd:3) (pid:28707) ** PID = 28707
04/30 12:08:42 (fd:3) (pid:28707) ** Log last touched 4/30 11:53:36
04/30 12:08:42 (fd:3) (pid:28707) ** Running as root: Privilege switching in effect
04/30 12:08:42 (fd:3) (pid:28707) ******************************************************
04/30 12:08:42 (fd:3) (pid:28707) Using config source: /lusr/condor/etc/condor_config
04/30 12:08:42 (fd:3) (pid:28707) Using local config sources:
04/30 12:08:42 (fd:3) (pid:28707) /lusr/condor/etc/local/carrion
04/30 12:08:42 (fd:3) (pid:28707) Config 'LOG': no prefix ==> '$(RELEASE_DIR)/log/$(HOSTNAME)'
04/30 12:08:42 (fd:3) (pid:28707) Running as root. Enabling specialized core dump routines
04/30 12:08:42 (fd:5) (pid:28707) Setting up command socket
04/30 12:08:42 (fd:5) (pid:28707) CONDOR_INHERIT: "31595 <128.83.120.7:57510> 0 0"
04/30 12:08:42 (fd:5) (pid:28707) Parent PID = 31595
04/30 12:08:42 (fd:5) (pid:28707) Parent Command Sock = <128.83.120.7:57510>
04/30 12:08:42 (fd:7) (pid:28707) LISTEN <128.83.120.7:45693> fd=5
04/30 12:08:42 (fd:7) (pid:28707)