I'm running Condor 7.4.2 on Linux and trying, so far unsuccessfully,
to get HDFS running. The HDFS daemon seems to load, but then exits
immediately.
I've set up one machine as the namenode and a number of our cluster
nodes as datanodes. On these machines I created the HDFS_NAMENODE_DIR
and HDFS_DATANODE_DIR directories and chowned them to the Condor user.
I left HDFS_DATANODE_ADDRESS = 0.0.0.0:0 because the docs seem to
indicate that it's okay to do so.
Hadoop is version 0.20.2.
When I start the HDFS daemon, it loads and then exits normally
according to the MasterLog:
04/30 12:08:42 Started process "/lusr/condor/sbin/condor_hdfs", pid and pgroup = 28706
04/30 12:08:42 The HDFS (pid 28706) exited with status 0
04/30 12:08:42 restarting /lusr/condor/sbin/condor_hdfs in 3600 seconds
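
For anyone who wants to check a MasterLog for this start/exit/restart
pattern automatically, here is a minimal Python sketch. It is only an
illustration: the regular expressions are modeled on the three
MasterLog messages quoted above (with wrapped lines joined), the
function name summarize_master_log is made up for this example, and
other Condor versions may word these messages differently.

```python
import re

# Patterns modeled on the MasterLog messages quoted in this post.
# They are assumptions based on that sample, not an official format.
STARTED = re.compile(
    r'Started process "(?P<path>[^"]+)", pid and pgroup = (?P<pid>\d+)')
EXITED = re.compile(
    r'The (?P<name>\S+) \(pid (?P<pid>\d+)\) exited with status (?P<status>\d+)')
RESTART = re.compile(
    r'restarting (?P<path>\S+) in (?P<secs>\d+) seconds')


def summarize_master_log(lines):
    """Return a list of (event, ...) tuples for start/exit/restart lines."""
    events = []
    for line in lines:
        m = STARTED.search(line)
        if m:
            events.append(("started", m.group("path"), int(m.group("pid"))))
            continue
        m = EXITED.search(line)
        if m:
            events.append(("exited", m.group("name"),
                           int(m.group("pid")), int(m.group("status"))))
            continue
        m = RESTART.search(line)
        if m:
            events.append(("restarting", m.group("path"),
                           int(m.group("secs"))))
    return events


# The three lines from the MasterLog excerpt, unwrapped.
SAMPLE = [
    '04/30 12:08:42 Started process "/lusr/condor/sbin/condor_hdfs", '
    'pid and pgroup = 28706',
    '04/30 12:08:42 The HDFS (pid 28706) exited with status 0',
    '04/30 12:08:42 restarting /lusr/condor/sbin/condor_hdfs in 3600 seconds',
]

if __name__ == "__main__":
    for event in summarize_master_log(SAMPLE):
        print(event)
```

A daemon that shows "started" and "exited" with the same timestamp and
status 0, as here, never stayed up even though no error was reported.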
I've set HDFS_LOG4J=DEBUG and HDFS_DEBUG=D_ALL.
The namenode log shows:
04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) LOGS_USE_TIMESTAMP is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) config: using subsystem 'HDFS', local ''
04/30 12:08:42 (fd:3) (pid:28706) Reading from /proc/cpuinfo
04/30 12:08:42 (fd:3) (pid:28706) Found: Physical-IDs:True; Core-IDs:True
04/30 12:08:42 (fd:3) (pid:28706) Analyzing 2 processors using IDs...
04/30 12:08:42 (fd:3) (pid:28706) Looking at processor #0 (PID:0, CID:0):
04/30 12:08:42 (fd:3) (pid:28706) Comparing P#0 and P#1 : pid:0!=0 or cid:0!=1 (match=No)
04/30 12:08:42 (fd:3) (pid:28706) ncpus = 1
04/30 12:08:42 (fd:3) (pid:28706) P0: match->1
04/30 12:08:42 (fd:3) (pid:28706) Looking at processor #1 (PID:0, CID:1):
04/30 12:08:42 (fd:3) (pid:28706) ncpus = 2
04/30 12:08:42 (fd:3) (pid:28706) P1: match->1
04/30 12:08:42 (fd:3) (pid:28706) Using IDs: 2 processors, 2 CPUs, 0 HTs
04/30 12:08:42 (fd:3) (pid:28706) Reading condor configuration from '/lusr/condor/etc/condor_config'
04/30 12:08:42 (fd:3) (pid:28706) Finding local host information, calling gethostname()
04/30 12:08:42 (fd:3) (pid:28706) gethostname() returned fully qualified name "carrion.cs.utexas.edu"
04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) PASSWD_CACHE_REFRESH is undefined, using default value of 319
04/30 12:08:42 (fd:3) (pid:28706) Trying to initialize local IP address (config file not read)
04/30 12:08:42 (fd:3) (pid:28706) Have not found an IP yet, calling gethostbyname()
04/30 12:08:42 (fd:3) (pid:28706) Trying to find IP addr for "carrion.cs.utexas.edu"
04/30 12:08:42 (fd:3) (pid:28706) Calling gethostbyname(carrion.cs.utexas.edu)
04/30 12:08:42 (fd:3) (pid:28706) Found IP addr in hostent: 128.83.120.7
04/30 12:08:42 (fd:3) (pid:28706) ENABLE_RUNTIME_CONFIG is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) ENABLE_PERSISTENT_CONFIG is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) Trying to initialize local IP address (after reading config)
04/30 12:08:42 (fd:3) (pid:28706) NETWORK_INTERFACE not in config file, using existing value
04/30 12:08:42 (fd:3) (pid:28706) ABORT_ON_EXCEPTION is undefined, using default value of False
04/30 12:08:42 (fd:3) (pid:28706) Config 'HDFS_LOG': no prefix ==> '$(LOG)/HDFSLog'
04/30 12:08:42 (fd:3) (pid:28706) Config 'MAX_HDFS_LOG': no prefix ==> '1000000'
04/30 12:08:42 (fd:3) (pid:28706) PRIV_UNKNOWN --> PRIV_CONDOR at daemon_core_main.cpp:1835
04/30 12:08:42 (fd:3) (pid:28707) KEYCACHE: created: 0x84bf5b0
04/30 12:08:42 (fd:3) (pid:28707) WANT_UDP_COMMAND_SOCKET is undefined, using default value of True
04/30 12:08:42 (fd:3) (pid:28707) HDFS_MAX_FILE_DESCRIPTORS is undefined, using default value of 0
04/30 12:08:42 (fd:3) (pid:28707) MAX_FILE_DESCRIPTORS is undefined, using default value of 0
04/30 12:08:42 (fd:3) (pid:28707) ******************************************************
04/30 12:08:42 (fd:3) (pid:28707) ** condor_hdfs (CONDOR_HDFS) STARTING UP
04/30 12:08:42 (fd:3) (pid:28707) ** /lusr/opt/condor-7.4.2/sbin/condor_hdfs
04/30 12:08:42 (fd:3) (pid:28707) ** SubsystemInfo: name=HDFS type=DAEMON(11) class=DAEMON(1)
04/30 12:08:42 (fd:3) (pid:28707) ** Configuration: subsystem:HDFS local:<NONE> class:DAEMON
04/30 12:08:42 (fd:3) (pid:28707) ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
04/30 12:08:42 (fd:3) (pid:28707) ** $CondorPlatform: I386-LINUX_RHEL5 $
04/30 12:08:42 (fd:3) (pid:28707) ** PID = 28707
04/30 12:08:42 (fd:3) (pid:28707) ** Log last touched 4/30 11:53:36
04/30 12:08:42 (fd:3) (pid:28707) ** Running as root: Privilege switching in effect
04/30 12:08:42 (fd:3) (pid:28707) ******************************************************
04/30 12:08:42 (fd:3) (pid:28707) Using config source: /lusr/condor/etc/condor_config
04/30 12:08:42 (fd:3) (pid:28707) Using local config sources:
04/30 12:08:42 (fd:3) (pid:28707) /lusr/condor/etc/local/carrion
04/30 12:08:42 (fd:3) (pid:28707) Config 'LOG': no prefix ==> '$(RELEASE_DIR)/log/$(HOSTNAME)'
04/30 12:08:42 (fd:3) (pid:28707) Running as root. Enabling specialized core dump routines
04/30 12:08:42 (fd:5) (pid:28707) Setting up command socket
04/30 12:08:42 (fd:5) (pid:28707) CONDOR_INHERIT: "31595 <128.83.120.7:57510> 0 0"
04/30 12:08:42 (fd:5) (pid:28707) Parent PID = 31595
04/30 12:08:42 (fd:5) (pid:28707) Parent Command Sock = <128.83.120.7:57510>
04/30 12:08:42 (fd:7) (pid:28707) LISTEN <128.83.120.7:45693> fd=5
04/30 12:08:42 (fd:7) (pid:28707)
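
With D_ALL set, most of this output is "is undefined, using default
value" chatter, which makes it hard to spot anything relevant. Below is
a rough Python sketch for trimming a log like this one. It is an
illustration only: the helper names rejoin and interesting are made up
for this example, the timestamp pattern is inferred from the lines
above, and treating any line without a leading timestamp as a
continuation is an assumption that suits logs wrapped by a mail client.

```python
import re

# A record starts with a timestamp such as "04/30 12:08:42 ".
# This pattern is inferred from the log lines in this post.
TIMESTAMP = re.compile(r'^\d{2}/\d{2} \d{2}:\d{2}:\d{2} ')

# Messages that only report a default being used.
NOISE = re.compile(r'is undefined, using default value')


def rejoin(lines):
    """Merge continuation lines (no leading timestamp) into the previous record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def interesting(lines):
    """Return rejoined records with the default-value chatter removed."""
    return [r for r in rejoin(lines) if not NOISE.search(r)]


# A few lines from the log above, as they appear when wrapped.
SAMPLE = [
    "04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using",
    "default value of False",
    "04/30 12:08:42 (fd:3) (pid:28706) Reading from /proc/cpuinfo",
    "04/30 12:08:42 (fd:5) (pid:28707) Parent Command Sock =",
    "<128.83.120.7:57510>",
]

if __name__ == "__main__":
    for record in interesting(SAMPLE):
        print(record)
```

To run it against a real file, pass an open file object instead of
SAMPLE, for example interesting(open(path)) with path pointing at the
HDFS log.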