Re: [Condor-users] trying to get HDFS working
- Date: Mon, 10 May 2010 09:51:40 -0500
- From: David Kotz <dkotz@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] trying to get HDFS working
Any suggestions from the Condor team on what might be wrong or where
else I might look?
- dave
On Fri, 2010-04-30 at 14:10 -0500, David A. Kotz wrote:
> Thanks, Dan, but I think I have all of those covered. I corrected the
> HDFS setting:
>
> HDFS = $(SBIN)/condor_hdfs
>
> pointed JAVA at our Java 6 install, and pointed HDFS_HOME at our
> existing Hadoop install. HDFS_NAMENODE points at the machine that has
> HDFS_SERVICES = HDFS_NAMENODE, and the other machines have
> HDFS_SERVICES = HDFS_DATANODE. I also changed DedicatedScheduler to
> point to the namenode, because I ran across something that seemed to
> indicate I should.
>
> - dave
>
>
> Dan Bradley wrote:
> >
> > I have also tried to get Condor's HDFS support to work. I haven't quite
> > finished, but what I found may be helpful to you.
> >
> > In my case, I found that the Condor package did not contain the
> > necessary HDFS jar files. I had to download these and install them in
> > Condor's libexec/hdfs/lib directory. I used the 0.20.2 Hadoop release.
> >
> > I also found that the version of Java on my system (gij (GNU libgcj)
> > version 4.1.2) did not appear to work with HDFS. Instead, I used
> > jdk1.6.0_20 from Sun.
> >
> > I also found that the documentation for HDFS_SERVICES is confusing. It
> > appears that it is supposed to be set equal to either HDFS_NAMENODE or
> > HDFS_DATANODE.
> >
> > Hope that helps.
> >
> > --Dan
> >
> > David A. Kotz wrote:
> >> I'm running Condor 7.4.2 on Linux, and I'm trying unsuccessfully to
> >> get HDFS running. The HDFS daemon seems to load, but then immediately
> >> exits.
> >>
> >> I've set up one machine as the namenode and a number of our cluster
> >> nodes as data nodes. On these machines I created the
> >> HDFS_NAMENODE_DIR and HDFS_DATANODE_DIR directories and chowned them
> >> to the Condor user. I left HDFS_DATANODE_ADDRESS = 0.0.0.0:0 because
> >> the docs seem to indicate that it's okay to do so.
> >>
> >> Hadoop is version 0.20.2.
> >>
> >> When I start the HDFS daemon it loads and exits normally according to
> >> the MasterLog:
> >>
> >> 04/30 12:08:42 Started process "/lusr/condor/sbin/condor_hdfs", pid and pgroup = 28706
> >> 04/30 12:08:42 The HDFS (pid 28706) exited with status 0
> >> 04/30 12:08:42 restarting /lusr/condor/sbin/condor_hdfs in 3600 seconds
> >>
> >>
> >> I have set HDFS_LOG4J=DEBUG and HDFS_DEBUG=D_ALL.
> >>
> >>
> >> Namenode log shows:
> >>
> >> 04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using
> >> default value of False
> >> 04/30 12:08:42 (fd:3) (pid:28706) LOGS_USE_TIMESTAMP is undefined,
> >> using default value of False
> >> 04/30 12:08:42 (fd:3) (pid:28706) config: using subsystem 'HDFS',
> >> local ''
> >> 04/30 12:08:42 (fd:3) (pid:28706) Reading from /proc/cpuinfo
> >> 04/30 12:08:42 (fd:3) (pid:28706) Found: Physical-IDs:True; Core-IDs:True
> >> 04/30 12:08:42 (fd:3) (pid:28706) Analyzing 2 processors using IDs...
> >> 04/30 12:08:42 (fd:3) (pid:28706) Looking at processor #0 (PID:0, CID:0):
> >> 04/30 12:08:42 (fd:3) (pid:28706) Comparing P#0 and P#1 : pid:0!=0
> >> or cid:0!=1 (match=No)
> >> 04/30 12:08:42 (fd:3) (pid:28706) ncpus = 1
> >> 04/30 12:08:42 (fd:3) (pid:28706) P0: match->1
> >> 04/30 12:08:42 (fd:3) (pid:28706) Looking at processor #1 (PID:0, CID:1):
> >> 04/30 12:08:42 (fd:3) (pid:28706) ncpus = 2
> >> 04/30 12:08:42 (fd:3) (pid:28706) P1: match->1
> >> 04/30 12:08:42 (fd:3) (pid:28706) Using IDs: 2 processors, 2 CPUs, 0 HTs
> >> 04/30 12:08:42 (fd:3) (pid:28706) Reading condor configuration from
> >> '/lusr/condor/etc/condor_config'
> >> 04/30 12:08:42 (fd:3) (pid:28706) Finding local host information,
> >> calling gethostname()
> >> 04/30 12:08:42 (fd:3) (pid:28706) gethostname() returned fully
> >> qualified name "carrion.cs.utexas.edu"
> >> 04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using
> >> default value of False
> >> 04/30 12:08:42 (fd:3) (pid:28706) PASSWD_CACHE_REFRESH is undefined,
> >> using default value of 319
> >> 04/30 12:08:42 (fd:3) (pid:28706) Trying to initialize local IP
> >> address (config file not read)
> >> 04/30 12:08:42 (fd:3) (pid:28706) Have not found an IP yet, calling
> >> gethostbyname()
> >> 04/30 12:08:42 (fd:3) (pid:28706) Trying to find IP addr for
> >> "carrion.cs.utexas.edu"
> >> 04/30 12:08:42 (fd:3) (pid:28706) Calling
> >> gethostbyname(carrion.cs.utexas.edu)
> >> 04/30 12:08:42 (fd:3) (pid:28706) Found IP addr in hostent: 128.83.120.7
> >> 04/30 12:08:42 (fd:3) (pid:28706) ENABLE_RUNTIME_CONFIG is undefined,
> >> using default value of False
> >> 04/30 12:08:42 (fd:3) (pid:28706) ENABLE_PERSISTENT_CONFIG is
> >> undefined, using default value of False
> >> 04/30 12:08:42 (fd:3) (pid:28706) Trying to initialize local IP
> >> address (after reading config)
> >> 04/30 12:08:42 (fd:3) (pid:28706) NETWORK_INTERFACE not in config
> >> file, using existing value
> >> 04/30 12:08:42 (fd:3) (pid:28706) ABORT_ON_EXCEPTION is undefined,
> >> using default value of False
> >> 04/30 12:08:42 (fd:3) (pid:28706) Config 'HDFS_LOG': no prefix ==>
> >> '$(LOG)/HDFSLog'
> >> 04/30 12:08:42 (fd:3) (pid:28706) Config 'MAX_HDFS_LOG': no prefix ==>
> >> '1000000'
> >> 04/30 12:08:42 (fd:3) (pid:28706) PRIV_UNKNOWN --> PRIV_CONDOR at
> >> daemon_core_main.cpp:1835
> >> 04/30 12:08:42 (fd:3) (pid:28707) KEYCACHE: created: 0x84bf5b0
> >> 04/30 12:08:42 (fd:3) (pid:28707) WANT_UDP_COMMAND_SOCKET is
> >> undefined, using default value of True
> >> 04/30 12:08:42 (fd:3) (pid:28707) HDFS_MAX_FILE_DESCRIPTORS is
> >> undefined, using default value of 0
> >> 04/30 12:08:42 (fd:3) (pid:28707) MAX_FILE_DESCRIPTORS is undefined,
> >> using default value of 0
> >> 04/30 12:08:42 (fd:3) (pid:28707)
> >> ******************************************************
> >> 04/30 12:08:42 (fd:3) (pid:28707) ** condor_hdfs (CONDOR_HDFS)
> >> STARTING UP
> >> 04/30 12:08:42 (fd:3) (pid:28707) **
> >> /lusr/opt/condor-7.4.2/sbin/condor_hdfs
> >> 04/30 12:08:42 (fd:3) (pid:28707) ** SubsystemInfo: name=HDFS
> >> type=DAEMON(11) class=DAEMON(1)
> >> 04/30 12:08:42 (fd:3) (pid:28707) ** Configuration: subsystem:HDFS
> >> local:<NONE> class:DAEMON
> >> 04/30 12:08:42 (fd:3) (pid:28707) ** $CondorVersion: 7.4.2 Mar 29 2010
> >> BuildID: 227044 $
> >> 04/30 12:08:42 (fd:3) (pid:28707) ** $CondorPlatform: I386-LINUX_RHEL5 $
> >> 04/30 12:08:42 (fd:3) (pid:28707) ** PID = 28707
> >> 04/30 12:08:42 (fd:3) (pid:28707) ** Log last touched 4/30 11:53:36
> >> 04/30 12:08:42 (fd:3) (pid:28707) ** Running as root: Privilege
> >> switching in effect
> >> 04/30 12:08:42 (fd:3) (pid:28707)
> >> ******************************************************
> >> 04/30 12:08:42 (fd:3) (pid:28707) Using config source:
> >> /lusr/condor/etc/condor_config
> >> 04/30 12:08:42 (fd:3) (pid:28707) Using local config sources:
> >> 04/30 12:08:42 (fd:3) (pid:28707) /lusr/condor/etc/local/carrion
> >> 04/30 12:08:42 (fd:3) (pid:28707) Config 'LOG': no prefix ==>
> >> '$(RELEASE_DIR)/log/$(HOSTNAME)'
> >> 04/30 12:08:42 (fd:3) (pid:28707) Running as root. Enabling
> >> specialized core dump routines
> >> 04/30 12:08:42 (fd:5) (pid:28707) Setting up command socket
> >> 04/30 12:08:42 (fd:5) (pid:28707) CONDOR_INHERIT: "31595
> >> <128.83.120.7:57510> 0 0"
> >> 04/30 12:08:42 (fd:5) (pid:28707) Parent PID = 31595
> >> 04/30 12:08:42 (fd:5) (pid:28707) Parent Command Sock =
> >> <128.83.120.7:57510>
> >> 04/30 12:08:42 (fd:7) (pid:28707) LISTEN <128.83.120.7:45693> fd=5
> >> 04/30 12:08:42 (fd:7) (pid:28707)
> >> _______________________________________________
> >> Condor-users mailing list
> >> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> >> subject: Unsubscribe
> >> You can also unsubscribe by visiting
> >> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >>
> >> The archives can be found at:
> >> https://lists.cs.wisc.edu/archive/condor-users/
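
For anyone chasing the same symptom, here is a rough Python sketch that
scans MasterLog lines for a daemon that exits within a few seconds of
being started, which is the pattern in the MasterLog excerpt quoted in
this thread. The two regular expressions only match the line shapes
shown in that excerpt, and they assume unwrapped lines as they appear in
the log file itself rather than as wrapped by a mail client. The names
find_quick_exits, parse_stamp and max_seconds are made up for this
example and are not part of Condor. The log omits the year, so a fixed
one is assumed.

```python
import re
from datetime import datetime

# Line shapes copied from the MasterLog excerpt quoted in this thread.
STARTED = re.compile(
    r'^(\d\d/\d\d \d\d:\d\d:\d\d) Started process "([^"]+)", '
    r'pid and pgroup = (\d+)'
)
EXITED = re.compile(
    r'^(\d\d/\d\d \d\d:\d\d:\d\d) The (\S+) \(pid (\d+)\) '
    r'exited with status (\d+)'
)


def parse_stamp(stamp, year=2000):
    """Parse 'MM/DD HH:MM:SS'; the year is an assumption, the log has none."""
    return datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")


def find_quick_exits(lines, max_seconds=5):
    """Return (daemon, path, pid, status, seconds_alive) for each daemon
    that exited within max_seconds of its 'Started process' line."""
    started = {}  # pid -> (start time, executable path)
    quick = []
    for line in lines:
        m = STARTED.match(line)
        if m:
            started[m.group(3)] = (parse_stamp(m.group(1)), m.group(2))
            continue
        m = EXITED.match(line)
        if m and m.group(3) in started:
            began, path = started.pop(m.group(3))
            alive = (parse_stamp(m.group(1)) - began).total_seconds()
            if 0 <= alive <= max_seconds:
                quick.append(
                    (m.group(2), path, int(m.group(3)), int(m.group(4)), alive)
                )
    return quick


# The three MasterLog lines from this thread, unwrapped.
sample = [
    '04/30 12:08:42 Started process "/lusr/condor/sbin/condor_hdfs", '
    'pid and pgroup = 28706',
    '04/30 12:08:42 The HDFS (pid 28706) exited with status 0',
    '04/30 12:08:42 restarting /lusr/condor/sbin/condor_hdfs in 3600 seconds',
]

for daemon, path, pid, status, alive in find_quick_exits(sample):
    print(f"{daemon} ({path}, pid {pid}) exited with status {status} "
          f"after {alive:.0f}s")
```

To run it against a real log, pass the open file instead of the sample
list, e.g. find_quick_exits(open(path_to_masterlog)). In this thread the
reported status is 0, so the master's view alone does not say why the
daemon stopped; the script only flags where to start looking.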