Re: [Condor-users] trying to get HDFS working
- Date: Wed, 6 Oct 2010 06:44:00 -0400
- From: Mag Gam <magawake@xxxxxxxxx>
- Subject: Re: [Condor-users] trying to get HDFS working
We are in the same situation as Dave. Any help?
Is there a tutorial or wiki page on how to set this up, other than
http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#SECTION004323000000000000000 ?
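
For reference, the settings described in the quoted messages below can be
collected into one sketch. This is only the layout that Dave and Dan
describe, not a configuration known to work, since the thread does not
report a fix. The host name, port, and every file system path are made-up
placeholders. The host:port form for HDFS_NAMENODE and the DAEMON_LIST
line are assumptions that do not appear in the thread.

```
# --- All HDFS machines ---
# Have condor_master start the HDFS daemon (assumption; not stated in the thread)
DAEMON_LIST = $(DAEMON_LIST), HDFS
HDFS = $(SBIN)/condor_hdfs

# A Sun JDK 6 instead of gij; the path is a placeholder
JAVA = /usr/java/jdk1.6.0_20/bin/java
# Existing Hadoop 0.20.2 install; the path is a placeholder
HDFS_HOME = /opt/hadoop-0.20.2

# Name node location; host, port, and the host:port form are assumptions
HDFS_NAMENODE = namenode.example.org:9000

# Verbose logging while debugging
HDFS_LOG4J = DEBUG
HDFS_DEBUG = D_ALL

# --- Local config of the name node machine only ---
HDFS_SERVICES = HDFS_NAMENODE
# Directory must exist and be owned by the Condor user; placeholder path
HDFS_NAMENODE_DIR = /scratch/condor/hdfs/name

# --- Local config of each data node machine only ---
HDFS_SERVICES = HDFS_DATANODE
# Directory must exist and be owned by the Condor user; placeholder path
HDFS_DATANODE_DIR = /scratch/condor/hdfs/data
HDFS_DATANODE_ADDRESS = 0.0.0.0:0
```

The two HDFS_SERVICES lines belong in different machines' local
configuration files, never in the same file. Separately from the
configuration, Dan reports having to copy the Hadoop 0.20.2 jar files into
Condor's libexec/hdfs/lib directory.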
On Mon, May 10, 2010 at 10:51 AM, David Kotz <dkotz@xxxxxxxxxxxxx> wrote:
> Any suggestions from the Condor team on what might be wrong or where
> else I might look?
>
> - dave
>
>
> On Fri, 2010-04-30 at 14:10 -0500, David A. Kotz wrote:
>> Thanks, Dan, but I think I have all of those covered. I corrected the
>> HDFS setting:
>>
>> HDFS = $(SBIN)/condor_hdfs
>>
>> pointed JAVA at our Java 6 install, and pointed HDFS_HOME at our
>> existing Hadoop install. I also have HDFS_NAMENODE set to the machine
>> with HDFS_SERVICES = HDFS_NAMENODE, and I have HDFS_SERVICES =
>> HDFS_DATANODE on the other machines. I also changed the
>> DedicatedScheduler to point to the namenode as well, because I think I
>> ran across something which seemed to indicate I should.
>>
>> - dave
>>
>>
>> Dan Bradley wrote:
>> >
>> > I have also tried to get Condor's HDFS support to work. I haven't quite
>> > finished, but what I found may be helpful to you.
>> >
>> > In my case, I found that the condor package did not contain the
>> > necessary HDFS jar files. I had to download these and install them in
>> > condor's libexec/hdfs/lib directory. I used the 0.20.2 hadoop release.
>> >
>> > I also found that the version of java on my system (gij (GNU libgcj)
>> > version 4.1.2) did not appear to work with HDFS. Instead, I used
>> > jdk1.6.0_20 from Sun.
>> >
>> > I also found that the documentation for HDFS_SERVICES is confusing. It
>> > appears that it is supposed to be set equal to either HDFS_NAMENODE or
>> > HDFS_DATANODE.
>> >
>> > Hope that helps.
>> >
>> > --Dan
>> >
>> > David A. Kotz wrote:
>> >> I'm running Condor 7.4.2 on Linux, and I'm trying unsuccessfully to
>> >> get HDFS running. The HDFS daemon seems to load, but then immediately
>> >> exits.
>> >>
>> >> I've set up one machine as the namenode and a number of our cluster
>> >> nodes as data nodes. I created and chowned to the Condor user the
>> >> HDFS_NAMENODE_DIR and HDFS_DATANODE_DIR directories on these machines.
>> >> I left HDFS_DATANODE_ADDRESS = 0.0.0.0:0 because the docs seem to
>> >> indicate that it's okay to do so.
>> >>
>> >> Hadoop is version 0.20.2.
>> >>
>> >> When I start the HDFS daemon, it loads and exits normally according to
>> >> the MasterLog:
>> >>
>> >> 04/30 12:08:42 Started process "/lusr/condor/sbin/condor_hdfs", pid and pgroup = 28706
>> >> 04/30 12:08:42 The HDFS (pid 28706) exited with status 0
>> >> 04/30 12:08:42 restarting /lusr/condor/sbin/condor_hdfs in 3600 seconds
>> >>
>> >>
>> >> With HDFS_LOG4J=DEBUG and HDFS_DEBUG=D_ALL set, the namenode log shows:
>> >>
>> >> 04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using
>> >> default value of False
>> >> 04/30 12:08:42 (fd:3) (pid:28706) LOGS_USE_TIMESTAMP is undefined,
>> >> using default value of False
>> >> 04/30 12:08:42 (fd:3) (pid:28706) config: using subsystem 'HDFS',
>> >> local ''
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Reading from /proc/cpuinfo
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Found: Physical-IDs:True; Core-IDs:True
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Analyzing 2 processors using IDs...
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Looking at processor #0 (PID:0, CID:0):
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Comparing P#0 and P#1 : pid:0!=0
>> >> or cid:0!=1 (match=No)
>> >> 04/30 12:08:42 (fd:3) (pid:28706) ncpus = 1
>> >> 04/30 12:08:42 (fd:3) (pid:28706) P0: match->1
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Looking at processor #1 (PID:0, CID:1):
>> >> 04/30 12:08:42 (fd:3) (pid:28706) ncpus = 2
>> >> 04/30 12:08:42 (fd:3) (pid:28706) P1: match->1
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Using IDs: 2 processors, 2 CPUs, 0 HTs
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Reading condor configuration from
>> >> '/lusr/condor/etc/condor_config'
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Finding local host information,
>> >> calling gethostname()
>> >> 04/30 12:08:42 (fd:3) (pid:28706) gethostname() returned fully
>> >> qualified name "carrion.cs.utexas.edu"
>> >> 04/30 12:08:42 (fd:3) (pid:28706) NET_REMAP_ENABLE is undefined, using
>> >> default value of False
>> >> 04/30 12:08:42 (fd:3) (pid:28706) PASSWD_CACHE_REFRESH is undefined,
>> >> using default value of 319
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Trying to initialize local IP
>> >> address (config file not read)
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Have not found an IP yet, calling
>> >> gethostbyname()
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Trying to find IP addr for
>> >> "carrion.cs.utexas.edu"
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Calling
>> >> gethostbyname(carrion.cs.utexas.edu)
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Found IP addr in hostent: 128.83.120.7
>> >> 04/30 12:08:42 (fd:3) (pid:28706) ENABLE_RUNTIME_CONFIG is undefined,
>> >> using default value of False
>> >> 04/30 12:08:42 (fd:3) (pid:28706) ENABLE_PERSISTENT_CONFIG is
>> >> undefined, using default value of False
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Trying to initialize local IP
>> >> address (after reading config)
>> >> 04/30 12:08:42 (fd:3) (pid:28706) NETWORK_INTERFACE not in config
>> >> file, using existing value
>> >> 04/30 12:08:42 (fd:3) (pid:28706) ABORT_ON_EXCEPTION is undefined,
>> >> using default value of False
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Config 'HDFS_LOG': no prefix ==>
>> >> '$(LOG)/HDFSLog'
>> >> 04/30 12:08:42 (fd:3) (pid:28706) Config 'MAX_HDFS_LOG': no prefix ==>
>> >> '1000000'
>> >> 04/30 12:08:42 (fd:3) (pid:28706) PRIV_UNKNOWN --> PRIV_CONDOR at
>> >> daemon_core_main.cpp:1835
>> >> 04/30 12:08:42 (fd:3) (pid:28707) KEYCACHE: created: 0x84bf5b0
>> >> 04/30 12:08:42 (fd:3) (pid:28707) WANT_UDP_COMMAND_SOCKET is
>> >> undefined, using default value of True
>> >> 04/30 12:08:42 (fd:3) (pid:28707) HDFS_MAX_FILE_DESCRIPTORS is
>> >> undefined, using default value of 0
>> >> 04/30 12:08:42 (fd:3) (pid:28707) MAX_FILE_DESCRIPTORS is undefined,
>> >> using default value of 0
>> >> 04/30 12:08:42 (fd:3) (pid:28707)
>> >> ******************************************************
>> >> 04/30 12:08:42 (fd:3) (pid:28707) ** condor_hdfs (CONDOR_HDFS)
>> >> STARTING UP
>> >> 04/30 12:08:42 (fd:3) (pid:28707) **
>> >> /lusr/opt/condor-7.4.2/sbin/condor_hdfs
>> >> 04/30 12:08:42 (fd:3) (pid:28707) ** SubsystemInfo: name=HDFS
>> >> type=DAEMON(11) class=DAEMON(1)
>> >> 04/30 12:08:42 (fd:3) (pid:28707) ** Configuration: subsystem:HDFS
>> >> local:<NONE> class:DAEMON
>> >> 04/30 12:08:42 (fd:3) (pid:28707) ** $CondorVersion: 7.4.2 Mar 29 2010
>> >> BuildID: 227044 $
>> >> 04/30 12:08:42 (fd:3) (pid:28707) ** $CondorPlatform: I386-LINUX_RHEL5 $
>> >> 04/30 12:08:42 (fd:3) (pid:28707) ** PID = 28707
>> >> 04/30 12:08:42 (fd:3) (pid:28707) ** Log last touched 4/30 11:53:36
>> >> 04/30 12:08:42 (fd:3) (pid:28707) ** Running as root: Privilege
>> >> switching in effect
>> >> 04/30 12:08:42 (fd:3) (pid:28707)
>> >> ******************************************************
>> >> 04/30 12:08:42 (fd:3) (pid:28707) Using config source:
>> >> /lusr/condor/etc/condor_config
>> >> 04/30 12:08:42 (fd:3) (pid:28707) Using local config sources:
>> >> 04/30 12:08:42 (fd:3) (pid:28707) /lusr/condor/etc/local/carrion
>> >> 04/30 12:08:42 (fd:3) (pid:28707) Config 'LOG': no prefix ==>
>> >> '$(RELEASE_DIR)/log/$(HOSTNAME)'
>> >> 04/30 12:08:42 (fd:3) (pid:28707) Running as root. Enabling
>> >> specialized core dump routines
>> >> 04/30 12:08:42 (fd:5) (pid:28707) Setting up command socket
>> >> 04/30 12:08:42 (fd:5) (pid:28707) CONDOR_INHERIT: "31595
>> >> <128.83.120.7:57510> 0 0"
>> >> 04/30 12:08:42 (fd:5) (pid:28707) Parent PID = 31595
>> >> 04/30 12:08:42 (fd:5) (pid:28707) Parent Command Sock =
>> >> <128.83.120.7:57510>
>> >> 04/30 12:08:42 (fd:7) (pid:28707) LISTEN <128.83.120.7:45693> fd=5
>> >> 04/30 12:08:42 (fd:7) (pid:28707)
>> >> _______________________________________________
>> >> Condor-users mailing list
>> >> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> >> subject: Unsubscribe
>> >> You can also unsubscribe by visiting
>> >> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>> >>
>> >> The archives can be found at:
>> >> https://lists.cs.wisc.edu/archive/condor-users/