Hadoop 2.0 cluster, HBase cluster, ZooKeeper cluster, Hive tool, Sqoop tool, Flume tool: building summary

Source: Internet
Author: User
Tags: sqoop, log4j

Software used in the lab development environment:

[[email protected] local]# ll
apache-flume-1.6.0-bin.tar.gz
flume/
hadoop/
hadoop-2.4.1-x64.tar.gz
hbase/
hbase-0.96.2-hadoop2-bin.tar.gz
hive/
hive-0.9.0.tar.gz
mysql-client-5.1.*-1.glibc23.x86_64.rpm
mysql-connector-java-5.1.*.jar
mysql-server-5.1.*-1.glibc23.x86_64.rpm
sqoop/
sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

The specific configuration is as follows:

Hadoop development environment cluster building summary:

(i) Hadoop 2.4.1 cluster construction (non-federated mode)

Hadoop 2.4.1 64-bit cluster environment:
hadoop11  NameNode, SecondaryNameNode
hadoop22  ResourceManager
hadoop33  DataNode, NodeManager
hadoop44  DataNode, NodeManager
hadoop55  DataNode, NodeManager
hadoop66  DataNode, NodeManager

Preparation:
1> shut down the firewall
2> set a static IP address
3> modify the hostname
4> bind the IP address to the hostname
5> set up SSH password-free login
6> install the JDK and configure environment variables

Install Hadoop 2.4.1:
1> extract the tarball
2> modify the configuration files:
-----------hadoop-env.sh------------------
export JAVA_HOME=/usr/local/jdk
-----------core-site.xml------------------
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop11:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
</property>
-----------hdfs-site.xml------------------
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
-----------mapred-site.xml----------------
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
-----------yarn-site.xml------------------
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop22</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
-----------slaves-------------------------
hadoop33
hadoop44
hadoop55
hadoop66
------------------------------------------
Format the NameNode: hdfs namenode -format
3> start the HDFS and YARN clusters with start-dfs.sh and start-yarn.sh. jps then shows processes such as:
4334 SecondaryNameNode
4614 NodeManager
4188 DataNode
4074 NameNode
4474 ResourceManager
4781 Jps

(ii) ZooKeeper cluster construction (QuorumPeerMain)

ZooKeeper cluster servers: hadoop33, hadoop44, hadoop55
2.1  A ZK server cluster should have no fewer than 3 nodes, and the system time on the servers must be consistent.
2.2  Under /usr/local on hadoop33, unzip the ZooKeeper tarball and set the environment variables.
2.3  In the conf directory, rename the sample file: mv zoo_sample.cfg zoo.cfg
2.4  Edit the file (vi zoo.cfg): change dataDir=/usr/local/zk/data and add
     server.0=hadoop33:2888:3888
     server.1=hadoop44:2888:3888
     server.2=hadoop55:2888:3888
2.5  Create the data folder: mkdir /usr/local/zk/data
2.6  In the data directory, create a file named myid with the value 0.
2.7  Copy the zk directory to hadoop44 and hadoop55.
2.8  Change the myid value on hadoop44 to 1, and the myid value on hadoop55 to 2.
2.9  Start ZooKeeper by executing zkServer.sh start on each of the three nodes.
2.10 Test by executing zkServer.sh status on each of the three nodes:
[[email protected] local]# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zk/bin/../conf/zoo.cfg
Mode: follower
[[email protected] data]# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zk/bin/../conf/zoo.cfg
Mode: leader
[[email protected] data]# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zk/bin/../conf/zoo.cfg
Mode: follower
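The server.N / myid bookkeeping in steps 2.4 through 2.8 is easy to get wrong by hand. A minimal shell sketch (hostnames are the ones from this walkthrough; substitute your own ensemble) that generates the zoo.cfg entries, where each host's ID is also the value that belongs in that host's myid file:

```shell
# Generate the server.N lines for zoo.cfg. The ID printed for each host
# is also the value that must go into that host's ${dataDir}/myid file.
# Ports: 2888 = quorum communication, 3888 = leader election.
zk_server_lines() {
    id=0
    for h in "$@"; do
        echo "server.${id}=${h}:2888:3888"
        id=$((id + 1))
    done
}

# Hosts used in this walkthrough:
zk_server_lines hadoop33 hadoop44 hadoop55
```

Appending the same output to zoo.cfg on every node keeps the three configuration files identical; only the per-host myid file differs.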
(iii) HBase cluster construction

HBase cluster environment:
HMaster        hadoop11, hadoop22 (active and standby)
HRegionServer  hadoop33, hadoop44, hadoop55

Install HBase:
1> extract the tarball
2> modify the configuration files (note the last point below):
-----------------------hbase-env.sh--------------------------------------
export JAVA_HOME=/usr/local/jdk
export HBASE_MANAGES_ZK=false
-----------------------hbase-site.xml------------------------------------
<property>
   <name>hbase.rootdir</name>
   <value>hdfs://hadoop11:9000/hbase</value>
</property>
<property>
   <name>hbase.cluster.distributed</name>
   <value>true</value>
</property>
<property>
   <name>hbase.zookeeper.quorum</name>
   <value>hadoop33:2181,hadoop44:2181,hadoop55:2181</value>
</property>
<property>
   <name>dfs.replication</name>
   <value>3</value>
</property>
-----------------------regionservers-------------------------------------
hadoop33
hadoop44
hadoop55
*************** No HMaster-related configuration is involved ***************
Because HBase data is stored in HDFS, place Hadoop's hdfs-site.xml and core-site.xml under hbase/conf.
3> start HBase with start-hbase.sh
****** Before starting HBase, make sure Hadoop is working properly and can write files ******
****** Before starting HBase, make sure the ZK cluster is started ******
****** The HMaster location is not set in the configuration files; whichever node runs start-hbase.sh becomes the HMaster ******
****** HBase can start additional HMasters with hbase-daemon.sh start master; they run in redundant standby status ******
Processes after startup:
HMaster
HRegionServer
If HBase manages its own ZK instance (HBASE_MANAGES_ZK=true), jps shows HQuorumPeer; if it uses an external ZK instance (false), jps shows QuorumPeerMain.
Use a browser to access http://hadoop11:60010 (the HMaster web UI).
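Since HBase will not come up cleanly unless HDFS and the ZK quorum are already running, a quick pre-flight check of the jps output can save a restart cycle. A sketch (daemon names are the ones listed in this walkthrough; jps output is read from stdin, so it can come from a live node or from a captured listing):

```shell
# Report which of the expected daemons are missing from a jps listing.
check_daemons() {
    jps_out=$(cat)            # jps output arrives on stdin
    missing=""
    for d in "$@"; do
        echo "$jps_out" | grep -qw "$d" || missing="$missing $d"
    done
    if [ -n "$missing" ]; then
        echo "MISSING:$missing"
        return 1
    fi
    echo "OK"
}

# Example with a captured process list from an intended HMaster node:
printf '4074 NameNode\n2221 QuorumPeerMain\n' | check_daemons NameNode QuorumPeerMain
# prints OK
```

On a live node this would be `jps | check_daemons NameNode QuorumPeerMain` before running start-hbase.sh, and `jps | check_daemons HMaster` afterwards.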
(iv) Hive tool construction and use

Note: Hive itself is a client tool; there is no notion of distributed or pseudo-distributed mode.
1. Extract, rename, and set the environment variables.
-------------------------------------------------------------------------
2. Install MySQL, until a remote connection can be made via Navicat:
(1) Execute service mysql status and rpm -qa | grep -i mysql to check whether MySQL is installed.
(2) Execute rpm -e xxxxxxx --nodeps to remove any installed MySQL.
(3) Execute service mysql status and rpm -qa | grep -i mysql to check whether the removal is clean.
(4) Execute rpm -i mysql-server-******** (--nodeps --force) to install the server.
(5) Execute mysqld_safe & to start the MySQL server.
(6) Execute service mysql status to check whether the MySQL server has started.
(7) Execute rpm -i mysql-client-******** to install the MySQL client.
(8) Execute mysql_secure_installation to set the root login password for the MySQL client ("3 n, 1 y").
(9) Execute mysql -uroot -padmin to log on to the MySQL client.
(10) Execute grant all on *.* to 'root'@'%' identified by 'admin'; {the first * is the hive database} and then flush privileges; so that MySQL accepts remote connections.
(11) Place the MySQL JDBC driver into Hive's lib directory!
-------------------------------------------------------------------------
3. Modify the configuration files:
(1) Modify the Hadoop configuration file hadoop-env.sh as follows (Hadoop 2.0 does not need this item configured):
export HADOOP_CLASSPATH=.:$CLASSPATH:$HADOOP_CLASSPATH:$HADOOP_HOME/bin
(2) Under $HIVE_HOME/bin, modify the file hive-config.sh and add:
export JAVA_HOME=/usr/local/jdk
export HIVE_HOME=/usr/local/hive
export HADOOP_HOME=/usr/local/hadoop
(3) Under $HIVE_HOME/conf/, rename hive-env.sh.template, hive-default.xml.template, and hive-log4j.properties.template.
Modify hive-env.sh: add the HADOOP_HOME installation directory.
Modify hive-log4j.properties: change the log4j.appender.EventCounter value to org.apache.hadoop.log.metrics.EventCounter.
(4) Modify the configuration file hive-site.xml:
/****** use the host name of the machine where MySQL is installed; do not blindly write hadoop ******/
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop11:3306/hive?createDatabaseIfNotExist=true</value>  /* note: the path is the hive database */
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>admin</value>
</property>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/hive</value>  /*** the working directory of Hive in HDFS ***/
</property>
-------------------------------------------------------------------------
4. Launch the Hive tool: hive
Validation: create a table in Hive and check whether the metadata appears in the TBLS table in MySQL (shell + Navicat).

(v) Sqoop tool construction and use

Sqoop is just a tool; there is no notion of distributed or pseudo-distributed mode. Sqoop installation (very simple):
1. unzip
2. rename
3. configure the environment variables
4. source /etc/profile
5. put the MySQL driver into Sqoop's lib directory
OK!
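The ConnectionURL is the value most often mistyped in hive-site.xml (as noted, the host must be the machine actually running MySQL). A trivial sketch that assembles it from its parts, so the same function can be reused when the metastore host, port, or database name changes:

```shell
# Compose the javax.jdo.option.ConnectionURL value for hive-site.xml.
hive_metastore_url() {
    host=$1; port=$2; db=$3
    echo "jdbc:mysql://${host}:${port}/${db}?createDatabaseIfNotExist=true"
}

# Values used in this walkthrough:
hive_metastore_url hadoop11 3306 hive
```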
(vi) Flume tool construction and use

Flume needs no cluster configuration; a single node is enough. Flume configuration:
1> extract the tarball
2> rename
3> modify the environment variables, then source /etc/profile
4> modify flume-env.sh to add JAVA_HOME
5> write the configuration file and add it to the conf directory:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/datainput
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop11:9000/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events on the local filesystem
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
6> Execute bin/flume-ng agent -n a1 -c conf -f conf/baby -Dflume.root.logger=DEBUG,console to run the agent.
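The file channel and spooling-directory source above both need their local directories to exist before the agent's first run. A sketch that prepares the directories and seeds one test event (the default paths are the ones from the configuration; pass different roots to try it elsewhere):

```shell
# Create the directories the agent expects and drop one test file into
# the spooling directory watched by source r1.
prepare_flume_dirs() {
    spool=$1; ckpt=$2; data=$3
    mkdir -p "$spool" "$ckpt" "$data"
    echo "hello flume" > "$spool/test.log"
    ls "$spool"
}

# As configured above (requires write access to these paths):
# prepare_flume_dirs /usr/local/datainput /usr/flume/checkpoint /usr/flume/data
```

Once the agent is running, files consumed from spoolDir are renamed with a .COMPLETED suffix and the events land under hdfs://hadoop11:9000/dataoutput.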

If you have any questions, please leave a message!
