Software used in the lab development environment:
# ll /usr/local
apache-flume-1.6.0-bin.tar.gz
flume/
hadoop/
hadoop-2.4.1-x64.tar.gz
hbase/
hbase-0.96.2-hadoop2-bin.tar.gz
hive/
hive-0.9.0.tar.gz
mysql-client-5.1.*-1.glibc23.x86_64.rpm
mysql-connector-java-5.1.*.jar
mysql-server-5.1.*-1.glibc23.x86_64.rpm
sqoop/
sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
(directories are marked with a trailing /)
The specific configuration is as follows:
Hadoop development environment cluster building summary:

(i) hadoop 2.4.1 cluster construction (non-federated mode)

Node roles in the 64-bit hadoop 2.4.1 cluster:
hadoop11  NameNode, SecondaryNameNode
hadoop22  ResourceManager
hadoop33  DataNode, NodeManager
hadoop44  DataNode, NodeManager
hadoop55  DataNode, NodeManager
hadoop66  DataNode, NodeManager

Preparation:
① shut down the firewall
② set a static IP address
③ modify the hostname
④ bind the IP address to the hostname
⑤ set up SSH password-free login
⑥ install the JDK and configure environment variables

Install hadoop 2.4.1:
1> extract the archive
2> modify the configuration files

-----------hadoop-env.sh------------------
export JAVA_HOME=/usr/local/jdk
-----------core-site.xml------------------
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop11:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/tmp</value>
</property>
-----------hdfs-site.xml------------------
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
-----------mapred-site.xml----------------
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
-----------yarn-site.xml------------------
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop22</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
-----------slaves-------------------------
hadoop33
hadoop44
hadoop55
hadoop66
------------------------------------------

Format the NameNode: hdfs namenode -format

3> start the HDFS and YARN clusters:
start-dfs.sh
start-yarn.sh

jps shows the processes:
4334 SecondaryNameNode
4781 Jps
4614 NodeManager
4188 DataNode
4074 NameNode
4474 ResourceManager

(ii) ZooKeeper cluster construction (QuorumPeerMain)

ZooKeeper cluster servers: hadoop33, hadoop44, hadoop55
2.1 The ZK server cluster should have no fewer than 3 nodes, and the system time on all servers must be consistent.
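The node-role table and jps listing in section (i) can be captured in a small helper for checking each host after startup. This is only a sketch: the expected_daemons function is a name introduced here for illustration, and jps itself still has to be run on each node for comparison.

```shell
# Sketch: which jps process names should appear on each node of this cluster.
# The mapping mirrors the role table in section (i) above.
expected_daemons() {
  case "$1" in
    hadoop11) echo "NameNode SecondaryNameNode" ;;
    hadoop22) echo "ResourceManager" ;;
    hadoop33|hadoop44|hadoop55|hadoop66) echo "DataNode NodeManager" ;;
    *) echo "unknown node" ;;
  esac
}

for node in hadoop11 hadoop22 hadoop33 hadoop44 hadoop55 hadoop66; do
  printf '%-9s %s\n' "$node" "$(expected_daemons "$node")"
done
```

Running jps on a node and diffing against this expectation quickly shows a daemon that failed to start.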
2.2 Under the /usr/local directory of hadoop33, unzip zk....tar.gz and set the environment variables.
2.3 In the conf directory, rename zoo_sample.cfg to zoo.cfg.
2.4 Edit the file with vi zoo.cfg: change dataDir=/usr/local/zk/data and add:
    server.0=hadoop33:2888:3888
    server.1=hadoop44:2888:3888
    server.2=hadoop55:2888:3888
2.5 Create the folder: mkdir /usr/local/zk/data
2.6 In the data directory, create a file myid with the value 0.
2.7 Copy the zk directory to hadoop44 and hadoop55.
2.8 Change the myid value on hadoop44 to 1 and on hadoop55 to 2.
2.9 Start: run zkServer.sh start on each of the three nodes.
2.10 Test: run zkServer.sh status on the three nodes and check:

# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zk/bin/../conf/zoo.cfg
Mode: follower

# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zk/bin/../conf/zoo.cfg
Mode: leader

# zkServer.sh status
JMX enabled by default
Using config: /usr/local/zk/bin/../conf/zoo.cfg
Mode: follower
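Steps 2.5 through 2.8 (create the data directory and write a distinct myid on every node) can be sketched as a loop. The host-to-id mapping comes from the server.0/1/2 lines in zoo.cfg above; the per-node ssh commands are only printed here, not executed.

```shell
# myid value for each ZK host, matching server.0/1/2 in zoo.cfg above.
myid_for() {
  case "$1" in
    hadoop33) echo 0 ;;
    hadoop44) echo 1 ;;
    hadoop55) echo 2 ;;
  esac
}

# Print (rather than run) the setup command for each node.
for host in hadoop33 hadoop44 hadoop55; do
  id=$(myid_for "$host")
  echo "ssh $host 'mkdir -p /usr/local/zk/data && echo $id > /usr/local/zk/data/myid'"
done
```

A mismatched myid is a common reason a ZK node refuses to join the quorum, so generating the values from one mapping avoids copy-paste mistakes.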
(iii) HBase cluster build

HBase cluster environment:
HMaster        hadoop11, hadoop22 (active and standby)
HRegionServer  hadoop33, hadoop44, hadoop55

Install HBase:
1> extract the archive
2> modify the configuration files (note the last point)

-----------------------hbase-env.sh-------------------------
export JAVA_HOME=/usr/local/jdk
export HBASE_MANAGES_ZK=false
-----------------------hbase-site.xml-----------------------
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoop11:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hadoop33:2181,hadoop44:2181,hadoop55:2181</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
-----------------------regionservers------------------------
hadoop33
hadoop44
hadoop55
------------------------------------------------------------

*** The configuration does not involve any HMaster-related settings. ***
Because HBase data is stored in HDFS, copy Hadoop's hdfs-site.xml and core-site.xml into hbase/conf.

3> start HBase: start-hbase.sh
*** Before starting HBase, make sure Hadoop is working properly and can write files. ***
*** Before starting HBase, make sure the ZK cluster is started. ***
*** The location of the HMaster is not configured anywhere; whichever node runs start-hbase.sh becomes the HMaster. ***
*** HBase can start additional HMasters with hbase-daemon.sh start master; they run in redundant standby status. ***

Check the startup processes with jps:
------------------------------------------------------------
HMaster
HRegionServer
------------------------------------------------------------
If HBase manages its own ZK instance (HBASE_MANAGES_ZK=true), jps shows HQuorumPeer; if HBase does not manage its own ZK instance (false), jps shows QuorumPeerMain.
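The hbase.zookeeper.quorum value above is just the ZK node list with the client port appended to each host. As a small sketch, it can be generated from the node list instead of typed by hand, which avoids typos when the ZK cluster grows:

```shell
# Build the hbase.zookeeper.quorum value from the ZK node list (client port 2181).
ZK_NODES="hadoop33 hadoop44 hadoop55"
quorum=$(echo "$ZK_NODES" | tr ' ' '\n' | sed 's/$/:2181/' | paste -s -d, -)
echo "$quorum"   # prints hadoop33:2181,hadoop44:2181,hadoop55:2181
```

Port 2888:3888 from zoo.cfg is for peer/election traffic; clients such as HBase always use the 2181 client port.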
Use a browser to visit http://hadoop11:60010 (the HMaster web UI).

(iv) Hive tool construction and use

Note: Hive itself is a client tool; there is no notion of distributed or pseudo-distributed mode.

1. Extract, rename, and set the environment variables.
-----------------------------------------------------------------------------------------------------
2. Install MySQL (until a remote connection can be made via Navicat):
(1) Run service mysql status and rpm -qa | grep -i mysql to check whether MySQL is installed.
(2) Run rpm -e xxxxxxx --nodeps to remove any installed MySQL.
(3) Run service mysql status and rpm -qa | grep -i mysql again to check that the removal was clean.
(4) Run rpm -i mysql-server-******** (--nodeps --force) to install the server.
(5) Run mysqld_safe & to start the MySQL server.
(6) Run service mysql status to check whether the MySQL server started.
(7) Run rpm -i mysql-client-******** to install the MySQL client.
(8) Run mysql_secure_installation to set the root login password (answer "3 N 1 Y").
(9) Run mysql -uroot -padmin to log in to the MySQL client.
(10) Run:
    grant all on *.* to 'root'@'%' identified by 'admin';   (the first * can be the hive database)
    flush privileges;
so that MySQL can be connected to remotely.
(11) Place the MySQL JDBC driver into Hive's lib directory!
-----------------------------------------------------------------------------------------------------
3. Modify the configuration files:
(1) Modify the Hadoop configuration file hadoop-env.sh as follows (Hadoop 2.0 did not configure this item):
    export HADOOP_CLASSPATH=.:$CLASSPATH:$HADOOP_CLASSPATH:$HADOOP_HOME/bin
(2) Under the directory $HIVE_HOME/bin, modify the file hive-config.sh and add:
    export JAVA_HOME=/usr/local/jdk
    export HIVE_HOME=/usr/local/hive
    export HADOOP_HOME=/usr/local/hadoop
(3) Under the directory $HIVE_HOME/conf/, rename the template files:
hive-env.sh.template, hive-default.xml.template, and hive-log4j.properties (drop the .template suffix).

Modify hive-env.sh:
1. add the HADOOP_HOME installation directory address.
Modify hive-log4j.properties:
1. change the log4j.appender.EventCounter value to org.apache.hadoop.log.metrics.EventCounter.
Modify the configuration file hive-site.xml:
/*** use the hostname of the machine where MySQL is installed, not just any hostname ***/
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop11:3306/hive?createDatabaseIfNotExist=true</value>   /* note: the path is the hive database */
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>admin</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/hive</value>   /*** Hive's working directory in HDFS ***/
</property>
-----------------------------------------------------------------------------------------------------------
4. Launch the Hive tool: hive

Validation: create a table in Hive and check whether its metadata appears in the TBLS table in MySQL (shell + Navicat).

(v) Sqoop tool construction and use

Note: Sqoop is just a tool; there is no notion of distributed or pseudo-distributed mode.
Sqoop installation (very simple):
1. extract
2. rename
3. configure the environment variables
4. source /etc/profile
5. put the MySQL driver into Sqoop's lib directory
OK!
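A quick way to check that the Sqoop installation and the MySQL driver in its lib directory work together is sqoop list-databases. The sketch below only composes and prints the command; the host and credentials reuse the MySQL settings from the Hive section above and should be adjusted to your environment.

```shell
# Hypothetical verification command for the Sqoop installation; host and
# credentials reuse the MySQL settings from the Hive section above.
MYSQL_HOST=hadoop11
cmd="sqoop list-databases --connect jdbc:mysql://${MYSQL_HOST}:3306 --username root --password admin"
echo "$cmd"
```

If the driver jar is missing from Sqoop's lib directory, this command fails with a ClassNotFoundException for com.mysql.jdbc.Driver, which makes step 5 above easy to verify.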
(vi) Flume tool construction and use

Flume needs no cluster configuration; a single node is enough. Flume configuration:
1> extract
2> rename
3> modify the environment variables, then source /etc/profile
4> modify flume-env.sh and add JAVA_HOME
5> write the configuration file and add it to the conf directory:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /usr/local/datainput
a1.sources.r1.fileHeader = true
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop11:9000/dataoutput
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.filePrefix = %Y-%m-%d-%H-%M-%S
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Use a channel which buffers events on disk
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /usr/flume/checkpoint
a1.channels.c1.dataDirs = /usr/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

6> run:
bin/flume-ng agent -n a1 -c conf -f conf/baby -Dflume.root.logger=DEBUG,console
(the value of -n must match the agent name used in the configuration file, a1 here)
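Once the agent is running, a simple smoke test is to drop a finished file into the spool directory and, after the 10-second rollInterval above has passed, list the HDFS output path. The sketch below only prints the commands; /tmp/test.log is a hypothetical file name.

```shell
# Paths taken from the agent configuration above.
SPOOL_DIR=/usr/local/datainput
OUT_DIR=/dataoutput

# The spooldir source only processes files that are complete, so write the
# file elsewhere first and then mv it into the spool directory atomically.
step1="mv /tmp/test.log $SPOOL_DIR/"
# With rollInterval=10 and rollSize/rollCount=0, files roll purely by time,
# so a new HDFS file should appear within roughly 10 seconds.
step2="hdfs dfs -ls $OUT_DIR"
echo "$step1"
echo "$step2"
```

After a file is ingested, the spooldir source renames it with a .COMPLETED suffix, which is an easy way to confirm the source picked it up.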
If you have any questions, please leave a message!
Hadoop 2.0 cluster, HBase cluster, ZooKeeper cluster, Hive tool, Sqoop tool, and Flume tool building summary