Manually Install Cloudera CDH 4.2: Hadoop + HBase + Hive (1)

Installation versions
  • hadoop-2.0.0-cdh4.2.0
  • hbase-0.94.2-cdh4.2.0
  • hive-0.10.0-cdh4.2.0
  • jdk1.6.0_38
Instructions before installation
  • The installation directory is /opt.
  • Check the hosts file.
  • Disable the firewall.
  • Set up clock synchronization (a minimal sketch follows this list).
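
The clock-synchronization commands are not shown in the original; a minimal sketch, assuming the nodes can reach an NTP server (pool.ntp.org below is only a placeholder) and that the step is run on every node:

    # run on every node; replace pool.ntp.org with your own time source if needed
    [root@desktop1 ~]# ntpdate pool.ntp.org
    # optionally keep ntpd running (if the ntp package is installed) so the clocks stay in sync
    [root@desktop1 ~]# chkconfig ntpd on
    [root@desktop1 ~]# service ntpd start
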
Instructions for use

After Hadoop, HBase, and Hive are installed successfully, start them as follows:

  • Start DFS and MapReduce: on desktop1, run start-dfs.sh and start-yarn.sh.
  • Start HBase: on desktop3, run start-hbase.sh.
  • Start Hive: on desktop1, run hive.
Planning
    192.168.0.1    NameNode, Hive, ResourceManager
    192.168.0.2    SSNameNode (Secondary NameNode)
    192.168.0.3    DataNode, HBase, NodeManager
    192.168.0.4    DataNode, HBase, NodeManager
    192.168.0.6    DataNode, HBase, NodeManager
    192.168.0.7    DataNode, HBase, NodeManager
    192.168.0.8    DataNode, HBase, NodeManager
Deployment process
System and network configuration
  1. Modify the name of each machine

    [root@desktop1 ~]# cat /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=desktop1

  2. Modify /etc/hosts on each node to add the following content:

    [root@desktop1 ~]# cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.0.1     desktop1
    192.168.0.2     desktop2
    192.168.0.3     desktop3
    192.168.0.4     desktop4
    192.168.0.6     desktop6
    192.168.0.7     desktop7
    192.168.0.8     desktop8

  3. Configure passwordless SSH login. The following sets up desktop1 so it can log on to the other machines without a password.

    [root@desktop1 ~]# ssh-keygen
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop2
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop3
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop4
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop6
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop7
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop8
  4. Disable the firewall on each machine:
    [root@desktop1 ~]# service iptables stop
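
service iptables stop only turns the firewall off for the current boot; to keep it disabled after a reboot (an extra step not in the original, assuming RHEL/CentOS with SysV init), also remove it from the startup services:

    # prevent iptables from starting again at boot
    [root@desktop1 ~]# chkconfig iptables off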
Install hadoop and configure hadoop

Upload jdk1.6.0_38.zip to /opt and decompress it. Upload hadoop-2.0.0-cdh4.2.0.zip to /opt and decompress it.
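
The decompression commands are not shown; a minimal sketch, assuming both zip archives have already been uploaded to /opt:

    [root@desktop1 ~]# cd /opt
    [root@desktop1 opt]# unzip jdk1.6.0_38.zip
    [root@desktop1 opt]# unzip hadoop-2.0.0-cdh4.2.0.zip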

Configure the following files on namenode:

  • core-site.xml: fs.defaultFS specifies the NameNode file system; the trash (recycle bin) function is enabled.
  • hdfs-site.xml: dfs.namenode.name.dir specifies the directory where the NameNode stores metadata and the edit log; dfs.datanode.data.dir specifies the directory where DataNodes store blocks; dfs.namenode.secondary.http-address specifies the Secondary NameNode address; WebHDFS is enabled.
  • slaves: adds the DataNode hosts.
  1. core-site.xml: this file sets fs.defaultFS to point at desktop1, that is, the NameNode.
[root@desktop1 hadoop]# pwd
/opt/hadoop-2.0.0-cdh4.2.0/etc/hadoop
[root@desktop1 hadoop]# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- fs.default.name for MRv1, fs.defaultFS for MRv2 (YARN) -->
<property>
  <name>fs.defaultFS</name>
  <!-- This value must be consistent with dfs.federation.nameservices in hdfs-site.xml -->
  <value>hdfs://desktop1</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>10080</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>10080</value>
</property>
</configuration>
  2. hdfs-site.xml: this file mainly sets the number of data replicas to keep, along with the NameNode and DataNode data paths and http-addresses.
[root@desktop1 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/data/hadoop-${user.name}</value>
</property>
<property>
  <name>dfs.namenode.http-address</name>
  <value>desktop1:50070</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>desktop2:50090</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
</configuration>
  3. masters lists the NameNode and Secondary NameNode hosts.
[root@desktop1 hadoop]# cat masters
desktop1
desktop2
  4. slaves lists the machines on which DataNodes are installed.
[root@desktop1 hadoop]# cat slaves
desktop3
desktop4
desktop6
desktop7
desktop8
Configure mapreduce
  1. mapred-site.xml configures the YARN computing framework and the JobHistory server addresses.
[root@desktop1 hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>desktop1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>desktop1:19888</value>
</property>
</configuration>
  2. yarn-site.xml mainly configures the ResourceManager addresses and yarn.application.classpath (this path is very important; otherwise, class-not-found errors occur during Hive integration).
[root@desktop1 hadoop]# cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>desktop1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>desktop1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>desktop1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>desktop1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>desktop1:8088</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $YARN_HOME/share/hadoop/yarn/*,$YARN_HOME/share/hadoop/yarn/lib/*,
    $YARN_HOME/share/hadoop/mapreduce/*,$YARN_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/data/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/opt/data/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/opt/data/yarn/logs</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>
Synchronize configuration files

Modify the .bashrc environment variables, synchronize the file to the other machines, and run source .bashrc.

[root@desktop1 ~]# cat .bashrc
# .bashrc
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi
# User specific environment and startup programs
export LANG=zh_CN.utf8
export JAVA_HOME=/opt/jdk1.6.0_38
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=./:$JAVA_HOME/lib:$JRE_HOME/lib:$JRE_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop-2.0.0-cdh4.2.0
export HIVE_HOME=/opt/hive-0.10.0-cdh4.2.0
export HBASE_HOME=/opt/hbase-0.94.2-cdh4.2.0
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin

Source the file so that the modified configuration takes effect:

[root@desktop1 ~]# source .bashrc 
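
The synchronization of .bashrc to the other machines is not shown; a minimal sketch using scp, assuming the passwordless SSH set up earlier and the host names from the planning section (run source .bashrc on each machine afterwards):

    [root@desktop1 ~]# for host in desktop2 desktop3 desktop4 desktop6 desktop7 desktop8; do
    >   scp ~/.bashrc $host:~/
    > done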

Copy /opt/hadoop-2.0.0-cdh4.2.0 from desktop1 to the other machines.
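
As with .bashrc, the copy command itself is not given; a sketch using scp -r over the same host list:

    [root@desktop1 ~]# for host in desktop2 desktop3 desktop4 desktop6 desktop7 desktop8; do
    >   scp -r /opt/hadoop-2.0.0-cdh4.2.0 $host:/opt/
    > done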

Start script

To start Hadoop for the first time, you must format the NameNode. This operation is performed only once; if you modify the configuration files, you must format again.

[root@desktop1 hadoop]# hadoop namenode -format

Start HDFS on desktop1:

[root@desktop1 hadoop]# start-dfs.sh

Start MapReduce (YARN) on desktop1:

[root@desktop1 hadoop]# start-yarn.sh

Start the historyserver on desktop1:

[root@desktop1 hadoop]# mr-jobhistory-daemon.sh start historyserver
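
As a quick end-to-end check (not part of the original walkthrough), you can run the bundled example job; the jar path below assumes the CDH 4.2 tarball layout and may differ in your installation:

    [root@desktop1 hadoop]# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar pi 2 10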

View the MapReduce (YARN) web UI:

http://desktop1:8088/cluster

View a NodeManager node (desktop3, for example):

http://desktop3:8042/
http://desktop3:8042/node
Check the cluster processes:
[root@desktop1 ~]# jps
5389 NameNode
5980 Jps
5710 ResourceManager
7032 JobHistoryServer
[root@desktop2 ~]# jps
3187 Jps
3124 SecondaryNameNode
[root@desktop3 ~]# jps
3187 Jps
3124 DataNode
5711 NodeManager
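
Beyond jps, a supplementary check (not in the original) is to ask the NameNode how many DataNodes have registered; run on desktop1:

    [root@desktop1 ~]# hdfs dfsadmin -report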
