Manually Install Cloudera CDH 4.2: Hadoop + HBase + Hive (1)

Installation versions
  • hadoop-2.0.0-cdh4.2.0
  • hbase-0.94.2-cdh4.2.0
  • hive-0.10.0-cdh4.2.0
  • jdk1.6.0_38
Instructions before installation
  • The installation directory is /opt.
  • Check the hosts file.
  • Disable the firewall.
  • Set up clock synchronization (a minimal sketch follows this list).
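
The clock-synchronization commands are not shown in the original; a minimal sketch, assuming the nodes can reach an NTP server (pool.ntp.org below is only a placeholder) and that the step is run on every node:

    # run on every node; replace pool.ntp.org with your own time source if needed
    [root@desktop1 ~]# ntpdate pool.ntp.org
    # optionally keep ntpd running (if the ntp package is installed) so the clocks stay in sync
    [root@desktop1 ~]# chkconfig ntpd on
    [root@desktop1 ~]# service ntpd start
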
Instructions for use

After Hadoop, HBase, and Hive are installed successfully, start them as follows:

  • Start DFS and MapReduce: on desktop1, run start-dfs.sh and start-yarn.sh.
  • Start HBase: on desktop3, run start-hbase.sh.
  • Start Hive: on desktop1, run hive.
Planning
    192.168.0.1    NameNode, Hive, ResourceManager
    192.168.0.2    SSNameNode (Secondary NameNode)
    192.168.0.3    DataNode, HBase, NodeManager
    192.168.0.4    DataNode, HBase, NodeManager
    192.168.0.6    DataNode, HBase, NodeManager
    192.168.0.7    DataNode, HBase, NodeManager
    192.168.0.8    DataNode, HBase, NodeManager
Deployment process
System and network configuration
  1. Modify the name of each machine

    [root@desktop1 ~]# cat /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=desktop1

  2. Modify /etc/hosts on each node to add the following content:

    [root@desktop1 ~]# cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.0.1     desktop1
    192.168.0.2     desktop2
    192.168.0.3     desktop3
    192.168.0.4     desktop4
    192.168.0.6     desktop6
    192.168.0.7     desktop7
    192.168.0.8     desktop8

  3. Configure passwordless SSH login. The following sets up desktop1 so it can log on to the other machines without a password.

    [root@desktop1 ~]# ssh-keygen
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop2
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop3
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop4
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop6
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop7
    [root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop8
  4. Disable the firewall on each machine:
    [root@desktop1 ~]# service iptables stop
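
service iptables stop only turns the firewall off for the current boot; to keep it disabled after a reboot (an extra step not in the original, assuming RHEL/CentOS with SysV init), also remove it from the startup services:

    # prevent iptables from starting again at boot
    [root@desktop1 ~]# chkconfig iptables off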
Install hadoop and configure hadoop

Upload jdk1.6.0_38.zip to /opt and decompress it. Upload hadoop-2.0.0-cdh4.2.0.zip to /opt and decompress it.
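
The decompression commands are not shown; a minimal sketch, assuming both zip archives have already been uploaded to /opt:

    [root@desktop1 ~]# cd /opt
    [root@desktop1 opt]# unzip jdk1.6.0_38.zip
    [root@desktop1 opt]# unzip hadoop-2.0.0-cdh4.2.0.zip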

Configure the following files on namenode:

  • core-site.xml: fs.defaultFS specifies the NameNode file system; the trash (recycle bin) function is enabled.
  • hdfs-site.xml: dfs.namenode.name.dir specifies the directory where the NameNode stores metadata and the edit log; dfs.datanode.data.dir specifies the directory where DataNodes store blocks; dfs.namenode.secondary.http-address specifies the Secondary NameNode address; WebHDFS is enabled.
  • slaves: adds the DataNode hosts.
  1. core-site.xml: this file sets fs.defaultFS to point at desktop1, that is, the NameNode.
[root@desktop1 hadoop]# pwd
/opt/hadoop-2.0.0-cdh4.2.0/etc/hadoop
[root@desktop1 hadoop]# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- fs.default.name for MRv1, fs.defaultFS for MRv2 (YARN) -->
<property>
  <name>fs.defaultFS</name>
  <!-- This value must be consistent with dfs.federation.nameservices in hdfs-site.xml -->
  <value>hdfs://desktop1</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>10080</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>10080</value>
</property>
</configuration>
  2. hdfs-site.xml: this file mainly sets the number of data replicas to keep, along with the NameNode and DataNode data paths and http-addresses.
[root@desktop1 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/data/hadoop-${user.name}</value>
</property>
<property>
  <name>dfs.namenode.http-address</name>
  <value>desktop1:50070</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>desktop2:50090</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
</configuration>
  3. masters lists the NameNode and Secondary NameNode hosts.
[root@desktop1 hadoop]# cat masters
desktop1
desktop2
  4. slaves lists the machines on which DataNodes are installed.
[root@desktop1 hadoop]# cat slaves
desktop3
desktop4
desktop6
desktop7
desktop8
Configure mapreduce
  1. mapred-site.xml configures the YARN computing framework and the JobHistory server addresses.
[root@desktop1 hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>desktop1:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>desktop1:19888</value>
</property>
</configuration>
  2. yarn-site.xml mainly configures the ResourceManager addresses and yarn.application.classpath (this path is very important; otherwise, class-not-found errors occur during Hive integration).
[root@desktop1 hadoop]# cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>desktop1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>desktop1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>desktop1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>desktop1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>desktop1:8088</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $YARN_HOME/share/hadoop/yarn/*,$YARN_HOME/share/hadoop/yarn/lib/*,
    $YARN_HOME/share/hadoop/mapreduce/*,$YARN_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/data/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/opt/data/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/opt/data/yarn/logs</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>
Synchronize configuration files

Modify the .bashrc environment variables, synchronize the file to the other machines, and run source .bashrc.

[root@desktop1 ~]# cat .bashrc
# .bashrc
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi
# User specific environment and startup programs
export LANG=zh_CN.utf8
export JAVA_HOME=/opt/jdk1.6.0_38
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=./:$JAVA_HOME/lib:$JRE_HOME/lib:$JRE_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop-2.0.0-cdh4.2.0
export HIVE_HOME=/opt/hive-0.10.0-cdh4.2.0
export HBASE_HOME=/opt/hbase-0.94.2-cdh4.2.0
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin

Source the file so that the modified configuration takes effect:

[root@desktop1 ~]# source .bashrc 
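
The synchronization of .bashrc to the other machines is not shown; a minimal sketch using scp, assuming the passwordless SSH set up earlier and the host names from the planning section (run source .bashrc on each machine afterwards):

    [root@desktop1 ~]# for host in desktop2 desktop3 desktop4 desktop6 desktop7 desktop8; do
    >   scp ~/.bashrc $host:~/
    > done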

Copy /opt/hadoop-2.0.0-cdh4.2.0 from desktop1 to the other machines.
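
As with .bashrc, the copy command itself is not given; a sketch using scp -r over the same host list:

    [root@desktop1 ~]# for host in desktop2 desktop3 desktop4 desktop6 desktop7 desktop8; do
    >   scp -r /opt/hadoop-2.0.0-cdh4.2.0 $host:/opt/
    > done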

Start script

To start Hadoop for the first time, you must format the NameNode. This operation is performed only once; if you modify the configuration files, you must format again.

[root@desktop1 hadoop]# hadoop namenode -format

Start HDFS on desktop1:

[root@desktop1 hadoop]# start-dfs.sh

Start MapReduce (YARN) on desktop1:

[root@desktop1 hadoop]# start-yarn.sh

Start the historyserver on desktop1:

[root@desktop1 hadoop]# mr-jobhistory-daemon.sh start historyserver
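
As a quick end-to-end check (not part of the original walkthrough), you can run the bundled example job; the jar path below assumes the CDH 4.2 tarball layout and may differ in your installation:

    [root@desktop1 hadoop]# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.0-cdh4.2.0.jar pi 2 10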

View the MapReduce (YARN) web UI:

http://desktop1:8088/cluster

View a NodeManager node (desktop3, for example):

http://desktop3:8042/
http://desktop3:8042/node
Check the cluster processes:
[root@desktop1 ~]# jps
5389 NameNode
5980 Jps
5710 ResourceManager
7032 JobHistoryServer
[root@desktop2 ~]# jps
3187 Jps
3124 SecondaryNameNode
[root@desktop3 ~]# jps
3187 Jps
3124 DataNode
5711 NodeManager
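
Beyond jps, a supplementary check (not in the original) is to ask the NameNode how many DataNodes have registered; run on desktop1:

    [root@desktop1 ~]# hdfs dfsadmin -report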
