Installation versions
- hadoop-2.0.0-cdh4.2.0
- hbase-0.94.2-cdh4.2.0
- hive-0.10.0-cdh4.2.0
- jdk1.6.0_38
Instructions before installation
- The installation directory is /opt.
- Check the hosts file.
- Disable the firewall.
- Set up clock synchronization.
Instructions for use
After hadoop, hbase, and hive are installed successfully, start them as follows:
- Start HDFS and mapreduce: on desktop1, run start-dfs.sh and start-yarn.sh.
- Start hbase: on desktop3, run start-hbase.sh.
- Start hive: on desktop1, run hive.
Planning
192.168.0.1  NameNode, Hive, ResourceManager
192.168.0.2  SecondaryNameNode
192.168.0.3  DataNode, HBase, NodeManager
192.168.0.4  DataNode, HBase, NodeManager
192.168.0.6  DataNode, HBase, NodeManager
192.168.0.7  DataNode, HBase, NodeManager
192.168.0.8  DataNode, HBase, NodeManager
Deployment process system and network configuration
Modify the hostname of each machine:
[root@desktop1 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=desktop1
Modify /etc/hosts on each node to add the following content:
[root@desktop1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.1 desktop1
192.168.0.2 desktop2
192.168.0.3 desktop3
192.168.0.4 desktop4
192.168.0.6 desktop6
192.168.0.7 desktop7
192.168.0.8 desktop8
Configure password-less SSH login. The following sets up desktop1 to log in to the other machines without a password.
[root@desktop1 ~]# ssh-keygen
[root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop2
[root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop3
[root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop4
[root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop6
[root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop7
[root@desktop1 ~]# ssh-copy-id -i .ssh/id_rsa.pub desktop8
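Once the keys are copied, a quick way to confirm the password-less login actually works is a small check loop (a sketch; `check_ssh` is a hypothetical helper, and `BatchMode=yes` makes ssh fail instead of prompting for a password):

```shell
# Sanity check: try a non-interactive login to every node; BatchMode=yes
# makes ssh fail rather than prompt if the key was not installed.
check_ssh() {
  for node in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
      echo "ok: $node"
    else
      echo "FAILED: $node"
    fi
  done
}
# check_ssh desktop2 desktop3 desktop4 desktop6 desktop7 desktop8
```

Any node reported as FAILED will block the unattended start scripts later, so fix it before continuing.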
- Disable the firewall on each machine:
[root@desktop1 ~]# service iptables stop
Install hadoop and configure hadoop
Upload jdk1.6.0_38.zip to /opt and unzip it. Upload hadoop-2.0.0-cdh4.2.0.zip to /opt and unzip it.
Configure the following files on the namenode:
- core-site.xml: fs.defaultFS specifies the namenode filesystem; enable the recycle bin (trash) feature.
- hdfs-site.xml: dfs.namenode.name.dir specifies the directory where the namenode stores metadata and the editlog; dfs.datanode.data.dir specifies the directory where datanodes store blocks; dfs.namenode.secondary.http-address specifies the secondary namenode address; enable webhdfs.
- slaves: add the datanode hosts.
- core-site.xml: this file sets fs.defaultFS to desktop1, i.e. the namenode node.
[root@desktop1 hadoop]# pwd
/opt/hadoop-2.0.0-cdh4.2.0/etc/hadoop
[root@desktop1 hadoop]# cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- fs.default.name for MRv1, fs.defaultFS for MRv2 (yarn) -->
  <property>
    <name>fs.defaultFS</name>
    <!-- This value must be consistent with dfs.federation.nameservices in hdfs-site.xml -->
    <value>hdfs://desktop1</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>10080</value>
  </property>
</configuration>
- hdfs-site.xml: this file mainly sets the number of data replicas, the namenode and datanode data paths, and the http-addresses.
[root@desktop1 hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/data/hadoop-${user.name}</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>desktop1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>desktop2:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
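The configuration points hadoop.tmp.dir (and, later, the yarn local and log directories) under /opt/data, which must exist and be writable on every node. A hypothetical helper to create them (`make_data_dirs` is not part of the guide; the paths come from this guide's config files):

```shell
# Hypothetical helper: create the local directories referenced by
# hadoop.tmp.dir and the yarn.nodemanager.*-dirs settings in this guide.
# The root defaults to /opt/data but can be overridden for testing.
make_data_dirs() {
  local root="${1:-/opt/data}"
  mkdir -p "$root/yarn/local" "$root/yarn/logs"
  echo "created data dirs under $root"
}
# Run as root on every node:
# make_data_dirs /opt/data
```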
- Masters sets the namenode and secondary namenode nodes.
[root@desktop1 hadoop]# cat masters
desktop1
desktop2
- Slaves sets the machines on which the datanode node is installed.
[root@desktop1 hadoop]# cat slaves
desktop3
desktop4
desktop6
desktop7
desktop8
Configure mapreduce
- mapred-site.xml configures the yarn computing framework and the jobhistory addresses.
[root@desktop1 hadoop]# cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>desktop1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>desktop1:19888</value>
  </property>
</configuration>
- yarn-site.xml mainly configures the ResourceManager addresses and yarn.application.classpath (this path is very important; otherwise "class not found" errors appear during hive integration).
[root@desktop1 hadoop]# cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>desktop1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>desktop1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>desktop1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>desktop1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>desktop1:8088</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,
      $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
      $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
      $YARN_HOME/share/hadoop/yarn/*,$YARN_HOME/share/hadoop/yarn/lib/*,
      $YARN_HOME/share/hadoop/mapreduce/*,$YARN_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/data/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/opt/data/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/opt/data/yarn/logs</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>
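Because a wrong yarn.application.classpath only surfaces as "class not found" errors at job-submission time, it may save debugging to verify up front that the directories the classpath names actually exist. A sketch (`check_classpath_dirs` is a hypothetical helper; it assumes HADOOP_HOME is set as in the .bashrc in the next section):

```shell
# Verify that the share/hadoop subdirectories named in
# yarn.application.classpath actually exist on disk.
check_classpath_dirs() {
  local missing=0
  for d in "$@"; do
    if [ -d "$d" ]; then
      echo "ok: $d"
    else
      echo "MISSING: $d"
      missing=1
    fi
  done
  return $missing
}
# check_classpath_dirs "$HADOOP_HOME"/share/hadoop/common \
#                      "$HADOOP_HOME"/share/hadoop/hdfs \
#                      "$HADOOP_HOME"/share/hadoop/yarn
```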
Synchronize configuration files
Modify the .bashrc environment variables, synchronize the file to the other machines, and run source .bashrc.
[root@desktop1 ~]# cat .bashrc
# .bashrc
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
# User specific environment and startup programs
export LANG=zh_CN.utf8
export JAVA_HOME=/opt/jdk1.6.0_38
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=./:$JAVA_HOME/lib:$JRE_HOME/lib:$JRE_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop-2.0.0-cdh4.2.0
export HIVE_HOME=/opt/hive-0.10.0-cdh4.2.0
export HBASE_HOME=/opt/hbase-0.94.2-cdh4.2.0
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin
Source the modified file to make it take effect:
[root@desktop1 ~]# source .bashrc
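After sourcing, it may be worth checking that the key variables actually point at existing directories before copying anything to the other machines. A hypothetical sketch (`check_env_dirs` is not part of the guide):

```shell
# Report whether each named environment variable is set and points at an
# existing directory (here: JAVA_HOME, HADOOP_HOME, HIVE_HOME, HBASE_HOME).
check_env_dirs() {
  for var in "$@"; do
    eval "val=\$$var"
    if [ -n "$val" ] && [ -d "$val" ]; then
      echo "ok: $var=$val"
    else
      echo "BAD: $var=$val"
    fi
  done
}
# check_env_dirs JAVA_HOME HADOOP_HOME HIVE_HOME HBASE_HOME
```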
Copy /opt/hadoop-2.0.0-cdh4.2.0 on desktop1 to the other machines.
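This copy step can be scripted (a sketch; `push_dir` is a hypothetical helper that relies on the password-less root SSH set up earlier and the host list from the planning section):

```shell
# Hypothetical helper: copy the unpacked hadoop directory to each node's /opt.
push_dir() {
  local src="$1"; shift
  for node in "$@"; do
    echo "copying $src to $node:/opt/"
    scp -r -q "$src" root@"$node":/opt/
  done
}
# push_dir /opt/hadoop-2.0.0-cdh4.2.0 desktop2 desktop3 desktop4 desktop6 desktop7 desktop8
```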
Start script
To start hadoop for the first time, you must format the namenode. This operation is performed only once; if you modify the configuration files later, you must format again.
[root@desktop1 hadoop]# hadoop namenode -format
Start HDFS on desktop1:
[root@desktop1 hadoop]# start-dfs.sh
Start mapreduce on desktop1:
[root@desktop1 hadoop]# start-yarn.sh
Start the historyserver on desktop1:
[root@desktop1 hadoop]# mr-jobhistory-daemon.sh start historyserver
View mapreduce:
http://desktop1:8088/cluster
View nodes:
http://desktop2:8042/
http://desktop2:8042/node
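A quick way to confirm the web UIs above are reachable is to probe them with curl (a sketch; `check_ui` is a hypothetical helper):

```shell
# Probe each web UI and print the HTTP status; 200 suggests the daemon
# behind it is up, 000 means the connection failed or timed out.
check_ui() {
  for url in "$@"; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
    echo "$url -> HTTP $code"
  done
}
# check_ui http://desktop1:8088/cluster http://desktop2:8042/node
```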
Check the cluster processes:
[root@desktop1 ~]# jps
5389 NameNode
5980 Jps
5710 ResourceManager
7032 JobHistoryServer
[root@desktop2 ~]# jps
3187 Jps
3124 SecondaryNameNode
[root@desktop3 ~]# jps
3187 Jps
3124 DataNode
5711 NodeManager
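The expected process lists above can be checked mechanically by grepping jps output for the required daemon names (a sketch; `check_daemons` is a hypothetical helper, and the per-host daemon sets come from the planning section):

```shell
# Given the output of `jps`, confirm each required daemon name appears.
check_daemons() {
  local jps_output="$1"; shift
  local status=0
  for daemon in "$@"; do
    if echo "$jps_output" | grep -qw "$daemon"; then
      echo "running: $daemon"
    else
      echo "NOT RUNNING: $daemon"
      status=1
    fi
  done
  return $status
}
# On desktop1:   check_daemons "$(jps)" NameNode ResourceManager JobHistoryServer
# On desktop2:   check_daemons "$(jps)" SecondaryNameNode
# On datanodes:  check_daemons "$(jps)" DataNode NodeManager
```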