Hadoop cluster Installation--ubuntu


My better half has recently been teaching herself Hadoop, so we decided to play with it together; I put this basic cluster-building post together for her, and I hope it helps. Before we begin, let's look at what Hadoop is.

Hadoop is a distributed system infrastructure developed by the Apache Foundation. It is based on Google's published papers on MapReduce and the Google File System. The Hadoop framework transparently provides reliability and data movement for your applications. It implements a programming paradigm called MapReduce: applications are partitioned into many small pieces of work, each of which can be run or rerun on any node in the cluster.

Hadoop implements a distributed file system, the Hadoop Distributed File System, referred to as HDFS. HDFS has a high degree of fault tolerance and is designed to be deployed on inexpensive (low-cost) hardware. It provides high-throughput access to application data and suits applications with very large data sets. HDFS relaxes some POSIX requirements and allows streaming access to the data in the file system.

Users can develop distributed programs without knowing the underlying details of the distribution, taking advantage of the power of the cluster for high-speed computation and storage. The core of the Hadoop framework is HDFS and MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides computation over that data.
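
Before building anything, it helps to see the map-shuffle-reduce pattern on a single machine with ordinary shell tools. This is only an analogy (sample.txt stands in for any text file), but it is exactly the pattern that MapReduce distributes across the nodes of a cluster:

$ cat sample.txt | tr -s ' ' '\n' | sort | uniq -c | sort -rn | head
    // "map": split the text into one word per line (tr)
    // "shuffle": bring identical words together (sort)
    // "reduce": count each group (uniq -c), then show the most frequent words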

Build

To build a multi-node Hadoop environment you need at least two machines. Here I use the latest stable release, 2.7.3, and three cloud hosts (one master and two slaves, running Ubuntu 14.04 LTS).

    1. Modify the Hosts file

With the three machines able to reach one another over the network, change the hostnames and modify the hosts file:

# hostnamectl set-hostname master     // execute on the master node
# hostnamectl set-hostname slave-1    // execute on the slave-1 node
# hostnamectl set-hostname slave-2    // execute on the slave-2 node

Then modify the hosts file on all three machines:

# vim /etc/hosts
192.168.1.2 master
192.168.1.3 slave-1
192.168.1.4 slave-2
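
Before moving on, it is worth confirming that every machine can resolve and reach the other two by name. A quick check, run on each node (hostnames as configured above):

$ for h in master slave-1 slave-2; do ping -c 1 $h; done    // each host should answer
$ getent hosts master slave-1 slave-2                       // shows the addresses taken from /etc/hosts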

2. Install Java on the master and slave nodes:

# add-apt-repository ppa:webupd8team/java    // add the PPA
# apt-get update
# apt-get install oracle-java8-installer
# java -version                              // verify the Java version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

3. Disable IPv6

Hadoop currently does not handle IPv6 well, and it can trigger obscure bugs on some Linux distributions. The Hadoop wiki describes how to disable it; here I modify the sysctl.conf file and add the following lines:

# vim /etc/sysctl.conf
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
# sysctl -p    // make it take effect immediately
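
To confirm that IPv6 is actually off, the kernel exposes the flag under /proc; a value of 1 means disabled:

$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6    // should print 1 after sysctl -p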

4. Create a Hadoop User

Execute on the master and slave nodes:

# addgroup hdgroup                    // create the hadoop group
# adduser -ingroup hdgroup hduser     // create the hadoop user and add it to the group
Adding user `hduser' ...
Adding new user `hduser' (1001) with group `hdgroup' ...
Creating home directory `/home/hduser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:              // type the password, press Enter, then retype it
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hduser
Enter the new value, or press ENTER for the default
        Full Name []:
        Room Number []:
        Work Phone []:
        Home Phone []:
        Other []:
Is the information correct? [Y/n]

Hadoop needs to log in between nodes without a password, so you need to generate an SSH key pair. Note that the following is done on the master and on each slave, as the ordinary hduser user you just created:

# su - hduser
$ ssh-keygen -N ''
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
5b:ae:c6:5a:ce:66:51:d3:6c:6c:14:9b:b2:8a:da:e9 hduser@master
The key's randomart image is:
+--[ RSA 2048]----+
|    (omitted)    |
+-----------------+
$ ssh-copy-id hduser@master
$ ssh-copy-id hduser@slave-1
$ ssh-copy-id hduser@slave-2
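
Once the keys have been copied around, passwordless login should work in every direction. A quick way to verify from the master as hduser (the very first connection to each host may still ask you to accept its host key):

$ for h in master slave-1 slave-2; do ssh $h hostname; done    // should print each hostname without asking for a password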

5. Download and install Hadoop

Go to the official Hadoop download page, select the version you need, and copy the download link. Here I use the latest 2.7.3 release:

(Screenshot: the Apache Hadoop releases page with the 2.7.3 download link.)

After opening the link, right-click to copy the link address:

(Screenshot: the mirror page where the tarball link is copied.)

Execute the following on the master and each slave (you can also download on one machine and then copy the tarball to the other two):

$ cd /home/hduser
$ wget -c <paste the download link copied above>
$ tar -zxvf hadoop-2.7.3.tar.gz
$ mv hadoop-2.7.3 hadoop

6. Changing Environment variables

First determine the home directory of the Java that was installed earlier; you can find it as follows (run on any of the machines):

$ env | grep -i java
JAVA_HOME=/usr/lib/jvm/java-8-oracle

On the master and slave nodes, edit the ".bashrc" file and add the following lines:

$ vim .bashrc       // edit the file and add the following lines:
export HADOOP_HOME=/home/hduser/hadoop
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
$ source .bashrc    // source it so the changes take effect immediately
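
After sourcing .bashrc you can confirm that the variables took effect and that the hadoop command is found on the PATH:

$ echo $HADOOP_HOME    // should print /home/hduser/hadoop
$ which hadoop         // should resolve to /home/hduser/hadoop/bin/hadoop
$ hadoop version       // prints the Hadoop 2.7.3 version banner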

Change JAVA_HOME in hadoop-env.sh by doing the following on the master and slave nodes:

$ vim /home/hduser/hadoop/etc/hadoop/hadoop-env.sh
#export JAVA_HOME=${JAVA_HOME}    // change this line, or comment it out and add a new one:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

7. Hadoop Configuration

The configuration for Hadoop mainly involves four files: etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml and etc/hadoop/mapred-site.xml.

Here is an excerpt from an article on the web; be sure to read it before proceeding with the following steps, so they are easier to understand:

  • Hadoop Distributed File System: a distributed file system that provides high-throughput access to application data. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. If you compare HDFS to a traditional storage structure (e.g. FAT, NTFS), the NameNode is analogous to the directory node structure and the DataNodes are analogous to the actual file storage blocks.

  • Hadoop YARN: A framework for job scheduling and cluster resource management.

  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
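
To give these three pieces some flesh: once the cluster built below is running, a typical interaction is to copy a file into HDFS and let a MapReduce job, scheduled by YARN, process it. A sketch using the wordcount example that ships with Hadoop 2.7.3 (the HDFS paths are only illustrative):

$ hdfs dfs -mkdir -p /user/hduser/input              // create a directory in HDFS
$ hdfs dfs -put /etc/hosts /user/hduser/input/       // store a local file in HDFS
$ hadoop jar /home/hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/hduser/input /user/hduser/output
                                                     // run the MapReduce job on YARN
$ hdfs dfs -cat /user/hduser/output/part-r-00000     // read the result back from HDFS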

① Change the "core-site.xml" file on the master and slave nodes. The master and slave nodes should use the same "fs.defaultFS" value, and it must point to the master node. Add the following configuration inside the <configuration> element:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/tmp</value>
  <description>Temporary directory.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:54310</value>
  <description>Use HDFS as the file storage engine</description>
</property>

The final core-site.xml configuration file looks like this:

(Screenshot: the final core-site.xml with both properties inside the <configuration> element.)

If the tmp directory does not exist, you need to create it manually:

$ mkdir /home/hduser/tmp
$ chown -R hduser:hdgroup /home/hduser/tmp    // only needed if the directory was created by a user other than hduser

② Change the "mapred-site.xml" file on the master node only. Because the file does not exist by default, copy the template file to generate one:

$ cd /home/hduser/hadoop/
$ cp -av etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

Edit the XML configuration file and add the following inside the <configuration> element:

<property>
  <name>mapreduce.jobtracker.address</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The framework for running MapReduce jobs</description>
</property>

③ Change the "hdfs-site.xml" file on the master and slave nodes and add the following inside the <configuration> element:

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hduser/data/hduser/hdfs/namenode</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hduser/data/hduser/hdfs/datanode</value>
  <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>

Then create the directories specified in the configuration file:

$ mkdir -p /home/hduser/data/hduser/hdfs/{namenode,datanode}
$ chown -R hduser:hdgroup /home/hduser/data/    // only needed if the directories were created by a user other than hduser

    1) The default value of dfs.replication is 3; here I set it to 2, meaning each block stored in HDFS has one additional copy. The value can be chosen according to the size of the cluster.

    2) dfs.namenode.name.dir and dfs.datanode.data.dir are the locations where the NameNode and DataNode store HDFS metadata and data block files; if the directories do not exist, they need to be created manually.

④ Change the "yarn-site.xml" file on the master and slave nodes; the master and slave nodes should use the same values, pointing to the master node. Add the following inside the <configuration> element:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:8033</value>
</property>
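
Before moving on, it is worth double-checking that a node has actually picked up the values from the XML files edited so far. hdfs getconf reads the effective configuration from the files on that node and works even before the daemons are started:

$ hdfs getconf -confKey fs.defaultFS       // should print hdfs://master:54310
$ hdfs getconf -confKey dfs.replication    // should print 2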

⑤ Update the slaves file

Modify the slaves file on the master node: add the hostnames or IP addresses of the master and slave nodes, and remove "localhost":

$ vim /home/hduser/hadoop/etc/hadoop/slaves
master
slave-1
slave-2

⑥ Format the NameNode

Before starting the cluster, you need to format the NameNode. Execute on the master:

$ hdfs namenode -format

A message like "Storage directory /home/hduser/data/hduser/hdfs/namenode has been successfully formatted." indicates that the format succeeded.

⑦ Start Service

You can start all services directly with the "start-all.sh" script provided by Hadoop, or start DFS and YARN separately. You can use the absolute path /home/hduser/hadoop/sbin/start-all.sh, or call start-all.sh directly (because PATH was extended earlier):

$ start-all.sh
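
In Hadoop 2.x, start-all.sh is a deprecated wrapper that simply calls the two layer-specific scripts, so the equivalent manual sequence is:

$ start-dfs.sh                   // starts the NameNode, SecondaryNameNode and DataNodes
$ start-yarn.sh                  // starts the ResourceManager and NodeManagers
$ stop-yarn.sh && stop-dfs.sh    // the matching way to shut the cluster down again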

If you do not see any error messages as shown, the cluster has started successfully:

(Screenshot: start-all.sh output with no error messages.)

⑧ Verify

Use the jps command to view the started services on the master and the slaves, respectively:

(Screenshot: jps output on the master node.)

(Screenshot: jps output on a slave node.)
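
Since the screenshots are not reproduced here, this is roughly what to expect; the process IDs below are made up, and because master is also listed in the slaves file it runs a DataNode and NodeManager too:

hduser@master:~$ jps
2401 NameNode
2520 DataNode
2601 SecondaryNameNode
2755 ResourceManager
2870 NodeManager
3125 Jps

hduser@slave-1:~$ jps
1832 DataNode
1950 NodeManager
2120 Jps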

Web Validation:

Open in a browser: http://master:50070

(Screenshot: the HDFS web UI at http://master:50070.)

View the YARN web console: http://master:8088/cluster/nodes

If all nodes started normally, they will all be displayed:

(Screenshot: the YARN nodes page listing all three nodes.)

The share directory in the unpacked Hadoop distribution provides a few example jar packages; let's run one to see the effect:

$ hadoop jar /home/hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 30 100
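
For reference, running the examples jar with no arguments prints the list of bundled programs, and the two numbers passed to pi are the number of map tasks and the number of samples each map computes (more samples give a better estimate at the cost of more work):

$ hadoop jar /home/hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
    // with no arguments, the jar lists the available examples (wordcount, pi, grep, ...)
$ hadoop jar /home/hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 30 100
    // pi <number of map tasks> <samples per map>; it prints "Estimated value of Pi is ..." when the job completes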

After launching the job, open http://master:8088/cluster/apps in a browser to see the currently executing applications:

(Screenshot: the running application on the YARN applications page.)

Final notes:

    1. If there is a problem when you add or delete a node, first remove hadoop.tmp.dir on the slave and restart to try again. If that still does not work, try deleting hadoop.tmp.dir on the master as well (which means the data on HDFS is lost too) and then run hdfs namenode -format again (a sketch of this sequence follows note 2 below).

    2. If there are any error messages, remember to check the logs; the files are in the logs folder under the Hadoop installation directory.
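
A minimal sketch of the recovery sequence from note 1, assuming the layout used in this article (hadoop.tmp.dir = /home/hduser/tmp and the HDFS directories under /home/hduser/data); adapt the paths if yours differ, and remember that wiping the master's directories destroys everything stored in HDFS:

$ stop-all.sh                                                   // stop all daemons first
$ rm -rf /home/hduser/tmp/*                                     // on the affected slave (and on the master only as a last resort)
$ rm -rf /home/hduser/data/hduser/hdfs/{namenode,datanode}/*    // only if you also reset the master; HDFS data is lost
$ hdfs namenode -format                                         // re-format the NameNode on the master
$ start-all.sh                                                  // bring the cluster back up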

This article is from the "XUJPXM" blog; please keep this source: http://xujpxm.blog.51cto.com/8614409/1896964

