I. Introduction
With the Storm environment configured, the next task was installing Hadoop. There are plenty of tutorials online, but none fit my setup exactly, so I ran into quite a bit of trouble during the installation. After repeatedly consulting references, I finally solved the problems, which felt great. Without further ado, let's get to the point.
The environment configured on this machine is as follows:
Hadoop (2.7.1)
Ubuntu Linux (64-bit system)
The configuration process is explained in the following steps.
II. Installing the SSH Service
Open a shell and check whether the SSH service is already installed; if it is not, install it with the following command:
sudo apt-get install ssh openssh-server
The installation process is relatively easy and enjoyable.
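One quick way to check whether the OpenSSH server is already present before installing (a sketch for Debian/Ubuntu systems; querying dpkg is just one of several possible checks):

```shell
# Check whether the openssh-server package is already installed (Debian/Ubuntu).
# Prints "installed" or "not installed"; run apt-get only in the latter case.
if dpkg -s openssh-server >/dev/null 2>&1; then
  echo "installed"
else
  echo "not installed"
fi
```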
III. Setting Up Passwordless SSH Login
1. Create an SSH key pair, here using RSA, with the following command:
ssh-keygen -t rsa -P ""
2. A randomart image is printed; it is just a visualization of the key's fingerprint and can be ignored. Then append the new public key to the list of authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. You can then log in without password verification, as follows:
ssh localhost
If it succeeds, you are logged in without being asked for a password.
IV. Downloading the Hadoop Installation Package
There are two ways to download the Hadoop installation package:
1. Download it directly from the official mirror: http://mirrors.hust.edu.cn/apache/hadoop/core/stable/hadoop-2.7.1.tar.gz
2. Download it from the shell with the following command:
wget http://mirrors.hust.edu.cn/apache/hadoop/core/stable/hadoop-2.7.1.tar.gz
The second way seemed faster; after a long wait, the download finally completed.
V. Unpacking the Hadoop Installation Package
Unpack the Hadoop installation package with the following command:
tar -zxvf hadoop-2.7.1.tar.gz
A hadoop-2.7.1 folder appears once the extraction completes.
VI. Configuring Hadoop
The files that need to be configured are hadoop-env.sh, core-site.xml, mapred-site.xml.template, and hdfs-site.xml, all located under hadoop-2.7.1/etc/hadoop. Configure them as follows:
1. core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/leesf/program/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
The hadoop.tmp.dir path can be set according to your own preference.
2. mapred-site.xml.template (note that Hadoop reads mapred-site.xml, so if the file only exists as the .template, copy it to mapred-site.xml after editing):
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
3. hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/leesf/program/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/leesf/program/hadoop/tmp/dfs/data</value>
  </property>
</configuration>
The dfs.namenode.name.dir and dfs.datanode.data.dir paths can be set freely, preferably under the hadoop.tmp.dir directory.
One more note: if Hadoop complains that it cannot find the JDK when it runs, you can set the JDK path directly in hadoop-env.sh, like so:
export JAVA_HOME="/home/leesf/program/java/jdk1.8.0_60"
VII. Running Hadoop
After the configuration is complete, run Hadoop.
1. Initialize the HDFS filesystem
In the hadoop-2.7.1 directory, run the following command:
bin/hdfs namenode -format
Part-way through, the process asks for confirmation; type Y to continue. Since passwordless SSH is already set up, no password prompt appears.
When the command finishes, the output indicates that the initialization is complete.
2. Start the NameNode and DataNode daemons
Start them with the following command:
sbin/start-dfs.sh
On success, the script reports each daemon as it starts.
3. View process information
Check the running Java processes with the following command:
jps
The output should show that the DataNode and NameNode processes are running.
4. View the web UI
Enter http://localhost:50070 in a browser to see the cluster's status information.
At this point, the Hadoop environment has been set up. Next, let's use Hadoop to run a WordCount example.
VIII. Running the WordCount Demo
1. Create a new file locally. I created a document named words in the /home/leesf directory; its contents can be anything.
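For example, a small words file can be created from the shell (the contents below are arbitrary sample text, not anything prescribed by this walkthrough):

```shell
# Write a few sample lines into ~/words; WordCount will count each
# whitespace-separated token in this file.
cat > ~/words <<'EOF'
hello hadoop
hello world
EOF
```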
2. Create a new folder in HDFS to hold the uploaded words document. In the hadoop-2.7.1 directory, enter the following command:
bin/hdfs dfs -mkdir /test
This creates a test directory under the HDFS root.
View the directory structure under the HDFS root with the following command:
bin/hdfs dfs -ls /
The listing confirms that the test directory has been created in the HDFS root.
3. Upload the local words document to the test directory
Upload it with the following command:
bin/hdfs dfs -put /home/leesf/words /test/
Then check with the following command:
bin/hdfs dfs -ls /test/
The listing confirms that the local words document has been uploaded to the test directory.
4. Run WordCount
Run WordCount with the following command:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test/words /test/out
When the run completes, an out directory is generated under /test. List the contents of /test with the following command:
bin/hdfs dfs -ls /test
The listing shows a directory named out inside the test directory.
List the files in the out directory with the following command:
bin/hdfs dfs -ls /test/out
This shows that the job ran successfully and that the result is saved in part-r-00000.
5. View the results
View them with the following command:
bin/hadoop fs -cat /test/out/part-r-00000
Each line of the output is a word followed by its count.
At this point, the running process is complete.
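The format of part-r-00000 can be previewed locally with plain shell tools: WordCount emits one word per line, tab-separated from its count and sorted by word. A rough local sketch of that computation (nothing Hadoop-specific; the two input lines are hypothetical sample text):

```shell
# Simulate WordCount on two sample lines: split into tokens, count duplicates,
# and print "word<TAB>count" sorted by word.
printf 'hello hadoop\nhello world\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{printf "%s\t%s\n", $2, $1}'
# hadoop  1
# hello   2
# world   1
```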
Summary: I ran into many problems during this Hadoop configuration; the commands differ considerably between Hadoop 1.x and 2.x, and I worked through the issues one by one until the setup succeeded. I learned a lot in the process, so I am sharing this experience here to help others configure their own Hadoop environments. Questions about any part of the configuration are welcome. Thank you for reading to the end.
The reference links are as follows:
http://www.linuxidc.com/Linux/2015-02/113487.htm
http://www.cnblogs.com/madyina/p/3708153.html
Linux installation of Hadoop (2.7.1) detailed and WordCount operation