I. Introduction
With the Storm environment configured, the next task was installing Hadoop. There are plenty of tutorials online, but none fit my setup exactly, so I ran into quite a bit of trouble during the installation. After repeatedly consulting references, I finally solved the problems, which felt great. Without further ado, let's get to the point.
The environment configured on this machine is as follows:
Hadoop (2.7.1)
Ubuntu Linux (64-bit system)
The configuration process is explained in the following steps.
II. Installing the SSH Service
Open a shell and check whether the SSH service is already installed; if it is not, install it with the following command:
sudo apt-get install ssh openssh-server
The installation process is relatively easy and enjoyable.
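One quick way to check whether the OpenSSH server is already present before installing (a sketch for Debian/Ubuntu systems; querying dpkg is just one of several possible checks):

```shell
# Check whether the openssh-server package is already installed (Debian/Ubuntu).
# Prints "installed" or "not installed"; run apt-get only in the latter case.
if dpkg -s openssh-server >/dev/null 2>&1; then
  echo "installed"
else
  echo "not installed"
fi
```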
III. Setting Up Passwordless SSH Login
1. Create an SSH key pair, here using RSA, with the following command:
ssh-keygen -t rsa -P ""
2. A randomart image is printed; it is just a visualization of the key's fingerprint and can be ignored. Then append the new public key to the list of authorized keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. You can then log in without password verification, as follows:
ssh localhost
If it succeeds, you are logged in without being asked for a password.
IV. Downloading the Hadoop Installation Package
There are two ways to download the Hadoop installation package:
1. Download it directly from the official mirror: http://mirrors.hust.edu.cn/apache/hadoop/core/stable/hadoop-2.7.1.tar.gz
2. Download it from the shell with the following command:
wget http://mirrors.hust.edu.cn/apache/hadoop/core/stable/hadoop-2.7.1.tar.gz
The second way seemed faster; after a long wait, the download finally completed.
V. Unpacking the Hadoop Installation Package
Unpack the Hadoop installation package with the following command:
tar -zxvf hadoop-2.7.1.tar.gz
A hadoop-2.7.1 folder appears once the extraction completes.
VI. Configuring Hadoop
The files that need to be configured are hadoop-env.sh, core-site.xml, mapred-site.xml.template, and hdfs-site.xml, all located under hadoop-2.7.1/etc/hadoop. Configure them as follows:
1. core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/leesf/program/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
The hadoop.tmp.dir path can be set according to your own preference.
2. mapred-site.xml.template (note that Hadoop reads mapred-site.xml, so if the file only exists as the .template, copy it to mapred-site.xml after editing):
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
3. hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/leesf/program/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/leesf/program/hadoop/tmp/dfs/data</value>
  </property>
</configuration>
The dfs.namenode.name.dir and dfs.datanode.data.dir paths can be set freely, preferably under the hadoop.tmp.dir directory.
One more note: if Hadoop complains that it cannot find the JDK when it runs, you can set the JDK path directly in hadoop-env.sh, like so:
export JAVA_HOME="/home/leesf/program/java/jdk1.8.0_60"
VII. Running Hadoop
After the configuration is complete, run Hadoop.
1. Initialize the HDFS filesystem
In the hadoop-2.7.1 directory, run the following command:
bin/hdfs namenode -format
Part-way through, the process asks for confirmation; type Y to continue. Since passwordless SSH is already set up, no password prompt appears.
When the command finishes, the output indicates that the initialization is complete.
2. Start the NameNode and DataNode daemons
Start them with the following command:
sbin/start-dfs.sh
On success, the script reports each daemon as it starts.
3. View process information
Check the running Java processes with the following command:
jps
The output should show that the DataNode and NameNode processes are running.
4. View the web UI
Enter http://localhost:50070 in a browser to see the cluster's status information.
At this point, the Hadoop environment has been set up. Next, let's use Hadoop to run a WordCount example.
VIII. Running the WordCount Demo
1. Create a new file locally. I created a document named words in the /home/leesf directory; its contents can be anything.
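For example, a small words file can be created from the shell (the contents below are arbitrary sample text, not anything prescribed by this walkthrough):

```shell
# Write a few sample lines into ~/words; WordCount will count each
# whitespace-separated token in this file.
cat > ~/words <<'EOF'
hello hadoop
hello world
EOF
```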
2. Create a new folder in HDFS to hold the uploaded words document. In the hadoop-2.7.1 directory, enter the following command:
bin/hdfs dfs -mkdir /test
This creates a test directory under the HDFS root.
View the directory structure under the HDFS root with the following command:
bin/hdfs dfs -ls /
The listing confirms that the test directory has been created in the HDFS root.
3. Upload the local words document to the test directory
Upload it with the following command:
bin/hdfs dfs -put /home/leesf/words /test/
Then check with the following command:
bin/hdfs dfs -ls /test/
The listing confirms that the local words document has been uploaded to the test directory.
4. Run WordCount
Run WordCount with the following command:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test/words /test/out
When the run completes, an out directory is generated under /test. List the contents of /test with the following command:
bin/hdfs dfs -ls /test
The listing shows a directory named out inside the test directory.
List the files in the out directory with the following command:
bin/hdfs dfs -ls /test/out
This shows that the job ran successfully and that the result is saved in part-r-00000.
5. View the results
View them with the following command:
bin/hadoop fs -cat /test/out/part-r-00000
Each line of the output is a word followed by its count.
At this point, the running process is complete.
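The format of part-r-00000 can be previewed locally with plain shell tools: WordCount emits one word per line, tab-separated from its count and sorted by word. A rough local sketch of that computation (nothing Hadoop-specific; the two input lines are hypothetical sample text):

```shell
# Simulate WordCount on two sample lines: split into tokens, count duplicates,
# and print "word<TAB>count" sorted by word.
printf 'hello hadoop\nhello world\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{printf "%s\t%s\n", $2, $1}'
# hadoop  1
# hello   2
# world   1
```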
Summary: I ran into many problems during this Hadoop configuration; the commands differ considerably between Hadoop 1.x and 2.x, and I worked through the issues one by one until the setup succeeded. I learned a lot in the process, so I am sharing this experience here to help others configure their own Hadoop environments. Questions about any part of the configuration are welcome. Thank you for reading to the end.
The reference links are as follows:
http://www.linuxidc.com/Linux/2015-02/113487.htm
http://www.cnblogs.com/madyina/p/3708153.html
Linux installation of Hadoop (2.7.1) detailed and WordCount operation