Hadoop Single-Node Environment Setup

Source: Internet
Author: User
Tags: hdfs, dfs, hadoop, mapreduce


The following describes how to set up and configure a single-node Hadoop installation on Linux, so that you can use Hadoop MapReduce and HDFS (Hadoop Distributed File System) for some simple operations.

Preparation

1) Download Hadoop;
2) Install a JDK for your Linux system; the recommended JDK versions are listed here: http://wiki.apache.org/hadoop/HadoopJavaVersions;
3) Install ssh for your system.

Set environment variables

1) Set the JDK information for Hadoop:

export JAVA_HOME=/usr/java/latest

2) Decompress Hadoop to a directory, such as /usr/test, and then edit the file /etc/profile to add:

export HADOOP_INSTALL=/usr/test/hadoop-2.7.1
export PATH=$PATH:$HADOOP_INSTALL/bin

Save the file, and then run the command source /etc/profile to reload the profile and make the configuration take effect.
Run the following command. If the configuration is correct, the Hadoop version information is printed:

hadoop version

Single-node mode

By default, Hadoop is already configured for single-node (standalone) mode, so no additional configuration is required.
The following example shows how to create an input directory, put some files, and run Hadoop:

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*

Pseudo-distributed mode

Hadoop can also be run in pseudo-distributed mode, in which each Hadoop daemon runs as a separate Java process on the same node. The configuration files to edit are:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Set up passphrase-less ssh login. Use the following command to check whether you can ssh to localhost without a passphrase:

$ ssh localhost

If you cannot log in without a passphrase, run the following commands:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ export HADOOP_PREFIX=/usr/local/hadoop

Run a local MapReduce task.

1) Format the file system:

$ bin/hdfs namenode -format

2) Start the NameNode and DataNode daemon processes.
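In the standard Hadoop 2.7.1 layout, both daemons can be started with the bundled script, run from the Hadoop installation directory:

$ sbin/start-dfs.sh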

If the error "localhost: Error: JAVA_HOME is not set and could not be found." appears, you can set export JAVA_HOME=/usr/java/latest directly in libexec/hadoop-config.sh.
The Hadoop daemon logs are written to the $HADOOP_LOG_DIR directory, which defaults to $HADOOP_HOME/logs.
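For example, to confirm that the NameNode came up cleanly, you can tail its log from the installation directory; the exact file name contains your user name and host name, so a wildcard is used here for illustration:

$ tail -n 20 logs/hadoop-*-namenode-*.log
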
3) View the NameNode web interface. The default address is:

- NameNode - http://localhost:50070/

4) Create the HDFS directories required to execute MapReduce jobs:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>

5) Copy the input files to the distributed file system:

$ bin/hdfs dfs -put etc/hadoop input

Note that this input directory is created on the HDFS file system, not on the local disk.
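
To verify, you can list the directory on HDFS:

$ bin/hdfs dfs -ls input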

6) Run the example:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'

Note that input and output here refer to directories in HDFS.

7) Check the output files: copy them from the distributed file system to the local file system and examine them:

$ bin/hdfs dfs -get output output
$ cat output/*

You can also view the output files directly on the distributed file system:

$ bin/hdfs dfs -cat output/*

8) When you are done, stop all the daemon processes:

$ sbin/stop-dfs.sh

YARN on a single node

You can also use YARN to run a MapReduce job in pseudo-distributed mode. To do so, set a few parameters and run the ResourceManager and NodeManager daemon processes in addition to the HDFS daemons.
Assume that you have already completed steps 1 to 4 in the previous section, then do the following:
1) Configure etc/hadoop/mapred-site.xml as follows:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Configure the etc/hadoop/yarn-site.xml parameters as follows:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

2) Start the ResourceManager and NodeManager daemon processes:

$ sbin/start-yarn.sh

3) View the ResourceManager web interface. The default address is:

- ResourceManager - http://localhost:8088/

4) Run a MapReduce job.
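For example, you can resubmit the same example job used in the previous section (assuming the input directory is still in HDFS; remove the earlier output directory first, because the job fails if it already exists):

$ bin/hdfs dfs -rm -r output
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'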

5) When you are done, stop all the daemon processes:

$ sbin/stop-yarn.sh

