Hadoop-2.7.3 single node mode installation


Original: http://blog.anxpp.com/index.php/archives/1036/ (Hadoop single node mode installation)

Official Tutorials: http://hadoop.apache.org/docs/r2.7.3/

This article is based on Ubuntu 16.04 and Hadoop 2.7.3.

I. Overview

This article follows the official documentation (Setting Up a Single Node Cluster) to install Hadoop in single node mode (local mode and pseudo-distributed mode).

1. The three Hadoop installation modes

(1) Stand-alone mode (standalone)

Stand-alone mode is Hadoop's default mode. When the Hadoop source package is first unpacked, Hadoop knows nothing about the hardware environment, so it conservatively uses a minimal configuration: all three XML configuration files are empty. With empty configuration files, Hadoop runs entirely locally. Since it does not need to interact with other nodes, stand-alone mode uses neither HDFS nor any Hadoop daemon. This mode is mainly used for developing and debugging the application logic of MapReduce programs.

This mode is generally not the one recommended for installation, and there is little information about it online.

(2) Pseudo-distributed mode

Pseudo-distributed mode runs Hadoop on a "single-node cluster" where all daemons run on the same machine. This mode adds code debugging on top of stand-alone mode, allowing you to check memory usage, HDFS input and output, and the interactions between daemons.

For example, the five processes NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker can all be seen on this one-node "cluster".

(3) Fully distributed mode

The Hadoop daemons run on a cluster of machines.

That is, the NameNode, JobTracker and SecondaryNameNode can be installed together on the master or installed on separate master machines, while the slave nodes run the DataNode and TaskTracker processes.

2. Purpose of this article

This article describes how to set up and configure a local-mode and a single-node pseudo-distributed Hadoop installation, so that you can quickly perform simple operations with Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

3. Platform support

Hadoop supports GNU/Linux as a development and production platform, and has been demonstrated on GNU/Linux clusters of 2000 nodes.

Windows is also a supported platform, but this article only covers Linux.

4. Prerequisites (other required software)

ssh

Java

II. Hadoop download and installation

Official website: http://hadoop.apache.org/

Download: http://hadoop.apache.org/releases.html

First download the corresponding Hadoop release from the website, then unpack it:

tar -zxvf hadoop-2.7.3.tar.gz

Modify folder name:

mv hadoop-2.7.3 hadoop
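The rest of the article assumes the Hadoop folder ends up under /usr/lib/java/hadoop. If you unpacked it somewhere else, a move along these lines (an assumption; this step is not spelled out in the original) puts it there:

$ sudo mkdir -p /usr/lib/java
$ sudo mv hadoop /usr/lib/java/hadoop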

Configure environment variables by editing the profile file:

sudo gedit /etc/profile

Then append the following to the end of the file:

# Hadoop
export HADOOP_HOME=/usr/lib/java/hadoop
export PATH=${HADOOP_HOME}/bin:$PATH

Remember to make the configuration take effect:

source /etc/profile

To see if the installation was successful:

anxpp@ubuntu:~$ hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/lib/java/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar

III. Preparation before starting the cluster

1. Configure the hadoop/etc/hadoop/hadoop-env.sh file

Comment out line 25, export JAVA_HOME=${JAVA_HOME}, and add below it:

export JAVA_HOME=/usr/lib/java/jdk1.8.0_111
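The JDK path above is the one used in this article. Before hardcoding it, it may be worth confirming that the JDK actually lives there (adjust the path to your own installation):

$ ls /usr/lib/java/jdk1.8.0_111/bin/java
$ /usr/lib/java/jdk1.8.0_111/bin/java -version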

Now you can test with a command, here using hadoop/bin/hadoop:

anxpp@ubuntu:/$ /usr/lib/java/hadoop/bin/hadoop

The usage documentation for the hadoop script will be displayed.

You can now start Hadoop in one of the three supported modes:

① Local (standalone) mode
② Pseudo-distributed mode
③ Fully-distributed mode

IV. Using local mode

By default, Hadoop is configured to run in non-distributed mode as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input, then finds and displays every occurrence of the given regular expression; the output is written to the given output directory:

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ cat output/*
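As a small variation (not part of the original article), the same examples jar bundles other programs besides grep; running it with no arguments lists them, and wordcount can be run against the same input. The directory name output2 below is arbitrary, since MapReduce refuses to write to an output directory that already exists.

# List the example programs bundled in the jar (grep, wordcount, pi, ...)
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
# Run wordcount on the same input; output2 must not already exist
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output2
$ cat output2/*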

V. Using pseudo-distributed mode

Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

1. Configuration

The configuration is as follows:

(1) etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

(2) etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
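As a quick sanity check (not in the original article), the hdfs getconf command can confirm that the values above are being picked up; run it from the Hadoop installation directory:

$ bin/hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000
$ bin/hdfs getconf -confKey dfs.replication
1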

2. Configure passwordless SSH login

First check whether ssh to localhost works without a password:

$ ssh localhost

If it prompts for a password, execute the following commands:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
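If ssh localhost fails with "connection refused" rather than a password prompt, the SSH server itself may not be installed yet; the official prerequisites suggest installing ssh and rsync, which on Ubuntu looks roughly like:

$ sudo apt-get install ssh
$ sudo apt-get install rsync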

3. Running

The following instructions run a MapReduce job locally. Running jobs on YARN is described later in this section.

(1) Format the file system:

$ /usr/lib/java/hadoop/bin/hdfs namenode -format

(2) Start the NameNode daemon and the DataNode daemon:

$ /usr/lib/java/hadoop/sbin/start-dfs.sh
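To confirm the daemons actually started, the JDK's jps tool can be used (the PIDs below are only illustrative):

$ jps
11920 NameNode
12045 DataNode
12250 SecondaryNameNode
12430 Jps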

The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory ($HADOOP_HOME/logs by default).

(3) Browse the NameNode web interface

By default, the address is:

NameNode - http://localhost:50070/
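If no browser is available, roughly the same summary (live DataNodes, capacity, and so on) can be obtained from the command line:

$ /usr/lib/java/hadoop/bin/hdfs dfsadmin -report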

(4) Create the HDFS directories required to run MapReduce jobs:

$ /usr/lib/java/hadoop/bin/hdfs dfs -mkdir /user
$ /usr/lib/java/hadoop/bin/hdfs dfs -mkdir /user/<username>

(5) Copy the input files to the distributed file system:

$ /usr/lib/java/hadoop/bin/hdfs dfs -put etc/hadoop input
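Optionally, before launching the job, confirm that the configuration files actually landed in HDFS (a quick check, not in the original article):

$ /usr/lib/java/hadoop/bin/hdfs dfs -ls input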
(6) Run the example:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

(7) Check the output files:

Copy the output files from the Distributed file system to the local file system and check them:

$ bin/hdfs dfs -get output output
$ cat output/*

You can also view the output files directly on the distributed file system:

$ bin/hdfs dfs -cat output/*

(8) Stop the daemons:

$ sbin/stop-dfs.sh

4. YARN on a single node

You can run MapReduce jobs on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager daemon and the NodeManager daemon.

The following steps assume that steps (1) through (4) above have already been executed.

(1) Parameter configuration

① etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

② etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

(2) Start the ResourceManager daemon and the NodeManager daemon:
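The original text stops at this step; the remaining commands, following the official 2.7.3 single-node guide, are roughly:

$ sbin/start-yarn.sh
# (3) Browse the ResourceManager web interface, by default at http://localhost:8088/
# (4) Run a MapReduce job exactly as in the previous section
# (5) When finished, stop the daemons:
$ sbin/stop-yarn.sh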
