Hadoop-2.7.3 single node mode installation


Original: http://blog.anxpp.com/index.php/archives/1036/ (Hadoop single node mode installation)

Official Tutorials: http://hadoop.apache.org/docs/r2.7.3/

This article is based on Ubuntu 16.04 and Hadoop 2.7.3.

I. Overview

This article follows the official documentation (Setting Up a Single Node Cluster) to install Hadoop in single node mode (local mode and pseudo-distributed mode).

1. The three Hadoop installation modes

(1) Stand-alone mode (standalone)

Stand-alone mode is Hadoop's default mode. When the Hadoop source package is first unpacked, Hadoop knows nothing about the hardware environment, so it conservatively uses a minimal configuration: all three XML configuration files are empty. With empty configuration files, Hadoop runs entirely locally. Since it does not need to interact with other nodes, stand-alone mode uses neither HDFS nor any Hadoop daemon. This mode is mainly used for developing and debugging the application logic of MapReduce programs.

This mode is generally not the one recommended for installation, and there is little information about it online.

(2) Pseudo-distributed mode

Pseudo-distributed mode runs Hadoop on a "single-node cluster" where all daemons run on the same machine. This mode adds code debugging on top of stand-alone mode, allowing you to check memory usage, HDFS input and output, and the interactions between daemons.

For example, the five processes NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker can all be seen on this one-node "cluster".

(3) Fully distributed mode

The Hadoop daemons run on a cluster of machines.

That is, the NameNode, JobTracker and SecondaryNameNode can be installed together on the master or installed on separate master machines, while the slave nodes run the DataNode and TaskTracker processes.

2. Purpose of this article

This article describes how to set up and configure a local-mode and a single-node pseudo-distributed Hadoop installation, so that you can quickly perform simple operations with Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

3. Platform support

Hadoop supports GNU/Linux as a development and production platform, and has been demonstrated on GNU/Linux clusters of 2000 nodes.

Windows is also a supported platform, but this article only covers Linux.

4. Prerequisites (other required software)

ssh

Java

II. Hadoop download and installation

Official website: http://hadoop.apache.org/

Download: http://hadoop.apache.org/releases.html

First download the corresponding Hadoop release from the website, then unpack it:

tar -zxvf hadoop-2.7.3.tar.gz

Modify folder name:

mv hadoop-2.7.3 hadoop
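The rest of the article assumes the Hadoop folder ends up under /usr/lib/java/hadoop. If you unpacked it somewhere else, a move along these lines (an assumption; this step is not spelled out in the original) puts it there:

$ sudo mkdir -p /usr/lib/java
$ sudo mv hadoop /usr/lib/java/hadoop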

Configure environment variables by editing the profile file:

sudo gedit /etc/profile

Then append the following to the end of the file:

# Hadoop
export HADOOP_HOME=/usr/lib/java/hadoop
export PATH=${HADOOP_HOME}/bin:$PATH

Remember to make the configuration take effect:

source /etc/profile

To see if the installation was successful:

anxpp@ubuntu:~$ hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/lib/java/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar

III. Preparation before starting the cluster

1. Configure the hadoop/etc/hadoop/hadoop-env.sh file

Comment out line 25, export JAVA_HOME=${JAVA_HOME}, and add below it:

export JAVA_HOME=/usr/lib/java/jdk1.8.0_111
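The JDK path above is the one used in this article. Before hardcoding it, it may be worth confirming that the JDK actually lives there (adjust the path to your own installation):

$ ls /usr/lib/java/jdk1.8.0_111/bin/java
$ /usr/lib/java/jdk1.8.0_111/bin/java -version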

Now you can test with a command, here using hadoop/bin/hadoop:

anxpp@ubuntu:/$ /usr/lib/java/hadoop/bin/hadoop

The usage documentation for the hadoop script will be displayed.

You can now start Hadoop in one of the three supported modes:

① Local (standalone) mode
② Pseudo-distributed mode
③ Fully-distributed mode

IV. Using local mode

By default, Hadoop is configured to run in non-distributed mode as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input, then finds and displays every occurrence of the given regular expression; the output is written to the given output directory:

$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ cat output/*
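As a small variation (not part of the original article), the same examples jar bundles other programs besides grep; running it with no arguments lists them, and wordcount can be run against the same input. The directory name output2 below is arbitrary, since MapReduce refuses to write to an output directory that already exists.

# List the example programs bundled in the jar (grep, wordcount, pi, ...)
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
# Run wordcount on the same input; output2 must not already exist
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output2
$ cat output2/*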

V. Using pseudo-distributed mode

Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

1. Configuration

The configuration is as follows:

(1) etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

(2) etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
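As a quick sanity check (not in the original article), the hdfs getconf command can confirm that the values above are being picked up; run it from the Hadoop installation directory:

$ bin/hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000
$ bin/hdfs getconf -confKey dfs.replication
1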

2. Configure passwordless SSH login

First check whether ssh to localhost works without a password:

$ ssh localhost

If it prompts for a password, execute the following commands:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
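If ssh localhost fails with "connection refused" rather than a password prompt, the SSH server itself may not be installed yet; the official prerequisites suggest installing ssh and rsync, which on Ubuntu looks roughly like:

$ sudo apt-get install ssh
$ sudo apt-get install rsync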

3. Running

The following instructions run a MapReduce job locally. Running jobs on YARN is described later in this section.

(1) Format the file system:

$ /usr/lib/java/hadoop/bin/hdfs namenode -format

(2) Start the NameNode daemon and the DataNode daemon:

$ /usr/lib/java/hadoop/sbin/start-dfs.sh
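To confirm the daemons actually started, the JDK's jps tool can be used (the PIDs below are only illustrative):

$ jps
11920 NameNode
12045 DataNode
12250 SecondaryNameNode
12430 Jps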

The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory ($HADOOP_HOME/logs by default).

(3) Browse the NameNode web interface

By default, the address is:

NameNode - http://localhost:50070/
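If no browser is available, roughly the same summary (live DataNodes, capacity, and so on) can be obtained from the command line:

$ /usr/lib/java/hadoop/bin/hdfs dfsadmin -report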

(4) Create the HDFS directories required to run MapReduce jobs:

$ /usr/lib/java/hadoop/bin/hdfs dfs -mkdir /user
$ /usr/lib/java/hadoop/bin/hdfs dfs -mkdir /user/<username>

(5) Copy the input files to the distributed file system:

$ /usr/lib/java/hadoop/bin/hdfs dfs -put etc/hadoop input
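Optionally, before launching the job, confirm that the configuration files actually landed in HDFS (a quick check, not in the original article):

$ /usr/lib/java/hadoop/bin/hdfs dfs -ls input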
(6) Run the example:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

(7) Check the output files:

Copy the output files from the Distributed file system to the local file system and check them:

$ bin/hdfs dfs -get output output
$ cat output/*

You can also view the output files directly on the distributed file system:

$ bin/hdfs dfs -cat output/*

(8) Stop the daemons:

$ sbin/stop-dfs.sh

4. YARN on a single node

You can run MapReduce jobs on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager daemon and the NodeManager daemon.

The following steps assume that steps (1) through (4) above have already been executed.

(1) Parameter configuration

① etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

② etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

(2) Start the ResourceManager daemon and the NodeManager daemon:
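The original text stops at this step; the remaining commands, following the official 2.7.3 single-node guide, are roughly:

$ sbin/start-yarn.sh
# (3) Browse the ResourceManager web interface, by default at http://localhost:8088/
# (4) Run a MapReduce job exactly as in the previous section
# (5) When finished, stop the daemons:
$ sbin/stop-yarn.sh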
