Hadoop single-node mode installation
Original: http://blog.anxpp.com/index.php/archives/1036/
Official Tutorials: http://hadoop.apache.org/docs/r2.7.3/
This article is based on Ubuntu 16.04 and Hadoop 2.7.3.

I. Overview

This article follows the official documentation for installing Hadoop in single-node mode (local mode and pseudo-distributed mode), "Setting up a Single Node Cluster".

1. The three Hadoop installation modes

(1) Stand-alone mode (standalone)
Stand-alone mode is Hadoop's default mode. When the Hadoop package is first unpacked, it knows nothing about the hardware environment, so it conservatively falls back to a minimal configuration: all three configuration XML files are empty. With empty configuration files, Hadoop runs entirely locally. Because it does not need to interact with other nodes, stand-alone mode uses neither HDFS nor any Hadoop daemons. This mode is mainly used for developing and debugging the application logic of MapReduce programs.
This installation approach is generally not recommended, and there is little information about it online.

(2) Pseudo-distributed mode
Pseudo-distributed mode runs Hadoop as a "cluster" on a single node, with all daemons running on the same machine. On top of stand-alone mode it adds code-debugging support, letting you examine memory usage, HDFS input/output, and the interactions between daemons.
For example, all five processes (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) can be seen on this single-node cluster.

(3) Fully distributed mode
The Hadoop daemons run on a cluster of machines.
That is, the NameNode, JobTracker and SecondaryNameNode can all be installed on the master node or each installed separately, while the slave nodes run the DataNode and TaskTracker processes.

2. Purpose of this article
This article describes how to set up and configure a local-mode and a single-node pseudo-distributed Hadoop installation, so that you can quickly perform simple operations with Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

3. Supported platforms
Hadoop supports GNU/Linux as a development and production platform, and has been demonstrated on GNU/Linux clusters of 2000 nodes.
Windows is also a supported platform, but this article covers Linux only.

4. Required software (prerequisites)

ssh
Java
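If these are not installed yet, on Ubuntu they can be installed and checked roughly as follows (a sketch: rsync is included because the Hadoop helper scripts use it, and the JDK path assumed later in this article is /usr/lib/java/jdk1.8.0_111):
$ sudo apt-get install ssh rsync
$ java -version
$ ssh -V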
II. Hadoop download and installation
Official website: http://hadoop.apache.org/
Download: http://hadoop.apache.org/releases.html
First download the appropriate Hadoop release from the website, then unpack it:
tar -zxvf hadoop-2.7.3.tar.gz
Rename the folder:
mv hadoop-2.7.3 hadoop
Configure the environment variables by editing the profile file:
sudo gedit /etc/profile
Then append the following to the end of the file:
# Hadoop
export HADOOP_HOME=/usr/lib/java/hadoop
export PATH=${HADOOP_HOME}/bin:$PATH
Remember to make the configuration take effect:
source /etc/profile
To see if the installation was successful:
anxpp@ubuntu:~$ hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/lib/java/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
III. Preparation before setting up the cluster

1. Configure the hadoop/etc/hadoop/hadoop-env.sh file
Comment out line 25, export JAVA_HOME=${JAVA_HOME}, and add the following after it:
export JAVA_HOME=/usr/lib/java/jdk1.8.0_111
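The JDK path above matches the installation used throughout this article; if your JDK lives elsewhere, one way to locate it is to resolve the java binary (the output path here is only illustrative):
$ readlink -f $(which java)
/usr/lib/java/jdk1.8.0_111/bin/java
JAVA_HOME is then the directory two levels up, i.e. the path without the trailing /bin/java.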
You can now test by running the command, here using hadoop/bin/hadoop:
anxpp@ubuntu:/$ /usr/lib/java/hadoop/bin/hadoop
This displays the usage documentation for the hadoop script.
You can now proceed in one of the three supported modes:
① Local (Standalone) mode: stand-alone mode
② Pseudo-Distributed mode: pseudo-distributed mode
③ Fully-Distributed mode: fully distributed mode

IV. Using local mode
By default, Hadoop is configured to run in non-distributed mode as a single Java process. This is useful for debugging.
The following example copies the unpacked configuration directory to use as input, then finds and displays every match of the given regular expression; the output is written to the given output directory:
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
$ cat output/*
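Note that the example job will fail if the output directory already exists, so remove it before re-running:
$ rm -r output
The final cat prints each matched string with its count; with an unmodified set of 2.7.3 configuration files this is typically just a single line such as 1 dfsadmin.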
V. Using pseudo-distributed mode

Hadoop can also run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

1. Configuration
The configuration is as follows:

(1) etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

(2) etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
2. Configure passwordless ssh login

First check whether ssh to localhost requires a password:
$ ssh localhost
If it prompts for a password, run the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
3. Execution

The following instructions run MapReduce jobs locally. Running jobs on YARN is described later in this section.

(1) Format the file system
$ /usr/lib/java/hadoop/bin/hdfs namenode -format
(2) Start the NameNode daemon and the DataNode daemon
$ /usr/lib/java/hadoop/sbin/start-dfs.sh
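To check that the daemons actually started, the JDK's jps tool can be used; in pseudo-distributed mode the output would typically include NameNode, DataNode and SecondaryNameNode (the process IDs below are illustrative):
$ jps
3472 NameNode
3601 DataNode
3814 SecondaryNameNode
3965 Jps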
The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (default: $HADOOP_HOME/logs).

(3) Browse the NameNode web interface
By default, the address is:
NameNode - http://localhost:50070/

(4) Create the HDFS directories required to run MapReduce jobs
$ /usr/lib/java/hadoop/bin/hdfs dfs -mkdir /user
$ /usr/lib/java/hadoop/bin/hdfs dfs -mkdir /user/<username>

(5) Copy the input files to the distributed file system
$ /usr/lib/java/hadoop/bin/hdfs dfs -put etc/hadoop input
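To confirm the files were copied into HDFS, you can list the newly created directory (relative paths resolve under /user/<username>):
$ /usr/lib/java/hadoop/bin/hdfs dfs -ls input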
(6) Run the example
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
(7) Check the output files
Copy the output files from the Distributed file system to the local file system and check them:
$ bin/hdfs dfs -get output output
$ cat output/*
You can also view the output files on the Distributed File system:
$ bin/hdfs dfs -cat output/*
(8) Stop the daemon process
$ sbin/stop-dfs.sh
4. YARN configuration on a single node

You can run MapReduce jobs on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager daemon and the NodeManager daemon.
The following steps assume that steps (1) to (4) above have already been executed.

(1) Parameter configuration
① etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
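Note that the 2.7.3 tarball does not ship etc/hadoop/mapred-site.xml by default, only a template; if the file does not exist yet, create it from the template before adding the property above:
$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml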
② etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
(2) Start the ResourceManager daemon and the NodeManager daemon
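Per the official single-node guide for this Hadoop version, this step corresponds to the bundled start script (run from the Hadoop installation directory):
$ sbin/start-yarn.sh
The ResourceManager web interface is then available at http://localhost:8088/ by default, and the YARN daemons can later be stopped with sbin/stop-yarn.sh.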