I. Install Java
1. Download the jdk-8u91-linux-x64.tar.gz file from: http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Installation:
# Choose an installation path; I chose /opt and copied the downloaded jdk-8u91-linux-x64.tar.gz file to this folder
$ cd /opt
$ sudo cp -i ~/Downloads/jdk-8u91-linux-x64.tar.gz /opt/
# Extract and install
$ sudo tar zxvf jdk-8u91-linux-x64.tar.gz
$ sudo rm -r jdk-8u91-linux-x64.tar.gz
# Check whether the installation succeeded
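The check command itself is not shown above; a minimal sketch, assuming the tarball extracts to /opt/jdk1.8.0_91 (the path referenced later in the jps step):
$ /opt/jdk1.8.0_91/bin/java -version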
II. Create a Hadoop Group and Hadoop User
1. Add a Hadoop user as a system user:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
2. Give the Hadoop user sudo privileges
Edit /etc/sudoers and add the line hduser ALL=(ALL:ALL) ALL under the existing line root ALL=(ALL:ALL) ALL, as sketched below.
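For example, a sketch assuming you use visudo (the standard safe way to edit /etc/sudoers):
$ sudo visudo
# below the line:  root    ALL=(ALL:ALL) ALL
# add:             hduser  ALL=(ALL:ALL) ALL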
III. Configure SSH
SSH is configured so that each machine can execute commands on the others without a login password; otherwise, the master node would have to enter a password manually each time it accesses another node.
1. Install SSH
$ sudo apt-get install openssh-server
2. Start the service
$ sudo /etc/init.d/ssh start
3. After starting, check that the service is running with the following command:
$ ps -e | grep ssh
4. Generate the public and private keys:
$ ssh-keygen -t rsa -P ""
Two files are generated under /home/hduser/.ssh: id_rsa and id_rsa.pub; the former is the private key and the latter is the public key.
5. Now we append the public key to authorized_keys:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
6. Log in to SSH and confirm that you don't need to enter a password
$ ssh localhost
7. Log out
$ exit
If you log in again, you don't need a password.
IV. Install Hadoop
1. First download hadoop-2.7.2.tar.gz from https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/
2. Extract it and move it to the directory you want. I put it in /usr/local/hadoop:
$ sudo tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 /usr/local/hadoop
3. To ensure that all operations can be done as user hduser:
$ sudo chown -R hduser:hadoop /usr/local/hadoop
V. Configure ~/.bashrc
1. Switch to the Hadoop user; mine is hduser:
$ su - hduser
2. View the Java installation path
$ update-alternatives --config java
The complete path is /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
We only take the prefix /usr/lib/jvm/java-7-openjdk-amd64
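As an optional shortcut (not in the original steps), the real path of the java binary can also be resolved directly:
$ readlink -f $(which java)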
3. Modify the configuration file ~/.bashrc:
$ sudo gedit ~/.bashrc
# Append the following at the end of the file
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
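After saving, reload the file so the new variables take effect in the current shell:
$ source ~/.bashrc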
4. Modify /usr/local/hadoop/etc/hadoop/hadoop-env.sh
$ sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Locate the JAVA_HOME variable and modify it as follows:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
At this point, the standalone mode configuration is complete. A WordCount test is performed below.
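As a quick sanity check (my suggestion, not part of the original steps), confirm the hadoop binary runs at all:
$ /usr/local/hadoop/bin/hadoop version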
VI. WordCount Test
1. First create a new folder named input in the Hadoop directory:
$ cd /usr/local/hadoop/
$ mkdir input
2. Copy the README.txt file into the input folder; we will count the frequency of the words in this file:
$ sudo cp README.txt input
3. Run the WordCount program and save the results in the output folder
# Each time you rerun the WordCount program, you need to delete the output folder first! Otherwise it will fail
$ bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.2-sources.jar org.apache.hadoop.examples.WordCount input output
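For example, to remove a previous run's output before re-executing:
$ rm -r output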
4. View the word count results:
$ cat output/*
VII. Pseudo-Distributed Mode Configuration
1. Modify two configuration files, core-site.xml and hdfs-site.xml, both located in /usr/local/hadoop/etc/hadoop/
Start by creating several folders in the Hadoop directory:
$ cd /usr/local/hadoop
$ mkdir tmp
$ mkdir tmp/dfs
$ mkdir tmp/dfs/data
$ mkdir tmp/dfs/name
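Equivalently (a shortcut, not in the original), the nested folders can be created in one command:
$ mkdir -p tmp/dfs/data tmp/dfs/name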
Modify core-site.xml:
$ sudo gedit etc/hadoop/core-site.xml
Modify it to the following configuration:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
Modify hdfs-site.xml:
$ sudo gedit etc/hadoop/hdfs-site.xml
Modify to the following configuration:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
2. Format the NameNode
$ ./bin/hdfs namenode -format
Attention! You only need to format the cluster when it is first created; you must not format a running Hadoop file system (HDFS), or you will lose data!
If it succeeds, you will see the prompts "successfully formatted" and "Exitting with status 0"; "Exitting with status 1" indicates an error.
3. Start Hadoop
Execute start-all.sh to start all services, including the NameNode and DataNode.
$ start-all.sh
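Note (my addition): in Hadoop 2.x, start-all.sh is deprecated in favor of starting HDFS and YARN separately, which works equally well here:
$ start-dfs.sh
$ start-yarn.sh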
If the error "Cannot find configuration directory: /etc/hadoop" appears here, resolve it as follows:
Set the directory of the Hadoop configuration files in hadoop-env.sh
$ sudo gedit etc/hadoop/hadoop-env.sh
Add the line export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
After modifying it, reload the file:
$ source etc/hadoop/hadoop-env.sh
Then start all the services again:
$ start-all.sh
The following WARN prompt may appear at startup: "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable". This warning can be ignored and does not affect normal use.
4. Use the jps command to determine whether startup succeeded:
If jps is not found, search your computer for it; my Java installation path is /opt/jdk1.8.0_91, so jps is located in /opt/jdk1.8.0_91/bin:
$ cd /opt/jdk1.8.0_91/bin
$ ./jps
If successful, the following processes are listed: "NameNode", "DataNode", and "SecondaryNameNode"
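If the JDK's bin directory is on your PATH (for example, after adding /opt/jdk1.8.0_91/bin to PATH in ~/.bashrc, an optional step not covered above), you can run it from anywhere:
$ jps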
5. View HDFS information through the web interface
Go to http://localhost:50070/ to view it.
If http://localhost:50070/ cannot be loaded, it may be resolved by the following method:
First, format the NameNode again
$ ./bin/hdfs namenode -format
When prompted to enter Y/N, be sure to enter uppercase Y!!!
Then execute start-all.sh again to start all services
$ start-all.sh
Then execute the jps command
$ cd /opt/jdk1.8.0_91/bin
$ ./jps
Go to http://localhost:50070/ again and it will load normally.
6. Stop running Hadoop
$ stop-all.sh
If the prompt "no datanode to stop" appears:
Workaround:
After running stop-all.sh, delete everything under /usr/local/hadoop/tmp/dfs/data and /usr/local/hadoop/tmp/dfs/name; each of them contains a current folder, so just delete those current folders.
After deleting them, format the NameNode again, start all services with start-all.sh, and stop them with stop-all.sh; the DataNode will then stop normally.
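For example, assuming the tmp directories created earlier in section VII:
$ rm -r /usr/local/hadoop/tmp/dfs/data/current /usr/local/hadoop/tmp/dfs/name/current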