I. Install Java
1. Download the jdk-8u91-linux-x64.tar.gz file from: http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Installation:
# Choose an installation path. I chose /opt and copied the downloaded jdk-8u91-linux-x64.tar.gz file into it
$ cd /opt
$ sudo cp -i ~/downloads/jdk-8u91-linux-x64.tar.gz /opt/
# Unpack to install
$ sudo tar zxvf jdk-8u91-linux-x64.tar.gz
$ sudo rm -r jdk-8u91-linux-x64.tar.gz
# Check whether the installation succeeded
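For example, you can run the unpacked JDK directly (the archive extracts to jdk1.8.0_91, so under /opt it lives at /opt/jdk1.8.0_91, the same path used later in this article); if it prints the version, the installation is fine:
$ /opt/jdk1.8.0_91/bin/java -version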
II. Create a hadoop group and a hadoop user
1. Add a hadoop user as a system user
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
2. Give the hduser account root privileges
Edit /etc/sudoers and add the line hduser ALL=(ALL:ALL) ALL below the existing root ALL=(ALL:ALL) ALL.
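For reference, the relevant part of /etc/sudoers then looks like this (the hduser line is the newly added one):
root    ALL=(ALL:ALL) ALL
hduser  ALL=(ALL:ALL) ALL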
III. Configure SSH
SSH is configured so that the machines can run commands on one another without entering a login password; otherwise the master node would have to enter a password manually every time it accessed another node.
1. Install SSH
$ sudo apt-get install openssh-server
2. Start the service
$ sudo /etc/init.d/ssh start
3. After startup, check that the service started correctly with the following command
$ ps -e | grep ssh
4. Generate the public and private keys:
$ ssh-keygen -t rsa -P ""
At this point, two files are generated under /home/hduser/.ssh: id_rsa and id_rsa.pub, the former being the private key and the latter the public key.
5. Now append the public key to authorized_keys
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
6. Log in with SSH and confirm that no password is needed
$ ssh localhost
7. Log out
$ exit
If you log in again, no password is required.
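If SSH still asks for a password, a common cause (not covered in the original steps) is overly permissive permissions on the key files; a typical fix is:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys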
IV. Install Hadoop
1. First download hadoop-2.7.2.tar.gz from https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/
2. Unpack it and move it to the directory you want. I put it in /usr/local/hadoop.
$ sudo tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 /usr/local/hadoop
3. Make sure all operations can be done as user hduser:
$ sudo chown -R hduser:hadoop /usr/local/hadoop
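A quick way to confirm the ownership change (an extra check, not one of the original steps):
$ ls -ld /usr/local/hadoop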
V. Configure ~/.bashrc
1. Switch to the Hadoop user; mine is hduser
$ su - hduser
2. Look up the Java installation path
$ update-alternatives --config java
The complete path is /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java; we only need the front part, /usr/lib/jvm/java-7-openjdk-amd64.
3. Modify the configuration file ~/.bashrc
$ sudo gedit ~/.bashrc
# Append the following at the end of the file
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
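To make the new variables take effect in the current shell, reload the file (a standard bash step the original leaves implicit):
$ source ~/.bashrc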
4. Modify /usr/local/hadoop/etc/hadoop/hadoop-env.sh
$ sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Find the JAVA_HOME variable and change it as follows:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
At this point the stand-alone mode configuration is complete; next comes the WordCount test.
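As a quick sanity check (an extra step, not in the original), Hadoop can now report its version; with the variables above it should print 2.7.2:
$ hadoop version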
VI. WordCount test
1. First create a new folder, input, in the Hadoop directory
$ cd /usr/local/hadoop/
$ mkdir input
2. Copy the README.txt file into the input folder to count the word frequencies in the file
$ sudo cp README.txt input
3. Run the WordCount program and save the results in the output folder
# Each time you rerun the WordCount program, delete the output folder first, otherwise it will fail with an error.
$ bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.2-sources.jar org.apache.hadoop.examples.WordCount input output
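If the sources jar reports that the WordCount class cannot be found, an alternative sketch (assuming the standard 2.7.2 binary layout, not the author's original command) is to use the pre-built examples jar:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount input output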
4. View the word-count results
$ cat output/*
VII. Pseudo-distributed mode configuration
1. Modify two configuration files, core-site.xml and hdfs-site.xml, both located in /usr/local/hadoop/etc/hadoop/
Start by creating several folders in the Hadoop directory:
$ cd /usr/local/hadoop
$ mkdir tmp
$ mkdir tmp/dfs
$ mkdir tmp/dfs/data
$ mkdir tmp/dfs/name
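Equivalently, the whole directory tree can be created in one command (same result, just shorter):
$ mkdir -p tmp/dfs/data tmp/dfs/name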
Modify core-site.xml:
$ sudo gedit etc/hadoop/core-site.xml
Modify to the following configuration:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>abase for the other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Modify hdfs-site.xml:
$ sudo gedit etc/hadoop/hdfs-site.xml
Modify to the following configuration:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
2. Format the NameNode
$ ./bin/hdfs namenode -format
Note: you only need to format when the Hadoop cluster is first created. Do not format a running Hadoop file system (HDFS), or you will lose data.
If successful, you will see the prompts "successfully formatted" and "Exitting with status 0"; "Exitting with status 1" indicates an error.
3. Start Hadoop
Run start-all.sh to start all services, including the NameNode and DataNode.
$ start-all.sh
If the error "Error: Cannot find configuration directory: /etc/hadoop" appears here, resolve it as follows:
Configure the Hadoop configuration directory in hadoop-env.sh
$ sudo gedit etc/hadoop/hadoop-env.sh
Add export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
After the modification, reload the file:
$ source etc/hadoop/hadoop-env.sh
Then start all the services again:
$ start-all.sh
The following WARN message may appear at startup: "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable". This warning can be ignored and does not affect normal use.
4. Use the jps command to determine whether startup succeeded:
jps is part of the JDK; since my Java installation path is /opt/jdk1.8.0_91, jps is located at /opt/jdk1.8.0_91/bin
$ cd /opt/jdk1.8.0_91/bin
$ ./jps
A successful startup will list the processes "NameNode", "DataNode" and "SecondaryNameNode" (start-all.sh also starts YARN, so "ResourceManager" and "NodeManager" typically appear as well).
5. View HDFS information through the web interface
Go to http://localhost:50070/ to view it.
If http://localhost:50070/ cannot be loaded, it can be resolved in the following way:
First perform the NameNode formatting again
$ ./bin/hdfs namenode -format
Be sure to enter a capital Y when prompted to answer y/n.
Then execute start-all.sh to start all services
$ start-all.sh
Then run the jps command again:
$ cd /opt/jdk1.8.0_91/bin
$ ./jps
Now visit http://localhost:50070/ again; it should load normally.
6. Stop Hadoop
$ stop-all.sh
If a prompt saying "no datanode to stop" appears:
Workaround:
After stop-all.sh, delete everything under /usr/local/hadoop/tmp/dfs/data and /usr/local/hadoop/tmp/dfs/name; each of them contains a current folder, so it is enough to delete that current folder.
After the deletion, format the NameNode again, start all the services with start-all.sh, and then stop them with stop-all.sh; the DataNode can now be stopped normally.
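A sketch of that cleanup sequence, assuming the tmp directory configured in hdfs-site.xml above:
$ rm -rf /usr/local/hadoop/tmp/dfs/data/current /usr/local/hadoop/tmp/dfs/name/current
$ ./bin/hdfs namenode -format
$ start-all.sh
$ stop-all.sh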