Hadoop Installation & Standalone/Pseudo-Distributed Configuration (Hadoop 2.7.2 / Ubuntu 14.04)


I. Install Java

1. Download the jdk-8u91-linux-x64.tar.gz file from: http://www.oracle.com/technetwork/java/javase/downloads/index.html

2. Installation:

# Choose an installation path. I chose /opt and copied the downloaded jdk-8u91-linux-x64.tar.gz file to this folder

$ cd /opt

$ sudo cp -i ~/Downloads/jdk-8u91-linux-x64.tar.gz /opt/

# Extract and install

$ sudo tar zxvf jdk-8u91-linux-x64.tar.gz

$ sudo rm -r jdk-8u91-linux-x64.tar.gz
# Check whether the installation succeeded
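For example, you can run the java binary from the unpacked JDK directly (the jdk1.8.0_91 directory name is what this JDK archive unpacks to; adjust it if yours differs):

$ /opt/jdk1.8.0_91/bin/java -version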


II. Create a Hadoop Group and a Hadoop User

1. Add a Hadoop user as a system user
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
2. Give hduser root privileges

Edit /etc/sudoers and add the line hduser ALL=(ALL:ALL) ALL below the existing line root ALL=(ALL:ALL) ALL.
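A minimal sketch of the change (the original shows this step as a screenshot; opening the file with visudo is my suggestion, not part of the original):

$ sudo visudo

# /etc/sudoers (excerpt)
root    ALL=(ALL:ALL) ALL
hduser  ALL=(ALL:ALL) ALL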


III. Configure SSH

SSH is configured so that the machines can execute commands on each other without entering a login password; otherwise the master node would have to enter a password manually every time it accesses another node.

1. Install SSH
$ sudo apt-get install openssh-server

2. Start the service

$ sudo /etc/init.d/ssh start

3. After it starts, you can check whether the service is running with the following command

$ ps -e | grep ssh


4. Generate the public and private keys:

$ ssh-keygen -t rsa -P ""

Two files are generated under /home/hduser/.ssh: id_rsa and id_rsa.pub; the former is the private key and the latter is the public key.

5. Now we append the public key to authorized_keys
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

6. Log in to SSH and confirm that you don't need to enter a password

$ ssh localhost


7. Log out
$ exit

If you log in again, you don't need a password.

IV. Install Hadoop

1. First download hadoop-2.7.2.tar.gz from https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/

2. Unpack it and place it in the directory you want. I put it in /usr/local/hadoop.

$ sudo tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 /usr/local/hadoop

3. To ensure that all operations are done as user hduser:
$ sudo chown -R hduser:hadoop /usr/local/hadoop

V. Configure ~/.bashrc

1. Switch to the Hadoop user; mine is hduser

$ su - hduser

2. View the Java installation path
$ update-alternatives --config java


The complete path is /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
We only take the first part: /usr/lib/jvm/java-7-openjdk-amd64

3. Modify the configuration file .bashrc

$ sudo gedit ~/.bashrc

# Append the following at the end of the file


#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
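Note that the new variables only take effect in a new shell. To apply them to the current shell you can reload the file (this step is not spelled out in the original):

$ source ~/.bashrc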

4. Modify /usr/local/hadoop/etc/hadoop/hadoop-env.sh

$ sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Locate the JAVA_HOME variable and modify it as follows:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
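Optionally, as a quick sanity check that the environment is set up (assuming the PATH changes from ~/.bashrc are in effect), you can print the Hadoop version:

$ hadoop version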
At this point, the standalone-mode configuration is complete. A WordCount test is performed below.

VI. WordCount Test

1. First create a new folder named input in the Hadoop directory

$ cd /usr/local/hadoop/
$ mkdir input

2. Copy the README.txt file to the input folder to count the frequency of the words in the file
$ sudo cp README.txt input

3. Run the WordCount program and save the output in the output folder

# Each time you rerun the WordCount program, you need to delete the output folder first, otherwise it will fail

$ bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.2-sources.jar org.apache.hadoop.examples.WordCount input output
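If you are rerunning the job, first remove the previous output folder as noted in the comment above, for example:

$ rm -r output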


4. View the word count results
$ cat output/*


VII. Pseudo-Distributed Mode Configuration

1. Modify two configuration files, core-site.xml and hdfs-site.xml, located in /usr/local/hadoop/etc/hadoop/

Start by creating several folders in the Hadoop directory:

$ cd /usr/local/hadoop
$ mkdir tmp
$ mkdir tmp/dfs
$ mkdir tmp/dfs/data
$ mkdir tmp/dfs/name

Modify core-site.xml:

$ sudo gedit etc/hadoop/core-site.xml

Modify it to the following configuration:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Modify hdfs-site.xml:

$ sudo gedit etc/hadoop/hdfs-site.xml

Modify to the following configuration:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

2. Format the NameNode

$ ./bin/hdfs namenode -format

Attention! You only need to format HDFS when the cluster is first created; do not format a running Hadoop file system (HDFS), or you will lose data!

If successful, you will see the prompts "successfully formatted" and "Exitting with status 0"; "Exitting with status 1" indicates an error.


3. Start Hadoop

Execute start-all.sh to start all services, including the NameNode and DataNode.

$ start-all.sh

If the error "Cannot find configuration directory: /etc/hadoop" appears here, it can be resolved as follows:

Configure a directory for Hadoop configuration files in hadoop-env.sh

$ sudo gedit etc/hadoop/hadoop-env.sh

Add: export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

After making the modification, reload the file:

$ source etc/hadoop/hadoop-env.sh

Just start all the services again.

$ start-all.sh


The following WARN message may appear at startup: "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable". This warning can be ignored and will not affect normal use.

4. Use the jps command to determine whether the startup was successful:


To run jps, locate it under your Java installation. Because my Java installation path is /opt/jdk1.8.0_91, jps is located in /opt/jdk1.8.0_91/bin

$ cd /opt/jdk1.8.0_91/bin

$ ./jps

If successful, the following processes are listed: "NameNode", "DataNode", and "SecondaryNameNode"


5. View HDFS information through the web interface

Go to http://localhost:50070/ to view it
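If you prefer the terminal, you can also check that the web UI is responding (assuming curl is installed; this check is not part of the original):

$ curl -s http://localhost:50070/ | head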

If http://localhost:50070/ cannot be loaded, it may be resolved by the following method:

First, format the NameNode again:

$ ./bin/hdfs namenode -format

When prompted to enter Y/N, be sure to enter an uppercase Y!!!

Then execute start-all.sh again to start all services

$ start-all.sh

Then execute the jps command

$ cd /opt/jdk1.8.0_91/bin

$ ./jps

Go to http://localhost:50070/ again and it will load normally.

6. Stop running Hadoop

$ stop-all.sh

The prompt "no datanode to stop" may appear:


Workaround:

After stop-all.sh, delete all the content under /usr/local/hadoop/tmp/dfs/data and /usr/local/hadoop/tmp/dfs/name; each of them contains a current folder.

So just delete the current folder
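For example (the paths assume the tmp directory created earlier under /usr/local/hadoop):

$ rm -r /usr/local/hadoop/tmp/dfs/data/current
$ rm -r /usr/local/hadoop/tmp/dfs/name/current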



After deleting them, reformat the NameNode, start all the services with start-all.sh, and then stop-all.sh will stop the DataNode normally.


