Hadoop Installation and Standalone/Pseudo-Distributed Configuration (Hadoop 2.7.2 / Ubuntu 14.04)


I. Install Java

1. Download the jdk-8u91-linux-x64.tar.gz file from: http://www.oracle.com/technetwork/java/javase/downloads/index.html

2. Installation:

# Choose an installation path. I chose /opt and copied the downloaded jdk-8u91-linux-x64.tar.gz file into that folder.

$ cd /opt

$ sudo cp -i ~/Downloads/jdk-8u91-linux-x64.tar.gz /opt/

# Extract and install

$ sudo tar zxvf jdk-8u91-linux-x64.tar.gz

$ sudo rm -r jdk-8u91-linux-x64.tar.gz
# Check whether the installation succeeded
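The original does not show the verification command; a minimal check, assuming the archive unpacked into /opt/jdk1.8.0_91 (the directory name inside the tarball, also referenced later in this guide):

$ /opt/jdk1.8.0_91/bin/java -version

If the installation succeeded, this prints the Java version (1.8.0_91).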


II. Create a Hadoop group and a Hadoop user

1. Add a Hadoop user to the system:
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
2. Give the hduser account root (sudo) privileges

Add hduser ALL=(ALL:ALL) ALL below the existing root ALL=(ALL:ALL) ALL line in /etc/sudoers.
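One way to do this (not spelled out in the original) is to edit the file with sudo visudo; the relevant lines would then look like this:

# User privilege specification
root    ALL=(ALL:ALL) ALL
hduser  ALL=(ALL:ALL) ALL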



III. Configure SSH

SSH is configured so that commands can be run between machines without entering a login password; otherwise, the master node would have to type a password manually every time it accesses another node.

1. Install SSH
$ sudo apt-get install openssh-server

2. Start the service

$ sudo /etc/init.d/ssh start

3. After starting it, check that the service is running with the following command:

$ ps -e | grep ssh


4. Generate public and private keys:

$ ssh-keygen -t rsa -P ""

At this point, two files are generated under /home/hduser/.ssh: id_rsa and id_rsa.pub; the former is the private key and the latter is the public key.

5. Now append the public key to authorized_keys:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
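If passwordless login still prompts for a password later on, the permissions on the .ssh directory may be too open; tightening them (an extra step not in the original) usually helps:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys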

6. Log in via SSH and confirm that no password is required:

$ ssh localhost


7. Log out:
$ exit

If you log in again, you don't need a password.

IV. Install Hadoop

1. First download hadoop-2.7.2.tar.gz from https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/

2. Unpack it and place it in the directory you want. I put it in /usr/local/hadoop:

$ sudo tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 /usr/local/hadoop

3. To ensure that all subsequent operations can be done as user hduser, change the ownership:
$ sudo chown -R hduser:hadoop /usr/local/hadoop

V. Configure ~/.bashrc

1. Switch to the Hadoop user; mine is hduser:

$ su - hduser

2. Find the Java installation path:
$ update-alternatives --config java


The complete path is /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java; we only need the front part, /usr/lib/jvm/java-7-openjdk-amd64.
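Equivalently (a small convenience not in the original), the same directory can be derived from the java binary itself, assuming the resolved path ends in /jre/bin/java as above:

$ readlink -f /usr/bin/java | sed 's:/jre/bin/java::'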

3. Modify the ~/.bashrc configuration file:

$ sudo gedit ~/.bashrc

# Append the following at the end of the file


#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
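After saving the file, reload it so the new variables take effect in the current shell (a step implied but not shown in the original):

$ source ~/.bashrc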

4. Modify /usr/local/hadoop/etc/hadoop/hadoop-env.sh

$ sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Find the JAVA_HOME variable and modify it as follows:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
At this point, the standalone-mode configuration is complete. Next comes the WordCount test.

VI. WordCount test

1. First create a new folder named input in the Hadoop directory:

$ cd /usr/local/hadoop/
$ mkdir input

2. Copy the README.txt file into the input folder; we will count the word frequencies in this file:
$ sudo cp README.txt input

3. Run the WordCount program and save the results in the output folder.

# Each time you rerun the WordCount program, you need to delete the output folder first; otherwise an error occurs.
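For example (a small addition to the original), remove it from the Hadoop directory before rerunning:

$ rm -r output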

$ bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.2-sources.jar org.apache.hadoop.examples.WordCount input output


4. View the word-count results:
$ cat output/*


VII. Pseudo-distributed mode configuration

1. Modify two configuration files, core-site.xml and hdfs-site.xml, both located in /usr/local/hadoop/etc/hadoop/.

Start by creating several folders in the Hadoop directory:

$ cd /usr/local/hadoop
$ mkdir tmp
$ mkdir tmp/dfs
$ mkdir tmp/dfs/data
$ mkdir tmp/dfs/name

Modify core-site.xml:

$ sudo gedit etc/hadoop/core-site.xml

Modify to the following configuration:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Modify hdfs-site.xml:

$ sudo gedit etc/hadoop/hdfs-site.xml

Modify to the following configuration:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>

2. Format the NameNode

$ ./bin/hdfs namenode -format

Note: you only need to format the Hadoop cluster when it is first created. Do not format a running Hadoop file system (HDFS), or you will lose data.

If successful, you will see the "successfully formatted" and "Exitting with status 0" messages; "Exitting with status 1" indicates an error.
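If the log is long, one way (not from the original) to confirm the result is to save the output and filter for those messages:

$ ./bin/hdfs namenode -format 2>&1 | tee /tmp/format.log
$ grep -E "successfully formatted|Exitting with status" /tmp/format.log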


3. Start Hadoop

Run start-all.sh to start all the services, including the NameNode and DataNode.

$ start-all.sh

If the error "cannot find configuration directory: /etc/hadoop" appears here, it can be resolved as follows:

Set the Hadoop configuration directory in hadoop-env.sh:

$ sudo gedit etc/hadoop/hadoop-env.sh

Add: export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

After the modification, reload the file:


$ source etc/hadoop/hadoop-env.sh

Then start all the services again:

$ start-all.sh


The following WARN message may appear at startup: "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable". This warning can be ignored and does not affect normal use.

4. Use the jps command to determine whether the startup succeeded:


If jps is not found, look for it under the Java installation. Because my Java installation path is /opt/jdk1.8.0_91, jps is located in /opt/jdk1.8.0_91/bin:

$ cd /opt/jdk1.8.0_91/bin

$ ./jps

A successful startup lists the following processes: "NameNode", "DataNode", and "SecondaryNameNode".
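As a convenience (not part of the original steps), you can append the JDK's bin directory to PATH in ~/.bashrc so jps can be run from anywhere, assuming the same /opt/jdk1.8.0_91 installation path:

export PATH=$PATH:/opt/jdk1.8.0_91/bin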


5. View HDFS information through the web interface

Go to http://localhost:50070/ to view it.

If http://localhost:50070/ cannot be loaded, it may be resolved in the following way:

First format the NameNode again:

$ ./bin/hdfs namenode -format

Be sure to enter a capital Y when prompted for Y/N.


Then execute start-all.sh to start all services

$ start-all.sh

Then run the jps command:

$ cd /opt/jdk1.8.0_91/bin

$ ./jps

Then go to http://localhost:50070/ again; it should now load normally.

6. Stop running Hadoop

$ stop-all.sh

A "no datanode to stop" message may appear.


Workaround:

After running stop-all.sh, delete all the content under /usr/local/hadoop/tmp/dfs/data and /usr/local/hadoop/tmp/dfs/name; each contains a current folder.

So just delete the current folders.
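A minimal sketch of that cleanup, assuming the default directories configured earlier in this guide:

$ rm -rf /usr/local/hadoop/tmp/dfs/data/current
$ rm -rf /usr/local/hadoop/tmp/dfs/name/current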



After the deletion, format the NameNode again, start all the services with start-all.sh, and then stop them with stop-all.sh; the DataNode will now stop normally.

