Hadoop Installation & Standalone/Pseudo-Distributed Configuration (Hadoop 2.7.2 / Ubuntu 14.04)


I. Install Java

1. Download the jdk-8u91-linux-x64.tar.gz file from: http://www.oracle.com/technetwork/java/javase/downloads/index.html

2. Installation:

# Choose an installation path. I chose /opt and copied the downloaded jdk-8u91-linux-x64.tar.gz file to this folder

$ cd /opt

$ sudo cp -i ~/Downloads/jdk-8u91-linux-x64.tar.gz /opt/

# Extract and install

$ sudo tar zxvf jdk-8u91-linux-x64.tar.gz

$ sudo rm -r jdk-8u91-linux-x64.tar.gz
# Check whether the installation succeeded
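For example, you can run the java binary from the unpacked JDK directly (the jdk1.8.0_91 directory name is what this JDK archive unpacks to; adjust it if yours differs):

$ /opt/jdk1.8.0_91/bin/java -version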


II. Create a Hadoop Group and a Hadoop User

1. Add a Hadoop user as a system user
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
2. Give hduser root privileges

Edit /etc/sudoers and add the line hduser ALL=(ALL:ALL) ALL below the existing line root ALL=(ALL:ALL) ALL.
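A minimal sketch of the change (the original shows this step as a screenshot; opening the file with visudo is my suggestion, not part of the original):

$ sudo visudo

# /etc/sudoers (excerpt)
root    ALL=(ALL:ALL) ALL
hduser  ALL=(ALL:ALL) ALL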


III. Configure SSH

SSH is configured so that the machines can execute commands on each other without entering a login password; otherwise the master node would have to enter a password manually every time it accesses another node.

1. Install SSH
$ sudo apt-get install openssh-server

2. Start the service

$ sudo /etc/init.d/ssh start

3. After it starts, you can check whether the service is running with the following command

$ ps -e | grep ssh


4. Generate the public and private keys:

$ ssh-keygen -t rsa -P ""

Two files are generated under /home/hduser/.ssh: id_rsa and id_rsa.pub; the former is the private key and the latter is the public key.

5. Now we append the public key to authorized_keys
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

6. Log in to SSH and confirm that you don't need to enter a password

$ ssh localhost


7. Log out
$ exit

If you log in again, you don't need a password.

IV. Install Hadoop

1. First download hadoop-2.7.2.tar.gz from https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/

2. Unpack it and place it in the directory you want. I put it in /usr/local/hadoop.

$ sudo tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 /usr/local/hadoop

3. To ensure that all operations are done as user hduser:
$ sudo chown -R hduser:hadoop /usr/local/hadoop

V. Configure ~/.bashrc

1. Switch to the Hadoop user; mine is hduser

$ su - hduser

2. View the Java installation path
$ update-alternatives --config java


The complete path is /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
We only take the first part: /usr/lib/jvm/java-7-openjdk-amd64

3. Modify the configuration file .bashrc

$ sudo gedit ~/.bashrc

# Append the following at the end of the file


#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END
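Note that the new variables only take effect in a new shell. To apply them to the current shell you can reload the file (this step is not spelled out in the original):

$ source ~/.bashrc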

4. Modify /usr/local/hadoop/etc/hadoop/hadoop-env.sh

$ sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Locate the JAVA_HOME variable and modify it as follows:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
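Optionally, as a quick sanity check that the environment is set up (assuming the PATH changes from ~/.bashrc are in effect), you can print the Hadoop version:

$ hadoop version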
At this point, the standalone-mode configuration is complete. A WordCount test is performed below.

VI. WordCount Test

1. First create a new folder named input in the Hadoop directory

$ cd /usr/local/hadoop/
$ mkdir input

2. Copy the README.txt file to the input folder to count the frequency of the words in the file
$ sudo cp README.txt input

3. Run the WordCount program and save the output in the output folder

# Each time you rerun the WordCount program, you need to delete the output folder first, otherwise it will fail

$ bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.7.2-sources.jar org.apache.hadoop.examples.WordCount input output
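If you are rerunning the job, first remove the previous output folder as noted in the comment above, for example:

$ rm -r output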


4. View the word count results
$ cat output/*


VII. Pseudo-Distributed Mode Configuration

1. Modify two configuration files, core-site.xml and hdfs-site.xml, located in /usr/local/hadoop/etc/hadoop/

Start by creating several folders in the Hadoop directory:

$ cd /usr/local/hadoop
$ mkdir tmp
$ mkdir tmp/dfs
$ mkdir tmp/dfs/data
$ mkdir tmp/dfs/name

Modify core-site.xml:

$ sudo gedit etc/hadoop/core-site.xml

Modify it to the following configuration:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Modify hdfs-site.xml:

$ sudo gedit etc/hadoop/hdfs-site.xml

Modify to the following configuration:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

2. Format the NameNode

$ ./bin/hdfs namenode -format

Attention! You only need to format HDFS when the cluster is first created; do not format a running Hadoop file system (HDFS), or you will lose data!

If successful, you will see the prompts "successfully formatted" and "Exitting with status 0"; "Exitting with status 1" indicates an error.


3. Start Hadoop

Execute start-all.sh to start all services, including the NameNode and DataNode.

$ start-all.sh

If the error "Cannot find configuration directory: /etc/hadoop" appears here, it can be resolved as follows:

Configure a directory for Hadoop configuration files in hadoop-env.sh

$ sudo gedit etc/hadoop/hadoop-env.sh

Add: export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

After making the modification, reload the file:

$ source etc/hadoop/hadoop-env.sh

Just start all the services again.

$ start-all.sh


The following WARN message may appear at startup: "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable". This warning can be ignored and will not affect normal use.

4. Use the jps command to determine whether the startup was successful:


To run jps, locate it under your Java installation. Because my Java installation path is /opt/jdk1.8.0_91, jps is located in /opt/jdk1.8.0_91/bin

$ cd /opt/jdk1.8.0_91/bin

$ ./jps

If successful, the following processes are listed: "NameNode", "DataNode", and "SecondaryNameNode"


5. View HDFS information through the web interface

Go to http://localhost:50070/ to view it
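If you prefer the terminal, you can also check that the web UI is responding (assuming curl is installed; this check is not part of the original):

$ curl -s http://localhost:50070/ | head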

If http://localhost:50070/ cannot be loaded, it may be resolved by the following method:

First, format the NameNode again:

$ ./bin/hdfs namenode -format

When prompted to enter Y/N, be sure to enter an uppercase Y!!!

Then execute start-all.sh again to start all services

$ start-all.sh

Then execute the jps command

$ cd /opt/jdk1.8.0_91/bin

$ ./jps

Go to http://localhost:50070/ again and it will load normally.

6. Stop running Hadoop

$ stop-all.sh

The prompt "no datanode to stop" may appear:


Workaround:

After stop-all.sh, delete all the content under /usr/local/hadoop/tmp/dfs/data and /usr/local/hadoop/tmp/dfs/name; each of them contains a current folder.

So just delete the current folder
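For example (the paths assume the tmp directory created earlier under /usr/local/hadoop):

$ rm -r /usr/local/hadoop/tmp/dfs/data/current
$ rm -r /usr/local/hadoop/tmp/dfs/name/current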



After deleting them, reformat the NameNode, start all the services with start-all.sh, and then stop-all.sh will stop the DataNode normally.


