Hadoop Installation Full Tutorial: Ubuntu 16.04 + Java 1.8.0 + Hadoop 2.7.3


Update 2017/6/21: after installation, create a logs folder under /usr/local/hadoop/hadoop-2.7.3 and change its permissions to 777.

Important update 9/26: all commands in this article were copied from a real machine, but unknown errors may have crept in during copying and pasting, so please type the commands in by hand. Thank you.

Recently I attended a big-data experience-sharing talk, at the end of which the speaker gave us a demo he had written applying big data to geography. The demo requires a Hadoop platform on Linux, so this time I will share my experience building a Hadoop platform on a Linux virtual machine, the problems I ran into, and how I solved them.

First of all, the environment we are building is Hadoop. Hadoop implements a distributed file system that can be deployed on inexpensive hardware, provides high-throughput access to application data, and is well suited to applications with large datasets. Most importantly, Hadoop is open source.

This time we will install our Hadoop lab environment on a single computer (a virtual machine). If you have not yet installed a virtual machine, please see a VMware Workstation Pro 12 installation tutorial. If you have not yet installed a Linux operating system in the virtual machine, please see a tutorial on installing Ubuntu or CentOS under VMware.

We will install two modes: stand-alone mode and pseudo-distributed mode. Stand-alone mode is the most minimal mode, selected by default after Hadoop is decompressed; in it, the configuration in core-site.xml, hdfs-site.xml, and hadoop-env.sh is empty by default. Pseudo-distributed mode runs Hadoop as a single-node cluster; compared with stand-alone mode it is better suited to code debugging, enables the HDFS features, and lets you interact with the various daemons.

This article installs Ubuntu 16.04 LTS + Java 1.8.0_101 + Hadoop 2.7.3.

First, install the Java environment on Linux

The first thing to know before installing Hadoop on Linux is that Hadoop is a program written in Java, so we must make sure a Java environment exists on the Linux system before installing Hadoop. Here is how to install Java 1.8.0_101 on Linux.

Before installing Java, check whether Java is already present in the system: use the java -version command to see whether Java is installed. If another version of Java is already installed, uninstall it first and then install Java 1.8.0.
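If a preinstalled OpenJDK turns up, you can locate and remove it before continuing. A minimal sketch, assuming the package is named openjdk-8-jdk (the package name on your system may differ):

# list any installed JDK packages
dpkg -l | grep -i jdk
# remove a preinstalled OpenJDK (package name is an assumption)
sudo apt-get purge openjdk-8-jdk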

First we need to download the JDK package from the Oracle web site. The JDK (Java Development Kit) contains the runtime environment that Java needs. You can download the Linux version of the Java 1.8.0_101 package from the site below; when downloading, be sure to pick the file matching your operating system (that is, the Linux version installed in your virtual machine; you can check your current Linux version with the uname -a command).

www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

After the download completes, we get a compressed package with the .tar.gz suffix. Unpack it into the /usr/java/ directory (please create that directory first if it does not exist).
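Since /usr is owned by root, creating that directory needs sudo; for example:

sudo mkdir -p /usr/java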

tar -zxvf jdk-8u101-linux-x64.tar.gz -C /usr/java/

After the decompression, we can configure our environment variables.

vim ~/.bashrc

# Append the environment variables

export JAVA_HOME=/usr/java/jdk1.8.0_101
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

After writing the environment variables, run

source ~/.bashrc

to make them take effect.
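As a quick optional check that the variables were picked up, print one of them; it should echo the JDK path from above:

echo $JAVA_HOME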

After the configuration is done, we use

java -version

to check whether the Java installation succeeded.
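If it succeeded, the output should look roughly like this (the exact build numbers may differ):

java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)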

Second, install the SSH server and set up password-free login

Because Hadoop needs SSH to communicate, we must install SSH on the operating system. Before installing, check whether SSH is already installed and whether the SSH service has been started.

# Check the installed ssh packages
dpkg -l | grep ssh

# Check whether the ssh service is running
ps -e | grep ssh

If the SSH service is not present in the system, you can install it with

sudo apt-get install openssh-server

After installation, use

sudo /etc/init.d/ssh start

to start the service.

Then use

ps -e | grep ssh

to check whether the service has started.
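When the service is running, the output should include an sshd line roughly like this (the PID will differ):

 1234 ?        00:00:00 sshd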

SSH is a secure communication protocol, so it naturally asks for a password on each connection; but because we are running in pseudo-distributed mode, we will set up password-free login.

# Generate a key pair
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# Append the public key to authorized_keys
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

# Test password-free login to localhost
ssh localhost

After finishing, shut down the machine's firewall:

sudo ufw disable

Third, install Hadoop

With the preliminary work done, we can start installing Hadoop.

First, download Hadoop. A download link is provided below:

http://hadoop.apache.org/releases.html

Download the binary release.
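If you prefer to download from the command line, the release can also be fetched directly; a sketch, assuming the Apache archive still hosts this version:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz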

After downloading, extract the archive:

tar -zxvf hadoop-2.7.3.tar.gz -C /usr/local/hadoop/

Create the /usr/local/hadoop/ directory before uncompressing.

Next we configure three files: core-site.xml, hdfs-site.xml, and hadoop-env.sh.

All three files are under /usr/local/hadoop/hadoop-2.7.3/etc/hadoop/. In the first two files, write the properties below between the <configuration> and </configuration> tags.

First file: core-site.xml

core-site.xml
<!-- specify the HDFS (NameNode) communication address -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/windghoul/tmp</value>
    </property>

Note: replace /home/windghoul/tmp with a tmp folder under your own user's home directory. Please create it in advance.
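For example, for the current user (~ expands to your home directory):

mkdir -p ~/tmp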

Second file: hdfs-site.xml

hdfs-site.xml
<!-- set the number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

In the third file, hadoop-env.sh, find the following line and then write the content below it:

# The Java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_101
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.3
export PATH=$PATH:/usr/local/hadoop/hadoop-2.7.3/bin

Next, write Hadoop's environment variables into the system environment variables:

vim /etc/environment

# Append the following inside the closing quote at the end of the PATH line
:/usr/local/hadoop/hadoop-2.7.3/bin
:/usr/local/hadoop/hadoop-2.7.3/sbin

Reboot the system.

Verify that the Hadoop stand-alone mode installation is complete

hadoop version

If the Hadoop version number appears on the screen, stand-alone mode has been configured successfully.
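As an optional extra test of stand-alone mode, you can run the word-count example jar that ships with Hadoop 2.7.3 on a small file. A sketch, run from /usr/local/hadoop/hadoop-2.7.3, where input/ and output/ are hypothetical local directories (output/ must not exist yet); note this assumes the default (empty) core-site.xml, so if fs.default.name already points at hdfs://localhost:9000, run it after starting HDFS instead:

# prepare a small local input file
mkdir input
echo "hello hadoop hello world" > input/test.txt
# run the bundled example and print the word counts
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount input output
cat output/*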

The next step is to start HDFS in pseudo-distributed mode.

The first step is formatting the NameNode:

hadoop namenode -format

If the following is displayed, formatting succeeded:

...
...
16/09/24 23:39:53 INFO common.Storage: Storage directory /home/windghoul/tmp/dfs/name has been successfully formatted.
...
...
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/

Start HDFS:

sbin/start-all.sh

Show the running processes:

jps

If the screen lists the Hadoop daemon processes, HDFS has started successfully.
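In pseudo-distributed mode with start-all.sh, the jps listing typically contains the following processes (the PIDs will differ):

2345 NameNode
2468 DataNode
2601 SecondaryNameNode
2760 ResourceManager
2888 NodeManager
2999 Jps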

Stop HDFS:

sbin/stop-all.sh

The commands above must be run from the Hadoop installation directory; if you run them from /home/username, enter the full path instead.

With that, our Hadoop environment is basically built. Later I will write up a few simple Hadoop applications to share.

Problem solving

Q: After configuring the files, I entered hadoop version on the command line and no Hadoop version number was shown.
A: Check the environment-variable configuration, especially whether the Hadoop environment variables were written; check /etc/environment
and reboot your computer.

Q: Formatting does not succeed.
A: As with the previous question, check that Hadoop stand-alone mode is properly installed and configured before formatting, and check that the core-site.xml file is configured correctly.

Q: When I finally start HDFS, it keeps asking me to enter the password for localhost.
A: If it asks for a password, the permissions on the tmp folder may be wrong; chmod -R a+w /home/windghoul/tmp may resolve it.
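If that does not resolve it, taking ownership of the folder may also help; a sketch, assuming your login user should own the directory:

sudo chown -R $USER:$USER /home/windghoul/tmp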

Finally, thanks to several earlier installation tutorials from around the web:

http://www.aboutyun.com/thread-7684-1-1.html

http://www.aboutyun.com/thread-6487-1-1.html

http://blog.csdn.net/uq_jin/article/details/51451995

http://blog.csdn.net/hitwengqi/article/details/8008203
