Things about Hadoop

Source: Internet
Author: User
Keywords: hadoop, hadoop architecture, hadoop big data
Preface
What is Hadoop?
In the words of the encyclopedia: "Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the underlying details of the distributed system, and make full use of the power of clusters for high-speed computing and storage."
This may sound a little abstract; we can revisit the question after learning about Hadoop step by step.

Hadoop family
Hadoop is not a single project. After 10 years of development, it has grown into a large family of nearly 20 products.
The following 9 are the core ones, and we will learn them step by step in this order.

Hadoop: an open source distributed computing framework from the Apache open source organization, providing a distributed file system subproject (HDFS) and a software architecture that supports MapReduce distributed computing.
Hive: a data warehouse tool based on Hadoop.
Pig: a large-scale data analysis tool based on Hadoop.
ZooKeeper: a distributed, open source coordination service designed for distributed applications. It mainly solves data management problems that distributed applications often encounter, makes coordinating and managing them less difficult, and provides high-performance distributed services.
HBase: a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, a large-scale structured storage cluster can be built on cheap PC servers.
Mahout: a distributed framework for machine learning and data mining based on Hadoop.
Sqoop: a tool for transferring data between Hadoop and relational databases. It can import data from a relational database (MySQL, Oracle, Postgres, etc.) into Hadoop's HDFS, or export data from HDFS into a relational database.
Cassandra: an open source distributed NoSQL database system.
Flume: a distributed, reliable, and highly available system for massive log aggregation, which can be used for log data collection, processing, and transmission.

OK, let's officially start learning Hadoop.

1 Environment setup
There are three ways to install Hadoop:

Stand-alone mode: simple to install and needs almost no configuration, but only suitable for debugging;
Pseudo-distributed mode: five processes (NameNode, DataNode, JobTracker, TaskTracker, and Secondary NameNode) are started at the same time on a single node to simulate the distributed operation of the individual nodes. These are the Hadoop 1.x names; in Hadoop 2.x, the version installed below, YARN's ResourceManager and NodeManager take the place of JobTracker and TaskTracker;
Fully distributed mode: a normal Hadoop cluster made up of multiple nodes, each performing its own role.
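
Once a pseudo-distributed node is running, the JDK's jps tool lists the Java processes on the machine, which gives a quick way to see whether every daemon came up. The sketch below is a hypothetical helper (missing_daemons is not part of Hadoop) that reads jps output from standard input and prints the expected daemons that are absent. The daemon list is an assumption for Hadoop 2.x, where YARN's ResourceManager and NodeManager take the place of JobTracker and TaskTracker.

```shell
#!/bin/sh
# Hypothetical helper: report which pseudo-distributed daemons are not running.
# Reads jps output ("<pid> <ClassName>" per line) from standard input.
# The daemon list is an assumption for Hadoop 2.x; adjust it for other versions.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare the second column exactly so that "NameNode" does not
        # match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Running `jps | missing_daemons` on the node should print nothing when all five daemons are up.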

Next, we build a pseudo-distributed environment.

Operating system: Ubuntu 16.04 LTS
JDK: JDK1.8
Hadoop: 2.6.0

S1 Create hadoop user
First switch to the root user:

su root

Create the hadoop user, set its password, and grant it administrator (sudo) rights:

useradd -m hadoop -s /bin/bash
passwd hadoop
adduser hadoop sudo

After the user has been created successfully, log out of the current session and log in again as the hadoop user.
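
Before logging out, it can help to confirm that the account really exists and belongs to the sudo group. The following is a minimal sketch; user_in_group is a hypothetical helper name, and it relies only on the standard `id -nG` command, which prints a user's group names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if USER exists and is a member of GROUP.
user_in_group() {
    user="$1"
    group="$2"
    # id -nG prints the user's group names separated by spaces;
    # it fails if the user does not exist.
    groups_list="$(id -nG "$user" 2>/dev/null)" || return 1
    for g in $groups_list; do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}
```

For the account created above, `user_in_group hadoop sudo && echo "hadoop can use sudo"` should print the message.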

S2 Update apt
We will use apt to install software later, so update the apt package index first with the following command:

sudo apt-get update

S3 Install vim
Install vim, which is used to edit files:

sudo apt-get install vim

S4 Install SSH
Install SSH for remote login.
Ubuntu installs the SSH client by default, so we only need to install the SSH server with the following command:

sudo apt-get install openssh-server

Use the following command to log in to the local machine:

ssh localhost

S5 Install JDK
First download the JDK 1.8 installation package from the official website. The file I downloaded is:
jdk-8u111-linux-x64.tar.gz
Create a new jvm folder in the /usr/lib directory and grant permissions on it:

cd /usr/lib
sudo mkdir jvm
sudo chown hadoop ./jvm
sudo chmod -R 777 ./jvm

Copy the downloaded installation package to the jvm directory.
In the jvm directory, execute the following commands to extract the package and rename the resulting folder:

sudo tar zxvf jdk-8u111-linux-x64.tar.gz
sudo mv jdk1.8.0_111 java

Next, run the following command to open ~/.bashrc in the vim editor, then press "i" to enter insert mode:

vim ~/.bashrc

Move the cursor to the top of the file and enter the following content. It must be written at the beginning; if it is written at the end, JAVA_HOME may not be found later:


export JAVA_HOME=/usr/lib/jvm/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
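
A likely reason the position matters: Ubuntu's default ~/.bashrc returns early when the shell is not interactive, so lines placed after that check are skipped when commands are run over SSH. The same edit can be scripted instead of made by hand in vim. This is only a sketch: prepend_java_env is a hypothetical helper name, the default path /usr/lib/jvm/java is the one used in this article, and the CLASSPATH entries are joined with ":" because that is the path separator on Linux.

```shell
#!/bin/sh
# Hypothetical helper: put the Java exports at the top of a bashrc-style file.
# Usage: prepend_java_env FILE [JAVA_HOME_DIR]
prepend_java_env() {
    rc_file="$1"
    java_home="${2:-/usr/lib/jvm/java}"
    touch "$rc_file"
    # Do nothing if an earlier run already added the exports.
    if grep -q '^export JAVA_HOME=' "$rc_file"; then
        return 0
    fi
    tmp_file="$(mktemp)" || return 1
    {
        echo "export JAVA_HOME=$java_home"
        # Quoted heredoc: the ${...} references are written literally
        # and expanded only when the rc file is sourced.
        cat <<'EOF'
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
EOF
        cat "$rc_file"
    } > "$tmp_file"
    # Write back through the original file so its owner and mode are kept.
    cat "$tmp_file" > "$rc_file"
    rm -f "$tmp_file"
}
```

Calling `prepend_java_env ~/.bashrc` would apply it to the real file; running it a second time leaves the file unchanged.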

Press the ESC key, then press Shift+Z twice (ZZ) to save and exit. Back at the command line, enter the following command to make the change take effect:

source ~/.bashrc
The configuration is now complete. Use the following commands to verify that the installation succeeded:

echo $JAVA_HOME
java -version
$JAVA_HOME/bin/java -version
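
If the first command prints an empty line, the exports were not picked up. The three checks can also be folded into one small function, sketched below; check_java_home is a hypothetical name, and it only tests that the directory contains an executable bin/java, without running it.

```shell
#!/bin/sh
# Hypothetical helper: check that a JAVA_HOME directory looks usable.
# Usage: check_java_home [DIR]   (defaults to the current $JAVA_HOME)
check_java_home() {
    java_dir="${1:-${JAVA_HOME:-}}"
    if [ -z "$java_dir" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$java_dir/bin/java" ]; then
        echo "no executable found at $java_dir/bin/java"
        return 1
    fi
    echo "JAVA_HOME looks valid: $java_dir"
}
```

With the setup in this article, `check_java_home` should report /usr/lib/jvm/java as valid.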

S6 Install Hadoop2
Download hadoop-2.6.0.tar.gz and hadoop-2.6.0.tar.gz.mds from the http://mirror.bit.edu.cn/apache/hadoop/common/ page.

Create a new hadoop directory and grant permissions on it:

cd /usr/local
sudo mkdir hadoop
sudo chown hadoop ./hadoop
sudo chmod -R 777 ./hadoop

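Copy the downloaded files to this directory.
Execute the following commands to check the integrity of the download: the first prints the MD5 value published in the .mds file, and the second computes the MD5 of the downloaded archive in upper case. The two values should match.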

cat /usr/local/hadoop/hadoop-2.6.0.tar.gz.mds | grep 'MD5'
md5sum /usr/local/hadoop/hadoop-2.6.0.tar.gz|tr "a-z" "A-Z"
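
Comparing the two values by eye is error-prone, especially since the .mds file typically prints the digest in space-separated groups. The comparison can be automated roughly as follows. This is a sketch under assumptions: verify_md5 and normalize_hex are hypothetical helper names, and the expected digest is passed in by hand after reading it from the .mds file.

```shell
#!/bin/sh
# Hypothetical helper: strip spaces, tabs, newlines, and colons from a digest
# and upper-case it, so "d4 1d ..." and "D41D..." compare as equal.
normalize_hex() {
    printf '%s' "$1" | tr -d ' \n\t:' | tr 'a-f' 'A-F'
}

# Hypothetical helper: compare a file's MD5 with an expected digest.
# Usage: verify_md5 FILE EXPECTED_DIGEST
verify_md5() {
    file="$1"
    expected="$(normalize_hex "$2")"
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(md5sum "$file" | cut -d ' ' -f 1 | tr 'a-f' 'A-F')"
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

For example, `verify_md5 /usr/local/hadoop/hadoop-2.6.0.tar.gz "<digest copied from the .mds file>"` prints OK only when the archive is intact.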

Switch to the /usr/local/hadoop directory, then extract the archive and rename the resulting folder:

cd /usr/local/hadoop
sudo tar -zxf hadoop-2.6.0.tar.gz
sudo mv hadoop-2.6.0 hadoop2

Hadoop is ready to use once it has been extracted. Use the following commands to verify:

cd /usr/local/hadoop/hadoop2
./bin/hadoop version
