Building a Hadoop Big Data Platform

Source: Internet
Author: User
Tags: hadoop ecosystem

Prerequisites: common Linux commands and Java programming basics.
Big data: scientific data, financial data, Internet of Things data, traffic data, social network data, retail data, and more.

Hadoop: An open source platform for distributed storage and distributed computing, developed under the Apache Software Foundation.


Core components of Hadoop:
HDFS: A distributed file system that stores massive amounts of data.
MapReduce: A parallel processing framework that handles task decomposition and scheduling.
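
To make the MapReduce idea more concrete, the sketch below imitates its three phases (map, shuffle, reduce) with an ordinary shell pipeline on one machine. It is only an analogy and does not use Hadoop; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Single-machine analogy of a MapReduce word count (not Hadoop itself).
# All function names here are hypothetical, chosen for illustration.
export LC_ALL=C   # fixed sort order so the result is reproducible

# Map: split the input into one lowercase word per line (one "key" per record).
map_words() {
  tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d'
}

# Shuffle: sorting brings identical keys next to each other.
shuffle() {
  sort
}

# Reduce: count each group of identical keys and print "word<TAB>count".
reduce_counts() {
  uniq -c | awk '{print $2 "\t" $1}'
}

word_count() {
  map_words | shuffle | reduce_counts
}

printf 'Hadoop stores data\nHadoop processes data\n' | word_count
```

In a real cluster the map and reduce steps run in parallel on many nodes and the framework performs the shuffle over the network; the pipeline above only shows how the phases fit together.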

Uses of Hadoop:

Building large-scale data warehouses, and storing, processing, analyzing, and computing statistics over petabytes of data.

  Examples include search engines, web page data processing, business intelligence, risk assessment, and early warning, as well as log analysis and data mining tasks.

Hadoop benefits: high scalability, low cost, and a mature ecosystem (Hadoop ecosystem map).

Hadoop open source tools:

Hive: Translates SQL statements into Hadoop jobs for execution, lowering the barrier to using Hadoop.
HBase: A distributed database that stores structured data. HBase provides random reads and writes and real-time access to table data.
ZooKeeper: Like a zookeeper, it monitors the state of each node in a Hadoop cluster, manages the configuration of the whole cluster, keeps data consistent between nodes, and so on.

When choosing a Hadoop version, prefer a stable release, even if that means an older version.

===============================================

Installation and configuration of Hadoop:
1) Install the JDK on Linux and set the environment variables
Install the JDK: >> sudo apt-get install openjdk-7-jdk
Set the environment variables:

>> vim /etc/profile

>> :wq
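
The lines added to /etc/profile are not shown in this article. A typical set might look like the sketch below. The JDK path is an assumption (the usual location of OpenJDK 7 on 64-bit Ubuntu); check it with `ls /usr/lib/jvm` before using it.

```shell
# Assumed JDK location; adjust to whatever `ls /usr/lib/jvm` shows on your machine.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib
# Put the JDK tools (java, javac, jps) on the search path.
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
```

After saving the file, running `source /etc/profile` and then `java -version` is a quick way to confirm that the settings took effect.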

2) Download Hadoop and set the Hadoop environment variables
Download and extract Hadoop:

>> cd /opt/hadoop-1.2.1/

>> ls

>> vim /etc/profile

>> :wq
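
As with the JDK, the Hadoop lines added to /etc/profile are not shown here. A minimal sketch, assuming Hadoop was extracted to /opt/hadoop-1.2.1 as in the commands above:

```shell
# Assumes the archive was extracted to /opt/hadoop-1.2.1.
export HADOOP_HOME=/opt/hadoop-1.2.1
# Make the hadoop command and the start/stop scripts available everywhere.
export PATH=$HADOOP_HOME/bin:$PATH
```

Run `source /etc/profile` afterwards so the current shell picks up the change. Hadoop 1.x may print a "$HADOOP_HOME is deprecated" warning when this variable is set; it does not stop the commands from working.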


3) Modify the 4 configuration files
(a) Modify hadoop-env.sh and set JAVA_HOME
(b) Modify core-site.xml and set hadoop.tmp.dir, dfs.name.dir, and fs.default.name
(c) Modify mapred-site.xml and set mapred.job.tracker
(d) Modify hdfs-site.xml and set dfs.data.dir

>> cd conf
>> ls

>> vim mapred-site.xml

>> :wq
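
The edited mapred-site.xml is not reproduced in this article. The sketch below writes one possible version for a single-machine setup. The host and port (localhost:9001) are assumptions, and `CONF_DIR` is a hypothetical variable that defaults to the current directory.

```shell
# Hypothetical variable: where to write the file (default: current directory).
# Note that this overwrites an existing mapred-site.xml in that directory.
CONF_DIR="${CONF_DIR:-.}"

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Address of the JobTracker; localhost:9001 is an assumed value. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
```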

>> vim core-site.xml

>> :wq
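
The contents of core-site.xml are likewise not shown. A possible version with the three properties named in step (b) could be written as follows. The directories under /hadoop and the address hdfs://localhost:9000 are assumptions, and `CONF_DIR` is a hypothetical variable. This sketch follows the article in placing dfs.name.dir in core-site.xml, although that property is usually documented as an hdfs-site.xml setting.

```shell
# Hypothetical variable: where to write the file (default: current directory).
# Note that this overwrites an existing core-site.xml in that directory.
CONF_DIR="${CONF_DIR:-.}"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Base directory for Hadoop's working files (assumed path). -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop</value>
  </property>
  <!-- Where the NameNode keeps file system metadata (assumed path). -->
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/name</value>
  </property>
  <!-- URI of the default file system (assumed host and port). -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```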

>> vim hdfs-site.xml


>> :wq
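
For hdfs-site.xml, step (d) only names the dfs.data.dir property. One way the file could look is sketched below. The path /hadoop/data is an assumption, and `CONF_DIR` is a hypothetical variable.

```shell
# Hypothetical variable: where to write the file (default: current directory).
# Note that this overwrites an existing hdfs-site.xml in that directory.
CONF_DIR="${CONF_DIR:-.}"

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Where the DataNode stores its data blocks (assumed path). -->
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/data</value>
  </property>
</configuration>
EOF
```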

>> vim hadoop-env.sh


>> :wq
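
In hadoop-env.sh only the JAVA_HOME line needs to be set. Instead of editing the file by hand, the line can also be appended from the shell, as in this sketch. The JDK path is the same assumption as before (OpenJDK 7 on 64-bit Ubuntu), and `CONF_DIR` is a hypothetical variable.

```shell
# Hypothetical variable: directory that contains hadoop-env.sh (default: current directory).
CONF_DIR="${CONF_DIR:-.}"
# Assumed JDK location; adjust to match your system.
JAVA_HOME_LINE='export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64'

# Append the line only if it is not already present, so reruns do not duplicate it.
if ! grep -qxF "$JAVA_HOME_LINE" "$CONF_DIR/hadoop-env.sh" 2>/dev/null; then
  echo "$JAVA_HOME_LINE" >> "$CONF_DIR/hadoop-env.sh"
fi
```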

# Format the NameNode
>> hadoop namenode -format
# Start Hadoop
>> start-all.sh
# View the currently running Java processes with the jps command
>> jps
If the Hadoop daemon processes appear in the output, the installation succeeded.
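
In Hadoop 1.x, start-all.sh launches five daemons: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker. Rather than reading the jps output by eye, the check can be scripted. The following is a minimal sketch; the function name `check_hadoop_daemons` is made up for illustration.

```shell
#!/usr/bin/env bash
# Daemons that start-all.sh launches in Hadoop 1.x.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# Hypothetical helper: reads `jps` output on stdin, prints each missing
# daemon, and returns non-zero if any expected daemon is absent.
check_hadoop_daemons() {
  local jps_output missing=0 daemon
  jps_output="$(cat)"
  for daemon in $EXPECTED_DAEMONS; do
    # jps prints "<pid> <name>"; compare the whole second field so that
    # "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$jps_output" |
        awk -v d="$daemon" '$2 == d {found=1} END {exit !found}'; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: jps | check_hadoop_daemons && echo "all Hadoop daemons are running"
```

If a daemon is reported missing, the log files under the logs directory of the Hadoop installation are usually the first place to look.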
