Hadoop: Big Data Tools You Should Know

Last Update: 2015-03-17 Source: Internet

Author: User

Apache Hadoop has become the driving force behind the growth of the big data industry. Tools such as Hive and Pig are often mentioned, but what does each of them actually do, and why do so many of them have such strange names (Oozie, ZooKeeper, Flume, and so on)?

Hadoop makes it inexpensive to process big data (data volumes typically of 10-100 GB or more, spanning a variety of data types, both structured and unstructured). But how does this differ from traditional approaches?

Today's enterprise data warehouses and relational databases are good at handling structured data and can store large amounts of it, but at a relatively high cost. Their requirement that data be structured limits the types of data they can process, and this rigidity hampers agility when a data warehouse is confronted with massive amounts of heterogeneous data. As a result, valuable data sources within an organization often go unmined. This is the biggest difference between Hadoop and traditional data processing approaches.

This article describes the components of the Hadoop system and explains the functionality of each component.

The Hadoop ecosystem contains more than 10 components or subprojects, and this breadth makes installing, configuring, and deploying clusters of any size, as well as managing them, a challenge.

The main components of Hadoop include:

Hadoop: a software framework written in Java that supports data-intensive distributed applications

ZooKeeper: a highly reliable distributed coordination system

MapReduce: a flexible parallel data processing framework for large data sets

HDFS: the Hadoop Distributed File System

Oozie: a scheduler for MapReduce jobs

HBase: a key-value database

Hive: a data warehouse package built on MapReduce

Pig: a high-level data processing layer built on top of Hadoop. Its language, Pig Latin, gives programmers a more intuitive way to define custom data flows.
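To make the MapReduce entry concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using the classic word-count example. It only simulates the programming model in plain Python; it does not use Hadoop's Java API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical. On a real cluster, the framework runs many map and reduce tasks in parallel over HDFS blocks and performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record (a line of text)."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group intermediate values by key, sorted by key,
    imitating what the framework does before calling reducers."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over an in-memory collection of lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "MapReduce processes data in Hadoop"]
    print(word_count(docs))
```

Because each map call sees only one record and each reduce call sees only one key, the work can be spread across many machines without coordination between tasks.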

Typical Hadoop MapReduce use cases involve large volumes of structured and unstructured data whose records have few or no dependencies on one another, which makes them suitable for large-scale parallel processing. Examples include:

Batch analysis: fast enough to meet business requirements and reporting needs, such as website traffic analysis and product recommendation analysis.

Iterative analysis: using data mining and machine learning algorithms, such as association rule analysis, k-means clustering, link analysis, classification, and naive Bayes.

Statistical analysis and refinement: such as web log analysis and data profiling.

Behavioral analysis: such as click-stream analysis and user video behavior.

Transformation and enhancement: such as processing social media data, ETL processing, and data standardization.
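Web log analysis is a typical batch job of this kind. As a hedged sketch, the mapper and reducer below follow the line-oriented, tab-separated key/value convention of Hadoop Streaming, which lets non-Java programs act as MapReduce tasks. The log format (Apache Common Log Format, with the request path in the seventh whitespace-separated field), the sample data, and the function names are assumptions for illustration; run_locally only imitates the framework's sort-and-group step in a single process.

```python
from itertools import groupby
from typing import Iterable, Iterator, List

# Hypothetical sample lines in Common Log Format (assumed input format).
SAMPLE_LOGS = [
    '10.0.0.1 - - [17/Mar/2015:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [17/Mar/2015:10:00:05 +0800] "GET /products HTTP/1.1" 200 1024',
    '10.0.0.1 - - [17/Mar/2015:10:00:09 +0800] "GET /index.html HTTP/1.1" 304 0',
]


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'path<TAB>1' for each access-log line (Streaming-style key/value text)."""
    for line in lines:
        fields = line.split()
        if len(fields) < 7:  # skip malformed or empty lines
            continue
        yield f"{fields[6]}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per path. Input must already be sorted by key,
    as Hadoop's shuffle guarantees for each reducer."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for path, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{path}\t{sum(int(count) for _, count in group)}"


def run_locally(log_lines: Iterable[str]) -> List[str]:
    """Simulate map -> sort -> reduce in one process (stands in for the cluster)."""
    return list(reducer(sorted(mapper(log_lines))))


if __name__ == "__main__":
    for out in run_locally(SAMPLE_LOGS):
        print(out)
```

Keeping the mapper and reducer as plain line-in, line-out functions mirrors how Streaming tasks read standard input and write standard output, so the same logic could be tested locally before being submitted to a cluster.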

Hadoop is typically deployed in a distributed environment. As happened earlier with Linux, vendors integrate and test the components of the Apache Hadoop ecosystem and add their own tools and management capabilities.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
