Hadoop-related components and their relationships

Source: Internet
Author: User
Tags: ZooKeeper, Hadoop, MapReduce, Hadoop ecosystem

Apache Hadoop has become a driving force behind the development of the big data industry. Technologies such as Hive and Pig are often mentioned alongside it, but what do they actually do, and why do they have such strange names (Oozie, ZooKeeper, Flume)?

Hadoop makes it possible to process big data cheaply. Here, big data usually means 10-100 GB or more, in a variety of data types, both structured and unstructured. But how is this different from what came before?

Today's enterprise data warehouses and relational databases are good at processing structured data and can store large amounts of it, but at a high cost. Their requirement that data be structured limits the kinds of data they can process, and this rigidity makes it hard for a data warehouse to explore massive amounts of heterogeneous data with any agility. In practice, this often means that valuable data sources within an organization are never mined. This is the biggest difference between Hadoop and traditional data processing methods.

This article describes the components of the Hadoop system and explains the capabilities of each component.

The Hadoop ecosystem contains more than 10 components or sub-projects, and it poses challenges in installation, configuration, cluster-scale deployment, and management.

The main Hadoop components include:

Hadoop: A software framework written in Java that supports data-intensive distributed applications

ZooKeeper: A highly reliable distributed coordination system

MapReduce: A flexible parallel data processing framework for big data

HDFS: The Hadoop Distributed File System

Oozie: A workflow scheduler responsible for scheduling MapReduce jobs

HBase: A distributed key-value database

Hive: A data warehouse package built on top of MapReduce

Pig: An advanced data processing layer built on top of Hadoop. Its Pig Latin language gives programmers a more intuitive way to define custom data flows.

Typical characteristics of workloads suited to the Hadoop MapReduce method
    • Huge amounts of data
    • Little or no dependency between data records
    • Both structured and unstructured data
    • Suitable for large-scale parallel processing
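
The characteristics above are what make the map, shuffle, and reduce steps work: because records do not depend on one another, each one can be mapped independently. The following is a minimal single-process sketch of that model in Python, using word counting as the example. It does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in a single input record.

    Each record is handled on its own, so in a real cluster many
    mappers could run this step in parallel on different data blocks.
    """
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["hadoop stores data", "hadoop processes data"]
    print(run_job(lines))
```

In an actual Hadoop job the framework distributes the map and reduce steps across the cluster and handles the shuffle itself; this sketch only shows the shape of the computation.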
Application use cases
    • Batch analysis fast enough to meet business and reporting needs, such as website traffic analysis and product recommendation analysis.
    • Iterative analysis using data mining and machine learning algorithms, such as association rule analysis, K-means clustering, link analysis, classification, and naive Bayes analysis.
    • Statistical analysis and refinement, such as web log analysis.
    • Behavioral analysis, such as clickstream analysis and analysis of users' video-viewing behavior.
    • Transformation and enrichment, such as social media data processing, ETL processing, and data normalization.
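
As an illustration of the web log analysis and ETL use cases listed above, here is a small Python sketch that cleans semi-structured log lines and counts successful hits per path. The log layout, the `LOG_PATTERN` expression, and the function names are assumptions made for this example, not a format defined by Hadoop or by any particular web server.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# Hypothetical, simplified access-log layout assumed for this sketch:
#   <ip> [<timestamp>] "<method> <path> <protocol>" <status>
LOG_PATTERN = re.compile(r'^(\S+) \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})$')


def parse_line(line: str) -> dict | None:
    """Turn one raw log line into a structured record, or None if malformed."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    ip, timestamp, method, path, status = match.groups()
    return {
        "ip": ip,
        "timestamp": timestamp,
        "method": method,
        "path": path,
        "status": int(status),
    }


def hits_per_path(lines: Iterable[str]) -> Counter:
    """Count successful (status 200) requests per path, skipping bad lines."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record["status"] == 200:
            counts[record["path"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '10.0.0.1 [01/Jan/2024:10:00:00] "GET /index.html HTTP/1.1" 200',
        '10.0.0.2 [01/Jan/2024:10:00:05] "GET /cart HTTP/1.1" 200',
        '10.0.0.1 [01/Jan/2024:10:00:09] "GET /missing HTTP/1.1" 404',
        "this line is not a valid log entry",
        '10.0.0.3 [01/Jan/2024:10:00:12] "GET /index.html HTTP/1.1" 200',
    ]
    print(hits_per_path(sample))
```

Parsing each line is independent of every other line, so the parsing step corresponds to a map phase and the per-path counting to a reduce phase when the same analysis is run on a cluster.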

Hadoop is typically deployed in distributed environments. As happened earlier with Linux, vendors integrate and test the components of the Apache Hadoop ecosystem and add their own tools and management capabilities.
