Apache Hadoop and the Hadoop ecosystem




Hadoop is a distributed-system infrastructure developed by the Apache Software Foundation.

It lets users develop distributed programs without having to understand the low-level details of distribution, and take full advantage of a cluster's power for high-speed computation and storage.


Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS).

HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which suits applications with very large data sets.

HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream (streaming access).
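To make the storage model concrete, here is a minimal Python sketch of how HDFS lays out a file: the file is cut into fixed-size blocks, and each block is copied to several DataNodes. The 128 MB block size and replication factor of 3 are Hadoop's usual defaults (configured by `dfs.blocksize` and `dfs.replication`); the round-robin placement and the DataNode names are simplifying assumptions, since real HDFS uses a rack-aware placement policy.

```python
# Simplified model of HDFS file layout: fixed-size blocks, each replicated
# on several distinct DataNodes. ASSUMPTION: round-robin placement stands in
# for HDFS's real rack-aware policy; node names are hypothetical.

BLOCK_SIZE = 128 * 1024 * 1024  # usual default block size (dfs.blocksize)
REPLICATION = 3                 # usual default replication (dfs.replication)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes of the blocks a file of file_size bytes occupies."""
    full, rest = divmod(file_size, block_size)
    # The last block only holds the remaining bytes; it is not padded.
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        placement[block_id] = [datanodes[(start + i) % len(datanodes)]
                               for i in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)   # a 300 MB file
    print([b // (1024 * 1024) for b in blocks])    # [128, 128, 44]
    for block_id, nodes in place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]).items():
        print(block_id, nodes)
```

Because every block lives on three different machines in this model, losing any single DataNode still leaves two copies of each of its blocks, which is the basic idea behind HDFS's fault tolerance on inexpensive hardware.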

At its core, the Hadoop framework consists of HDFS and MapReduce.

HDFS provides storage for massive amounts of data, and MapReduce provides computation over that data.
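The division of labour can be illustrated with the classic word-count job. The following is a single-process Python simulation of the MapReduce data flow (map, then shuffle and sort, then reduce), not Hadoop's Java API; on a real cluster the framework runs many map and reduce tasks in parallel and moves the intermediate pairs between machines.

```python
# Single-process simulation of the MapReduce model (word count).
# It mirrors only the data flow: map -> shuffle/sort -> reduce.
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))  # reducers receive keys in sorted order


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["HDFS stores data", "MapReduce processes data"]))
    # {'data': 2, 'hdfs': 1, 'mapreduce': 1, 'processes': 1, 'stores': 1}
```

The mapper and reducer only see one record or one key at a time, which is what allows the framework to spread them across the machines that already hold the data in HDFS.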

Although Hadoop is best known for MapReduce and its distributed file system, HDFS, the name is also used collectively for a family of related projects that build on this platform for distributed computing and large-scale data processing.


Hadoop Common:

A set of components and interfaces for distributed file systems and general-purpose I/O (serialization, Java RPC, and persistent data structures).

HDFS:

The Hadoop Distributed File System, which runs on large clusters of commodity machines.


MapReduce:

A distributed data processing model and execution environment that runs on large clusters of commodity machines.


HBase:

A distributed, column-oriented database. HBase uses HDFS as its underlying storage and supports both batch computation with MapReduce and point queries (random reads).
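As an illustration of those two access patterns, the sketch below models an HBase-style table in memory: rows kept sorted by row key, each holding `family:qualifier` columns. It is a toy under stated assumptions (no cell versions or timestamps, no regions, no persistence to HDFS), and `ToyTable` with its methods is a hypothetical name, not the HBase client API.

```python
# Toy in-memory model of HBase's layout: rows sorted by row key, each row
# mapping "family:qualifier" column names to values. ASSUMPTIONS: no cell
# versions, regions, or HDFS storage; ToyTable is hypothetical, not HBase's API.
import bisect


class ToyTable:
    def __init__(self) -> None:
        self._keys: list[str] = []                  # row keys, kept sorted
        self._rows: dict[str, dict[str, str]] = {}  # row key -> columns

    def put(self, row: str, column: str, value: str) -> None:
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row: str) -> dict[str, str]:
        """Point query: fetch a single row by key (a random read)."""
        return dict(self._rows.get(row, {}))

    def scan(self, start: str, stop: str) -> list[tuple[str, dict[str, str]]]:
        """Range scan over row keys in [start, stop), as a batch job would read."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, dict(self._rows[k])) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    t = ToyTable()
    t.put("user#002", "info:name", "Bob")
    t.put("user#001", "info:name", "Alice")
    t.put("user#001", "stats:logins", "42")
    print(t.get("user#001"))                                # {'info:name': 'Alice', 'stats:logins': '42'}
    print([k for k, _ in t.scan("user#000", "user#002")])  # ['user#001']
```

Keeping row keys sorted is what lets one structure serve both a cheap lookup of a single row and an efficient sequential scan over a key range.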


Hive:

A distributed data warehouse tool contributed by Facebook.

Hive manages data stored in HDFS and provides an SQL-based query language, which its runtime engine translates into MapReduce jobs, for querying that data.
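For a sense of how that translation works, consider the query `SELECT dept, COUNT(*) FROM employees GROUP BY dept`. The Python sketch below shows one plausible mapping onto the MapReduce model: the grouping column becomes the intermediate key and `COUNT(*)` becomes a per-key sum. The `employees` table and its rows are hypothetical, and Hive's actual query plans may include further optimizations, such as map-side partial aggregation.

```python
# Sketch of how   SELECT dept, COUNT(*) FROM employees GROUP BY dept;
# can be expressed as a map step and a reduce step. The table and rows
# are hypothetical; this is not Hive's generated plan.
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(row: dict) -> Iterator[tuple[str, int]]:
    # The GROUP BY column becomes the intermediate key.
    yield row["dept"], 1


def reducer(dept: str, ones: list[int]) -> tuple[str, int]:
    # COUNT(*) becomes the sum of the 1s emitted for this key.
    return dept, sum(ones)


def group_by_count(rows: Iterable[dict]) -> list[tuple[str, int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)  # the shuffle: collect values by key
    return [reducer(k, v) for k, v in sorted(groups.items())]


if __name__ == "__main__":
    employees = [
        {"name": "a", "dept": "eng"},
        {"name": "b", "dept": "sales"},
        {"name": "c", "dept": "eng"},
    ]
    print(group_by_count(employees))  # [('eng', 2), ('sales', 1)]
```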


ZooKeeper:

A distributed lock facility, originally developed at Yahoo!, that provides features similar to Google's Chubby.

A distributed, highly available coordination service. It provides basic primitives, such as distributed locks, for building distributed applications.
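One widely used way to build a lock on ZooKeeper is the sequential-znode recipe: each client creates an ephemeral, sequential child node under a lock path, the client whose node has the lowest sequence number holds the lock, and every other client watches the node just before its own. The Python sketch below simulates only that bookkeeping in a single process (no server, sessions, or watch notifications); `LockDirectory` and its methods are hypothetical names, not a ZooKeeper client API.

```python
# Single-process simulation of the ZooKeeper lock recipe's bookkeeping.
# ASSUMPTIONS: no server, sessions, or watches; LockDirectory is a
# hypothetical stand-in for the children of one lock znode.
from typing import Optional


class LockDirectory:
    def __init__(self) -> None:
        self._next_seq = 0
        self._children: dict[str, str] = {}  # znode name -> owning client

    def create_sequential(self, client: str) -> str:
        # ZooKeeper appends a zero-padded, 10-digit counter to sequential nodes.
        name = f"lock-{self._next_seq:010d}"
        self._next_seq += 1
        self._children[name] = client
        return name

    def delete(self, name: str) -> None:
        # Explicit release; with ephemeral nodes this also happens
        # automatically when the owning client's session ends.
        self._children.pop(name, None)

    def holder(self) -> Optional[str]:
        """The client whose node has the lowest sequence number owns the lock."""
        if not self._children:
            return None
        return self._children[min(self._children)]

    def predecessor(self, name: str) -> Optional[str]:
        """The node a waiting client watches instead of polling the lock."""
        smaller = [n for n in self._children if n < name]
        return max(smaller) if smaller else None


if __name__ == "__main__":
    locks = LockDirectory()
    a = locks.create_sequential("client-a")
    b = locks.create_sequential("client-b")
    print(locks.holder())        # client-a
    print(locks.predecessor(b))  # lock-0000000000
    locks.delete(a)              # client-a releases the lock
    print(locks.holder())        # client-b
```

Watching only the predecessor, rather than the current holder, avoids waking every waiting client each time the lock is released, and because the nodes are ephemeral a crashed client cannot hold the lock forever.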


Avro:

A serialization system that supports efficient, cross-language RPC and persistent data storage. This newer data serialization format and transport tool was designed to gradually replace Hadoop's original IPC mechanism.
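To show what Avro's compact binary format looks like, the sketch below encodes a record by hand following the rules in the Avro specification: `int` and `long` values are zig-zag encoded and written as variable-length integers, a `string` is its byte length (as a `long`) followed by UTF-8 bytes, and a record's fields are concatenated in schema order. The `User` schema is a hypothetical example; real applications would use an Avro library rather than this hand-written encoder.

```python
# Hand-written Avro binary encoding for a hypothetical record schema:
#   {"type": "record", "name": "User",
#    "fields": [{"name": "name", "type": "string"},
#               {"name": "age",  "type": "int"}]}


def zigzag(n: int) -> int:
    """Map signed 64-bit values to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def write_long(n: int) -> bytes:
    """Zig-zag encode, then emit 7 bits per byte, low-order group first.
    Avro's int and long share this encoding."""
    z = zigzag(n)
    out = bytearray()
    while z & ~0x7F:                   # more than 7 bits remain
        out.append((z & 0x7F) | 0x80)  # set the continuation bit
        z >>= 7
    out.append(z)
    return bytes(out)


def write_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return write_long(len(data)) + data


def encode_user(name: str, age: int) -> bytes:
    # Fields in schema order; no field names, tags, or separators are written.
    return write_string(name) + write_long(age)


if __name__ == "__main__":
    print(encode_user("Ann", 30).hex())  # 06416e6e3c
```

Because the encoded bytes carry no field names or type tags, a reader needs the writer's schema to decode them; this is why Avro data files store the schema alongside the data.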


Pig:

A big data analytics platform that offers users a variety of interfaces.

A data flow language and execution environment for exploring very large data sets. Pig runs on MapReduce and HDFS clusters.
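A Pig Latin program is a sequence of steps, each producing a new relation from the previous one. The Python sketch below mirrors that style for a maximum-per-year query; the comment above each step shows the Pig Latin statement it corresponds to (`LOAD`, `FILTER`, `GROUP`, and `FOREACH ... GENERATE` are Pig Latin operators), while the input path, the records, and the 9999 missing-value marker are hypothetical. Pig itself would compile such a script into MapReduce jobs rather than run it in one process.

```python
# Pig-style data flow mimicked step by step; each variable is one "relation".
# The input records and the 9999 missing-value marker are hypothetical.
from collections import defaultdict

# records = LOAD 'input' AS (year:chararray, temperature:int);
records = [("1949", 111), ("1949", 78), ("1950", 0), ("1950", 22), ("1950", 9999)]

# filtered = FILTER records BY temperature != 9999;
filtered = [(year, temp) for year, temp in records if temp != 9999]

# grouped = GROUP filtered BY year;
grouped = defaultdict(list)
for year, temp in filtered:
    grouped[year].append(temp)

# max_temp = FOREACH grouped GENERATE group, MAX(filtered.temperature);
max_temp = sorted((year, max(temps)) for year, temps in grouped.items())

print(max_temp)  # [('1949', 111), ('1950', 22)]
```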


Ambari:

A Hadoop management tool for quickly deploying, monitoring, and managing clusters.


Sqoop:

A tool for efficiently transferring bulk data between relational databases and HDFS.
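Sqoop speeds up imports by running several map tasks in parallel, each reading a different slice of the source table. The sketch below shows the general idea behind its `--split-by` and `--num-mappers` options (four mappers by default): take the minimum and maximum of a numeric split column and divide that range into contiguous intervals, one query per mapper. The `orders` table, the `id` column, and the exact boundary arithmetic are assumptions for illustration, not Sqoop's own code.

```python
# Dividing a parallel import among mappers by ranges of a numeric column.
# ASSUMPTIONS: table/column names and boundary arithmetic are illustrative.


def split_ranges(lo: int, hi: int, mappers: int) -> list[tuple[int, int]]:
    """Cut [lo, hi] into up to `mappers` contiguous, non-overlapping ranges."""
    if hi < lo or mappers < 1:
        return []
    total = hi - lo + 1
    mappers = min(mappers, total)          # never more ranges than values
    size, extra = divmod(total, mappers)
    ranges, start = [], lo
    for i in range(mappers):
        end = start + size + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_queries(table: str, column: str, lo: int, hi: int,
                  mappers: int = 4) -> list[str]:
    """One bounded SELECT per mapper (identifiers are trusted constants here)."""
    return [f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} <= {b}"
            for a, b in split_ranges(lo, hi, mappers)]


if __name__ == "__main__":
    # Suppose SELECT MIN(id), MAX(id) FROM orders returned (1, 1000).
    for query in split_queries("orders", "id", 1, 1000):
        print(query)
```

Ranges like these only balance the work when the split column's values are spread fairly evenly; a heavily skewed column leaves some mappers with far more rows than others.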

