Apache Hadoop and the Hadoop ecosystem: distributed computing

Source: Internet
Author: User
Tags: data structures



Hadoop is a distributed systems infrastructure developed by the Apache Software Foundation.

Users can develop distributed programs without knowing the low-level details of distribution, and can make full use of a cluster's capacity for high-speed computation and storage.


Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on inexpensive hardware. It provides high-throughput access to application data, which suits applications with large data sets. HDFS relaxes some POSIX requirements so that file system data can be accessed as a stream.
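The fault tolerance described above comes from splitting each file into blocks and keeping several copies of every block on different machines. The following is a minimal sketch of that idea, not HDFS code. The 128 MiB block size and the replication factor of 3 match the defaults in recent Hadoop releases, but the round-robin placement, the function names, and the data node names are assumptions made for illustration; real HDFS uses a rack-aware placement policy.

```python
from dataclasses import dataclass

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MiB, the default block size in Hadoop 2.x and later
REPLICATION = 3                  # the default replication factor


@dataclass
class Block:
    index: int       # position of the block within the file
    length: int      # number of bytes stored in this block
    replicas: list   # names of the data nodes that hold a copy


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Split a file of file_size bytes into blocks and assign replicas.

    Placement is simple round-robin (an assumption for this sketch).
    """
    if replication > len(datanodes):
        raise ValueError("not enough data nodes for the requested replication")
    blocks = []
    offset = 0
    index = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        replicas = [datanodes[(index + r) % len(datanodes)] for r in range(replication)]
        blocks.append(Block(index, length, replicas))
        offset += length
        index += 1
    return blocks


def survives_failure(blocks, failed_node):
    """Return True if every block keeps at least one replica after one node fails."""
    return all(any(node != failed_node for node in b.replicas) for b in blocks)


nodes = ["dn1", "dn2", "dn3", "dn4"]              # hypothetical data node names
layout = plan_blocks(300 * 1024 * 1024, nodes)    # a 300 MiB file
```

With three replicas per block, losing any single node in this layout still leaves every block readable, which is why inexpensive hardware is acceptable.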

The core of Hadoop's design is HDFS and MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides computation over that data.

Although Hadoop is best known for MapReduce and its distributed file system, HDFS, the name Hadoop is also used for a family of related projects that build on this platform for distributed computing and large-scale data processing.


Hadoop Common:

A set of components and interfaces for distributed file systems and general I/O (serialization, Java RPC, and persistent data structures).

HDFS:

The Hadoop Distributed File System, a distributed file system that runs on large clusters of commodity machines.


MapReduce:

A distributed data processing model and execution environment that runs on large clusters of commodity machines.
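The processing model can be illustrated with the classic word count. The sketch below runs in a single Python process and only imitates the three stages (map, shuffle, reduce); it does not use the Hadoop API, and the function names are chosen for this example. On a real cluster the framework runs many map and reduce tasks in parallel and performs the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
```

Because each map call sees only one line and each reduce call sees only one key, the work can be spread across machines without the tasks needing to talk to each other.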


HBase:

A distributed, column-oriented database. HBase uses HDFS as its underlying storage and supports both batch computation with MapReduce and point queries (random reads).
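To make the difference between a point query and a scan concrete, here is a small in-memory model of a table whose cells are addressed by row key and a "family:qualifier" column name. It is a toy stand-in, not the HBase client API; the class and method names are assumptions for this sketch, and cell versions (timestamps) are left out.

```python
class ToyTable:
    """In-memory stand-in for a column-oriented table; not the HBase client API."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        """Store one cell; the column family must have been declared up front."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column=None):
        """Point query: fetch one row, or one cell of it, by row key."""
        row = self.rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)

    def scan(self, start, stop):
        """Range scan in sorted row-key order; start inclusive, stop exclusive."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, dict(self.rows[key])


table = ToyTable(families=["info"])            # hypothetical table with one family
table.put("user#002", "info:name", "Bob")
table.put("user#001", "info:name", "Alice")
table.put("user#001", "info:city", "Oslo")
```

Rows can carry different columns, and a scan returns them in row-key order, which is why row key design matters so much in this kind of store.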


Hive: A data warehousing tool originally developed at Facebook. A distributed data warehouse. Hive manages data stored in HDFS and provides a SQL-based query language for querying that data; its runtime engine translates queries into MapReduce jobs.
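The idea of turning a SQL query into a MapReduce job can be sketched with a single GROUP BY. In the example below, Python's built-in sqlite3 module only stands in for a SQL engine so that the query can run locally; it is not Hive, and the table, column names, and sample rows are made up. The second function computes the same result as explicit map, shuffle, and reduce steps, which is roughly the shape of job such a query becomes.

```python
import sqlite3
from collections import defaultdict

ROWS = [("alice", "search"), ("bob", "ads"), ("carol", "search")]   # made-up sample data

QUERY = "SELECT team, COUNT(*) FROM staff GROUP BY team"


def run_sql(rows):
    """Run the query locally; sqlite3 is only a stand-in for a SQL engine."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, team TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


def run_as_map_reduce(rows):
    """The same aggregation written as map, shuffle, and reduce steps."""
    mapped = [(team, 1) for _name, team in rows]    # map: key each row by the GROUP BY column
    grouped = defaultdict(list)
    for key, value in mapped:                       # shuffle: collect values per key
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}   # reduce: COUNT(*)


sql_result = run_sql(ROWS)
mr_result = run_as_map_reduce(ROWS)
```

Both functions return the same per-team counts; the query is simply a more compact way to describe the job.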


ZooKeeper: A distributed, highly available coordination service, originally developed at Yahoo, that provides functionality similar to Google's Chubby. It offers basic services such as distributed locks for building distributed applications.
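A common way to build a distributed lock on a coordination service is to have each client register a numbered entry and let the lowest number hold the lock. The sketch below imitates that recipe in memory. It is not the ZooKeeper API: the class and method names are assumptions, there is no networking, and session handling and watches are reduced to plain method calls.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service; not the ZooKeeper API."""

    def __init__(self):
        self.next_seq = 0
        self.waiting = {}   # sequence number -> client name

    def enqueue(self, client):
        """Register a client; comparable to creating an ephemeral sequential node."""
        seq = self.next_seq
        self.next_seq += 1
        self.waiting[seq] = client
        return seq

    def holder(self):
        """The client with the lowest sequence number holds the lock."""
        if not self.waiting:
            return None
        return self.waiting[min(self.waiting)]

    def release(self, seq):
        """Give up the lock or leave the queue; also models a client session ending."""
        self.waiting.pop(seq, None)


coordinator = ToyCoordinator()
seq_a = coordinator.enqueue("worker-a")   # hypothetical client names
seq_b = coordinator.enqueue("worker-b")
first_holder = coordinator.holder()
coordinator.release(seq_a)
second_holder = coordinator.holder()
```

Because entries disappear when their owner goes away, a crashed lock holder does not block the others forever, which is the property that makes this pattern useful.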


Avro: A serialization system for efficient, cross-language RPC and persistent data storage. It was introduced as a new data serialization format and transport tool, with the aim of gradually replacing Hadoop's existing IPC mechanism.
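One reason a binary serialization format can be compact is variable-length integer encoding. Avro's binary encoding writes int and long values with zig-zag variable-length coding, and the sketch below shows that technique on its own. It is an illustration only, not the Avro library; the function names are chosen for this example and the code assumes values that fit in a signed 64-bit integer.

```python
def zigzag_encode(n):
    """Encode a signed 64-bit integer as zig-zag variable-length bytes."""
    z = (n << 1) ^ (n >> 63)            # interleave signs: 0, -1, 1, -2 become 0, 1, 2, 3
    out = bytearray()
    while z & ~0x7F:                    # more than 7 bits still to write
        out.append((z & 0x7F) | 0x80)   # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)                       # final byte, continuation bit clear
    return bytes(out)


def zigzag_decode(data):
    """Decode bytes produced by zigzag_encode back into a signed integer."""
    z = 0
    shift = 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:             # continuation bit clear: last byte
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)          # undo the sign interleaving


encoded = zigzag_encode(64)
decoded = zigzag_decode(encoded)
```

Small values of either sign take a single byte, so records full of small counters or identifiers stay short on disk and on the wire.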


Pig:

A platform for analyzing large data sets that offers users several interfaces. It consists of a data flow language and an execution environment for exploring very large data sets. Pig runs on MapReduce and HDFS clusters.


Ambari:

A Hadoop management tool that enables fast monitoring, deployment, and management of clusters.


Sqoop:

A tool for efficiently transferring data between structured data stores (such as relational databases) and HDFS.
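Bulk transfers of this kind are usually made efficient by running several parallel tasks, each of which reads a separate slice of the table. A minimal sketch of how a numeric key range might be divided among tasks follows. The function name and the exact boundary arithmetic are assumptions for illustration; Sqoop's own splitting logic may choose boundaries differently.

```python
def split_ranges(low, high, num_splits):
    """Divide the inclusive key range [low, high] into contiguous chunks.

    Each chunk could be read by one parallel task with a query such as
    "WHERE id BETWEEN start AND end" (hypothetical column name).
    """
    if num_splits < 1 or high < low:
        raise ValueError("need num_splits >= 1 and low <= high")
    total = high - low + 1
    num_splits = min(num_splits, total)      # never create empty chunks
    base, extra = divmod(total, num_splits)
    ranges = []
    start = low
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)   # spread the remainder over the first chunks
        ranges.append((start, start + size - 1))
        start += size
    return ranges


chunks = split_ranges(1, 10, 4)
```

An even split works well when key values are spread uniformly; a skewed key column would leave some tasks with far more rows than others.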

