Members of the Hadoop family

1. Overview

Hadoop has been around for some time now. Like many users, I went from knowing nothing about it, to confusion, to basic reads and writes, and finally to combining its components in real applications; it has gradually become indispensable. Hadoop's success in the big data industry has in turn accelerated its own development, and its presence can be seen across all the major open-source communities. The Hadoop family now has more than 20 members.

It is therefore worth regularly organizing the knowledge I have accumulated. Reviewing all of these tools and technologies together not only deepens my impression of them, but is also helpful for future development work.

2. Introduction of Members

Here is a look at the various members of the Hadoop family and their respective responsibilities.

Apache Hadoop: An open-source distributed computing framework from the Apache Software Foundation that provides a distributed file system subproject (HDFS) and a software framework supporting MapReduce distributed computation.
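The MapReduce programming model that Hadoop provides can be sketched in plain Python. This is an illustration of the map/shuffle/reduce idea only, not Hadoop's actual Java API; the function names are my own:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) for every word, like a Hadoop Mapper would.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word, like a Hadoop Reducer would.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big cluster", "big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 3, 'data': 2, 'cluster': 1}
```

In real Hadoop, the map and reduce functions run in parallel across the cluster and the shuffle moves data between machines; the structure of the computation, however, is exactly this.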

Apache Hive: A Hadoop-based data warehousing tool that maps structured data files to database tables and lets you run simple MapReduce statistics with SQL-like statements, eliminating the need to develop dedicated MapReduce applications. It is well suited to statistical analysis of a data warehouse.

Apache Pig: A large-scale data analysis tool built on HDFS that provides a SQL-like language called Pig Latin, whose compiler translates data analysis requests into a series of optimized MapReduce operations.

Apache HBase: A highly reliable, high-performance, column-oriented, scalable distributed storage system. HBase makes it possible to build large, structured storage clusters on inexpensive commodity servers.
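To make "column-oriented" concrete, here is a toy model in plain Python, not the HBase API; the table, family, and qualifier names are hypothetical. Cells live under a row key and a "family:qualifier" coordinate, and absent columns simply do not exist:

```python
# Toy model of an HBase-style table: row key -> "family:qualifier" -> value.
# Real HBase additionally versions every cell by timestamp.
table = {}

def put(row, family, qualifier, value):
    table.setdefault(row, {})[f"{family}:{qualifier}"] = value

def get(row, family, qualifier):
    return table.get(row, {}).get(f"{family}:{qualifier}")

put("user1", "info", "name", "alice")
put("user1", "info", "city", "hangzhou")
put("user2", "info", "name", "bob")

print(get("user1", "info", "city"))  # hangzhou
print(get("user2", "info", "city"))  # None - sparse rows cost nothing
```

This sparsity is why column-oriented stores handle tables with millions of rows and many rarely-used columns so well.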

Apache Sqoop: A tool for transferring data between Hadoop's HDFS and relational databases (MySQL, Oracle, PostgreSQL, etc.). Data can be imported from a relational database into HDFS, and HDFS data can also be exported back into a relational database.

Apache ZooKeeper: A distributed, open-source coordination service designed for distributed applications. It mainly solves data management problems frequently encountered in distributed applications, simplifying the coordination and management of distributed applications while providing high-performance distributed services.

Apache Mahout: A distributed machine learning and data mining framework based on Hadoop. Mahout implements a number of data mining algorithms on MapReduce, addressing the problem of parallel mining.

Apache Cassandra: An open-source distributed NoSQL database system. It was originally developed by Facebook to store simple-format data, combining the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo.

Apache Avro: A data serialization system designed to support data-intensive, high-volume data exchange applications. Avro is a new data serialization format and transfer tool that is progressively replacing Hadoop's original IPC mechanism.
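The idea of schema-driven serialization can be sketched with the standard library. This mimics the concept only: real Avro uses its own schema language and a compact binary encoding, not JSON, and the schema and field names below are made up for illustration:

```python
import json

# A hypothetical record schema, in the spirit of an Avro schema definition.
schema = {"name": "str", "followers": "int"}

def serialize(record, schema):
    # Validate the record against the schema before writing it out,
    # as a schema-driven serializer would.
    for field, type_name in schema.items():
        if type(record[field]).__name__ != type_name:
            raise TypeError(f"field {field!r} must be {type_name}")
    return json.dumps(record).encode("utf-8")

def deserialize(payload):
    return json.loads(payload.decode("utf-8"))

data = serialize({"name": "hadoop", "followers": 42}, schema)
print(deserialize(data))  # {'name': 'hadoop', 'followers': 42}
```

The point of the schema is that both writer and reader agree on structure up front, which is what lets Avro keep the wire format compact.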

Apache Ambari: A web-based tool for provisioning, managing, and monitoring Hadoop clusters.

Apache Chukwa: An open-source data collection system for monitoring large distributed systems. It collects all kinds of data into Hadoop-ready files stored in HDFS, so that Hadoop can run various MapReduce operations on them.

Apache Hama: An HDFS-based BSP (Bulk Synchronous Parallel) computing framework. Hama can be used for large-scale, big-data computations, including graph, matrix, and network algorithms.
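The BSP model behind Hama (and, below, Giraph) can be sketched in a few lines of Python. This is a single-process illustration, not Hama's API: vertices exchange messages in synchronized supersteps until nothing changes, here propagating the maximum vertex value through a small graph:

```python
# Toy BSP sketch: vertices exchange messages in synchronized supersteps
# to propagate the maximum vertex value through an undirected graph.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
values = {"a": 3, "b": 6, "c": 2}

def superstep(values, inbox):
    outbox = {v: [] for v in graph}
    changed = False
    for vertex, messages in inbox.items():
        new_value = max([values[vertex]] + messages)
        if new_value != values[vertex]:
            values[vertex] = new_value
            changed = True
            for neighbor in graph[vertex]:
                outbox[neighbor].append(new_value)
    return outbox, changed

# Superstep 0: every vertex announces its value to its neighbors.
# The barrier between supersteps is implicit: each loop iteration
# processes one complete inbox before the next round begins.
inbox = {v: [values[n] for n in graph[v]] for v in graph}
while True:
    inbox, changed = superstep(values, inbox)
    if not changed:
        break
print(values)  # {'a': 6, 'b': 6, 'c': 6}
```

In a real BSP framework, each superstep runs in parallel across machines and the barrier is a cluster-wide synchronization point.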

Apache Flume: A distributed, reliable, highly available system for aggregating large volumes of log data; it supports log data collection, processing, and transfer.

Apache Giraph: A scalable, distributed iterative graph processing system built on the Hadoop platform, inspired by BSP (Bulk Synchronous Parallel) and Google's Pregel.

Apache Oozie: A workflow engine server that manages and coordinates tasks running on the Hadoop platform (HDFS, Pig, and MapReduce).
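At its core, such a workflow engine executes a DAG of actions in dependency order. A minimal sketch with the standard library (the action names below are hypothetical; real Oozie workflows are defined in XML and submit actual Hadoop jobs):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical Oozie-style workflow: each action maps to the set of
# actions it depends on.
workflow = {
    "import": set(),        # e.g. a Sqoop import
    "clean":  {"import"},   # e.g. a Pig script
    "stats":  {"clean"},    # e.g. a Hive query
}

executed = []
for action in TopologicalSorter(workflow).static_order():
    executed.append(action)  # a real engine would submit the job here

print(executed)  # ['import', 'clean', 'stats']
```

The topological sort guarantees every action runs only after everything it depends on has completed, which is exactly the coordination problem Oozie solves across Hadoop jobs.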

Apache Crunch: A Java library, based on Google's FlumeJava library, for creating MapReduce pipelines. Like Hive and Pig, Crunch provides a library of patterns for common tasks such as joining data, performing aggregations, and sorting records.

Apache Whirr: A set of class libraries for running cloud services, including Hadoop, in a provider-neutral way. Whirr supports services on Amazon EC2 and Rackspace.

Apache Bigtop: A tool for packaging, distributing, and testing Hadoop and its surrounding ecosystem.

Apache HCatalog: A Hadoop-based table and storage management layer that provides central metadata and schema management, spans Hadoop and RDBMSs, and offers relational views through Pig and Hive.

Cloudera Hue: A web-based monitoring and management console that enables web-based operation and administration of HDFS, MapReduce/YARN, HBase, Hive, and Pig.
