Members of the Hadoop family

Source: Internet
Author: User

This article does not go into principles; it only describes what Hadoop and its surrounding projects are used for.

The word Hadoop has been popular for many years. Mention big data and people think of Hadoop. So what does Hadoop actually do?

Official definition: Hadoop is a software platform for developing and running large-scale data processing. The key word is platform. That is to say, we have a large amount of data and several computers. We know that the processing work should be divided among those computers, but we do not know how to assign the tasks or how to collect the results. That is roughly what Hadoop does for us.
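To make "assign the tasks and collect the results" concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code: threads stand in for the separate computers, and the names `split_into_chunks`, `process_chunk`, and `run_job` are made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_workers):
    """Divide data into at most n_workers contiguous chunks of similar size."""
    size = max(1, -(-len(data) // n_workers))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk):
    """The work one 'computer' does on its share of the data (here: a sum)."""
    return sum(chunk)


def run_job(data, n_workers=4):
    """Assign one chunk to each worker, then combine the partial results."""
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    print(run_job(list(range(1, 101))))  # prints 5050
```

A real platform also has to cope with machines failing and with data too large for one disk, which this sketch ignores.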

1. HDFS: First we have to consider how to store and manage massive amounts of data. That is the job of the distributed file system, HDFS.

2. Map-Reduce: Once the data is saved, how do we process it? What if the processing is more complicated than just sorting? There has to be a place where we can write the operation ourselves as code, while the framework internally decomposes the work, allocates it, and collects the results.

3. Hive: Writing code works, but it is troublesome. Database staff are familiar with SQL. If they can process the data with SQL statements, they do not need to write Map-Reduce code, and so Hive emerged. In addition, big data is inseparable from databases and tables, and Hive can map data into tables and then operate on them conveniently. Its disadvantage is that it is slow.

4. HBase: Since Hive is slow, is there a fast database? HBase was built for queries, and its query speed is very fast.

5. Sqoop: In the past, a lot of data was stored in well-known databases such as MySQL and Oracle. How can it be imported into HDFS? Sqoop transfers data in both directions between relational databases and HDFS.

6. Flume: With work spread across so many computers, how do we find out what went wrong when one machine, or one of the services above, has a problem? Flume provides a highly reliable log collection system.

7. Mahout: Processing big data often means data mining, which relies on a handful of common machine learning algorithms. Since these algorithms are fixed, developers built Mahout to implement them so that other developers can use them more quickly.

8. ZooKeeper: The goal of ZooKeeper is to encapsulate key services that are complex and error-prone, and to provide users with easy-to-use interfaces and a system with efficient performance and stable functions. To put it bluntly, it is the zookeeper who manages the elephant (Hadoop) and the bee (Hive).
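The Map-Reduce idea of "write the operation yourself and let the framework decompose, allocate, and collect" can be sketched with the classic word count. This is a local imitation in plain Python, not the Hadoop API: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are assumptions made for this example, and everything runs in one process.

```python
from collections import defaultdict


def map_phase(line):
    """User-written map step: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Framework step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """User-written reduce step: combine all values that share a key."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hive is slow", "HBase is fast"]))
```

On a real cluster the map and reduce steps would run on many machines in parallel and the shuffle would move data across the network. Only the two user-written functions correspond to what a developer supplies. In Hive, the same job would be expressed as a single SQL query with a GROUP BY instead of hand-written code.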

The above are the main members of the Hadoop family. There are a few others that are used less often and are not introduced here. Once you know the role of each of these members, you have a preliminary picture of what Hadoop can do as a whole. The rest is to slowly learn the principles and usage of each part.
