The role of each member of the Hadoop family

Source: Internet
Author: User

This article describes what Hadoop and its surrounding projects are for; it does not cover how they work internally.

The word Hadoop has been in vogue for years, and big data immediately brings Hadoop to mind. So what role does Hadoop actually play?

Official definition: Hadoop is a software platform for developing and running large-scale data processing. The key word is platform. That is, we have a lot of data and several computers, and we know that the work of processing the data should be split across those computers, but we do not know how to assign the tasks or how to collect the results. Roughly speaking, Hadoop does this for us.

1, HDFS

The first thing to consider is how to store massive amounts of data and how to manage it. For this there is a distributed file system, HDFS.
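As a rough illustration of what "distributed" means for a file system, the sketch below cuts a file into fixed-size blocks and places copies of each block on several machines. The block size, replica count, node names, and round-robin placement are all assumptions made for illustration; real HDFS uses much larger blocks and its own placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replicas: int) -> dict[int, list[str]]:
    """Assign each block to `replicas` distinct nodes (hypothetical round-robin policy)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        # Start at a different node for each block so the load is spread out.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replicas)]
    return placement


# Toy input: a 10-byte "file", 4-byte blocks, three made-up node names.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_blocks(len(blocks), ["node1", "node2", "node3"], replicas=2)
```

Because every block lives on more than one node, losing a single machine in this toy model does not lose any part of the file.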

2, Map-reduce

Once the data is stored, how do we process it? And what if the processing I need is complex, not just a simple operation such as sorting or searching? There needs to be a place to write code: we write the operation ourselves, and the framework takes care of decomposing the job, distributing it, and collecting the results.
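The division of labor can be sketched with the classic word count, run here in a single process. This is only a model of the idea, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are made up for this example, and in a real cluster the framework would run the map and reduce steps on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """User-written map step: emit a (word, 1) pair for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Framework step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """User-written reduce step: combine all values for one key."""
    return key, sum(values)


def run_job(lines):
    """Framework step: run map on every line, shuffle, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run_job(["big data big cluster", "data everywhere"])
```

Only `map_phase` and `reduce_phase` contain logic specific to the problem; everything else is the part a framework would provide.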

3, Hive

Being able to write code is good, but writing it is tedious, and database people are already familiar with SQL. If the processing can be expressed in SQL, there is no need to write map-reduce by hand, and so Hive appeared. Big data is also inseparable from databases and tables: Hive can map data onto tables, which makes it convenient to work with. Its disadvantage is that it is slow.
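To show why SQL is attractive here, the sketch below expresses a per-user count as a single query instead of hand-written map and reduce functions. Python's built-in `sqlite3` module stands in for Hive purely so the example can run anywhere; the table `page_views` and its columns are invented for illustration, and this is not Hive's own interface.

```python
import sqlite3

# Made-up sample records: (user_name, url).
rows = [("alice", "/home"), ("bob", "/home"), ("alice", "/cart")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_name TEXT, url TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)

# One declarative statement replaces the map, shuffle, and reduce steps.
views_per_user = conn.execute(
    "SELECT user_name, COUNT(*) FROM page_views "
    "GROUP BY user_name ORDER BY user_name"
).fetchall()
conn.close()
```

The query says what result is wanted and leaves the how to the engine, which is the convenience the paragraph above describes.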

4, HBase

Since Hive is slow, is there a faster database? There is: HBase. It was built for queries, and its queries are fast.
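One reason lookups can be fast is that HBase keeps rows sorted by row key, so a single row or a short key range can be found without reading the whole table. The sketch below models only that idea in memory; the class `SortedRowStore`, its methods, and the `user#...` keys are hypothetical and are not HBase's client API.

```python
import bisect


class SortedRowStore:
    """Toy key-value table that keeps its row keys in sorted order."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> row data

    def put(self, row_key, row):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = row

    def get(self, row_key):
        """Point lookup by row key; returns None if the key is absent."""
        return self._rows.get(row_key)

    def scan(self, start, stop):
        """Return rows with start <= key < stop, located by binary search."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


store = SortedRowStore()
store.put("user#003", {"name": "Cy"})
store.put("user#001", {"name": "Ann"})
store.put("user#002", {"name": "Bo"})
first_two = store.scan("user#001", "user#003")
```

The trade-off is that queries are shaped around the row key, rather than around arbitrary SQL as with Hive.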

5, Sqoop

Before Hadoop there were already many well-known databases such as MySQL and Oracle, and my data is all in them. How do I import it into HDFS? Sqoop transfers data in both directions between relational databases and HDFS.
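The import direction amounts to reading rows from a relational table and writing them out as delimited text that can then be stored as a file. The sketch below shows that shape only: `sqlite3` stands in for MySQL or Oracle, a string stands in for a file on HDFS, and the helper `export_query_to_csv` and the `customers` table are invented for this example. It does not use Sqoop itself.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query):
    """Run a query and return its rows as comma-separated text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(conn.execute(query))
    return buffer.getvalue()


# Stand-in relational database with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bo")])

csv_text = export_query_to_csv(conn, "SELECT id, name FROM customers ORDER BY id")
conn.close()
```

Going the other way, from files back into a table, is the same idea in reverse: parse each line and insert it as a row.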

6, Flume

With work spread across so many computers, if one machine or one service has a problem, how do you know which one is at fault? Flume provides a highly reliable log collection system.

7, Mahout

Processing big data often means data mining, which relies on a handful of common machine learning algorithms. Since the algorithms are fixed and few, a project called Mahout was developed to implement them so that developers can use them more efficiently.

8, ZooKeeper

ZooKeeper's goal is to encapsulate complex, error-prone key services and to give users easy-to-use interfaces and a high-performance, functionally stable system. In a sense it really is the zookeeper: it is used to manage the elephant (Hadoop) and the bee (Hive).

These are the main members of the Hadoop family; a few less commonly used ones are not introduced here. Knowing the role of each member gives a preliminary understanding of Hadoop as a whole. What remains is to gradually learn the principles and usage of each part.
