Microsoft Big Data Solution: Hadoop Introduction

Keywords: big data, Microsoft solutions, Hadoop

Hadoop was developed by the Apache Foundation, an open source organization. The idea behind Hadoop is straightforward: write distributed programs that leverage the combined compute and storage capacity of a cluster of machines. Hadoop can sound abstract, and the unusual project names listed in some documents make it seem complicated. In fact, Hadoop is simple and straightforward. Its core consists of three components, or arguably just two. One is the distributed file system (Hadoop Distributed File System), referred to as HDFS. Another is the MapReduce engine (and the third is HBase, a distributed NoSQL column-oriented database).

HDFS is characterized by high fault tolerance and is designed to be deployed on low-cost hardware. This matters: because the hardware that holds the data is cheap, users can store large amounts of data without having to discard or set aside old data for cost reasons. The way HDFS achieves high fault tolerance is also very simple: it maintains multiple copies of the data, so that when a node fails, the same data is still available on other nodes. In addition, HDFS is optimized for very large files and for streaming access to data in the file system, which improves the performance of reading large files.
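To make the storage side a bit more concrete, the following sketch uses Hadoop's FileSystem Java API to write a small file into HDFS and stream it back. The replication factor and the /demo path are made-up values for illustration, not anything prescribed by HDFS itself.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Keep three copies of each block so the data survives node failures
            // (3 is the usual default; set explicitly here only to make the point).
            conf.set("dfs.replication", "3");

            FileSystem fs = FileSystem.get(conf);

            // Write a small file; HDFS splits larger files into blocks and
            // spreads the replicas across the cluster automatically.
            Path file = new Path("/demo/sentences.txt");   // hypothetical path
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("I want to buy a cake\nyou want to buy a bike\n"
                        .getBytes(StandardCharsets.UTF_8));
            }

            // Read it back as a stream, the access pattern HDFS is optimized for.
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }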

The other core component is the MapReduce engine. A MapReduce application mainly consists of two functions: a map function and a reduce function. The map function accepts a set of data and converts it into a list of key/value pairs; the reduce function accepts the list generated by the map function and then shrinks it based on the keys, producing one key/value pair for each distinct key. Let me give you a simple example.

Assuming the input is "I want to buy a cake" and "you want to buy a bike", running the map function produces the following list of key/value pairs:

(I, 1) (want, 1) (to, 1) (buy, 1) (a, 1) (cake, 1)

(you, 1) (want, 1) (to, 1) (buy, 1) (a, 1) (bike, 1)

If you apply the reduce function to this list of key/value pairs, you get the following set of key/value pairs, sorted by key:

(a, 2) (bike, 1) (buy, 2) (cake, 1) (I, 1) (to, 2) (want, 2) (you, 1)

In Hadoop, MapReduce work can be handled in parallel by multiple nodes. For example, "I want to buy a cake" can be processed by one node and "you want to buy a bike" by another; the resulting lists of key/value pairs are then combined by the reduce function. This is where the power of Hadoop's parallel processing comes from.
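Putting the two functions together, the sketch below is the standard word-count pattern written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). It assumes the input and output paths are passed on the command line; everything else follows the example above.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map: turn each line into (word, 1) pairs, e.g. "I want to buy a cake"
        // becomes (I,1) (want,1) (to,1) (buy,1) (a,1) (cake,1).
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce: all values for the same key arrive together, so summing them
        // shrinks the list to one (word, total) pair per word, e.g. (buy, 2).
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Placeholder HDFS paths; pass real ones on the command line in practice.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The framework takes care of splitting the input across nodes, running the mapper on each split in parallel, and grouping all values for the same key before the reducer runs, which is exactly the behavior the cake/bike example relies on.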

The rest are tools and applications in the Hadoop ecosystem, such as Hive, Pig, and so on. Once you understand how HDFS and MapReduce work and why, you understand the main way Hadoop operates.
