Microsoft Big Data Solution: Hadoop Introduction

Keywords: big data, Microsoft solutions, Hadoop

Hadoop was developed by the Apache Foundation, an open source organization. The idea behind Hadoop is straightforward: write distributed programs that leverage the combined compute and storage capacity of a cluster of machines. Hadoop can sound abstract, and the unusual project names listed in some documents make it seem complicated. In fact, Hadoop is simple and straightforward. Its core consists of three components, or arguably just two. One is the distributed file system (Hadoop Distributed File System), referred to as HDFS. Another is the MapReduce engine (and the third is HBase, a distributed NoSQL column-oriented database).

HDFS is characterized by high fault tolerance and is designed to be deployed on low-cost hardware. This matters: because the hardware that holds the data is cheap, users can store large amounts of data without having to discard or set aside old data for cost reasons. The way HDFS achieves high fault tolerance is also very simple: it maintains multiple copies of the data, so that when a node fails, the same data is still available on other nodes. In addition, HDFS is optimized for very large files and for streaming access to data in the file system, which improves the performance of reading large files.
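To make the storage side a bit more concrete, the following sketch uses Hadoop's FileSystem Java API to write a small file into HDFS and stream it back. The replication factor and the /demo path are made-up values for illustration, not anything prescribed by HDFS itself.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    public class HdfsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Keep three copies of each block so the data survives node failures
            // (3 is the usual default; set explicitly here only to make the point).
            conf.set("dfs.replication", "3");

            FileSystem fs = FileSystem.get(conf);

            // Write a small file; HDFS splits larger files into blocks and
            // spreads the replicas across the cluster automatically.
            Path file = new Path("/demo/sentences.txt");   // hypothetical path
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("I want to buy a cake\nyou want to buy a bike\n"
                        .getBytes(StandardCharsets.UTF_8));
            }

            // Read it back as a stream, the access pattern HDFS is optimized for.
            try (FSDataInputStream in = fs.open(file);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }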

The other core component is the MapReduce engine. A MapReduce application mainly consists of two functions: a map function and a reduce function. The map function accepts a set of data and converts it into a list of key/value pairs; the reduce function accepts the list generated by the map function and then shrinks it based on the keys, producing one key/value pair for each distinct key. Let me give you a simple example.

Assuming the input is "I want to buy a cake" and "you want to buy a bike", running the map function produces the following list of key/value pairs:

(I, 1) (want, 1) (to, 1) (buy, 1) (a, 1) (cake, 1)

(you, 1) (want, 1) (to, 1) (buy, 1) (a, 1) (bike, 1)

If you apply the reduce function to this list of key/value pairs, you get the following set of key/value pairs, sorted by key:

(a, 2) (bike, 1) (buy, 2) (cake, 1) (I, 1) (to, 2) (want, 2) (you, 1)

In Hadoop, MapReduce work can be handled in parallel by multiple nodes. For example, "I want to buy a cake" can be processed by one node and "you want to buy a bike" by another; the resulting lists of key/value pairs are then combined by the reduce function. This is where the power of Hadoop's parallel processing comes from.
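Putting the two functions together, the sketch below is the standard word-count pattern written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). It assumes the input and output paths are passed on the command line; everything else follows the example above.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map: turn each line into (word, 1) pairs, e.g. "I want to buy a cake"
        // becomes (I,1) (want,1) (to,1) (buy,1) (a,1) (cake,1).
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce: all values for the same key arrive together, so summing them
        // shrinks the list to one (word, total) pair per word, e.g. (buy, 2).
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Placeholder HDFS paths; pass real ones on the command line in practice.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The framework takes care of splitting the input across nodes, running the mapper on each split in parallel, and grouping all values for the same key before the reducer runs, which is exactly the behavior the cake/bike example relies on.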

The rest are tools and applications in the Hadoop ecosystem, such as Hive, Pig, and so on. Once you understand how HDFS and MapReduce work and why, you understand the main way Hadoop operates.
