Hadoop White Paper: Introduction to the Distributed Computing Framework MapReduce

Last Update: 2015-03-17 Source: Internet

Author: User


MapReduce is a high-performance distributed computing framework for batch processing that analyzes and processes massive data sets in parallel. Compared with traditional data warehousing and analysis techniques, MapReduce can handle many types of data, including structured, semi-structured, and unstructured data. It targets data volumes at the terabyte (TB) and petabyte (PB) scale, where traditional methods often cannot cope. MapReduce divides an analysis job into two kinds of tasks: a large number of parallel Map tasks and a Reduce task that rolls up their output. The Map tasks run across many servers. The largest cluster deployed at the time of writing has 4,000 servers.
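The split into parallel Map tasks and a Reduce rollup can be illustrated with a small word count. This is a minimal single-machine sketch in plain Python, not Hadoop's API: the names `map_task`, `shuffle`, `reduce_task`, and `run_job` are hypothetical, and a thread pool stands in for the servers that would run Map tasks in a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_outputs):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: roll up all values for one key into a single result."""
    return key, sum(values)


def run_job(splits, workers=4):
    # Map tasks are independent of each other, so they can run in parallel.
    # The thread pool is an assumption standing in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_task, splits))
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["Hadoop stores data", "MapReduce processes data", "data data"]
    print(run_job(splits))
```

Each input split is processed without reference to any other split, which is what lets the Map phase spread across many servers; only the grouping step needs to see all of the Map output.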

Tasks that MapReduce is suited to process

Complex data: Business data does not always fit the row-and-column structure of a database. Data may arrive in a variety of formats: multimedia, image, text, real-time, sensor data, and so on. New data formats may appear as new data sources become available. Hadoop can store data in its raw formats, and MapReduce can analyze it.

Large-scale data: Many companies simply discard a lot of valuable data because storing it costs too much. New data sources make the problem worse, with new systems and users generating more data than ever before. Hadoop's architecture uses low-cost commodity servers to store and process massive amounts of data.

New methods of analysis: Analyzing massive and complex data requires new methodologies. The new algorithms include natural language analysis, pattern recognition, and so on. Hadoop's architecture makes it practical to apply such algorithms easily and efficiently to massive amounts of data.

Core advantages of the MapReduce framework:

1. High scalability: compute nodes can be added or removed dynamically, giving truly elastic computing.

2. High fault tolerance: supports automatic task migration, retry, and speculative execution, so jobs are not affected by the failure of a compute node.

3. Fair scheduling: supports priorities and task preemption, balances long and short tasks, and effectively supports interactive tasks.

4. Locality-aware scheduling: tasks are scheduled onto the node nearest to their data, which effectively reduces network bandwidth use.

5. Dynamic, flexible resource allocation and scheduling: maximizes resource utilization so that compute nodes are neither idle nor overloaded, and supports resource quota management.

6. Proven in production: used and validated in a large number of real production environments, with the largest cluster reaching 4,000 compute nodes.
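The scheduling idea in item 4, placing a task on a node that already holds its input data, can be sketched as follows. This is a simplified illustration under stated assumptions, not Hadoop's scheduler: the function `schedule`, the node and block names, and the slot model are all hypothetical, and the sketch distinguishes only "local" from "remote" placement.

```python
def schedule(tasks, block_locations, free_slots):
    """Assign each task to a node, preferring nodes that hold its input block.

    tasks: list of (task_id, block_id) pairs
    block_locations: dict mapping block_id to the set of nodes holding a copy
    free_slots: dict mapping node name to its number of free task slots
    Returns a dict mapping task_id to (node, is_local).
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is unchanged
    assignment = {}
    for task_id, block_id in tasks:
        # First choice: a node that stores the block and still has a free slot.
        local = [n for n in sorted(block_locations.get(block_id, ()))
                 if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fallback: any node with a free slot; the block must then be
            # read over the network, which is what locality tries to avoid.
            remote = [n for n in sorted(slots) if slots[n] > 0]
            if not remote:
                raise RuntimeError("no free task slots in the cluster")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[task_id] = (node, is_local)
    return assignment


if __name__ == "__main__":
    tasks = [("t1", "b1"), ("t2", "b1"), ("t3", "b2")]
    block_locations = {"b1": {"node-a"}, "b2": {"node-b"}}
    free_slots = {"node-a": 1, "node-b": 2}
    for task_id, placement in schedule(tasks, block_locations, free_slots).items():
        print(task_id, placement)
```

In the example, `t2` runs remotely because the only node holding its block has no free slot left. The fewer such remote placements a scheduler makes, the less input data crosses the network.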

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
