Illustrated: 5 minutes to learn about Hadoop

Source: Internet
Author: User
Keywords: massive data

Hadoop took its name from the toy elephant belonging to the child of one of its developers. It was created because the data storage and processing tools of the day proved powerless against the massive data that emerged after the internet bubble. First, Google proposed the MapReduce architecture, which could cope with the flood of data generated by its task of consolidating the world's information and made that data highly accessible. Building on this, Yahoo developed the MapReduce-based Hadoop in 2005, and it was officially released as an open-source tool under the Apache License in 2007.

Over the years, Hadoop has grown into something like a large-scale operating system for distributed parallel processing, applied to the massive data generated today. Like an ordinary operating system, Hadoop includes a complete file system, runs the programs written against it, manages those programs in a distributed fashion, and returns the results of their computations.

Hadoop supports data-intensive distributed applications that run simultaneously on large clusters built from inexpensive hardware, and it is published under the Apache v2 license. A Hadoop cluster is reliable, stable, and highly scalable, and can be used to query very large data sets. Hadoop is written in Java, which means it can run on any platform, and it is used by global distributors and big-data technology vendors that have built other software layers on top of it.

What makes Hadoop especially useful is the Hadoop Distributed File System (HDFS). HDFS is Hadoop's storage system: it decomposes data into smaller pieces called "blocks" and distributes those blocks throughout the cluster. This distribution allows the map and reduce functions to run on small subsets of the data rather than on one large dataset, which improves efficiency, saves processing time, and provides the scalability needed to handle massive amounts of data.
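The idea of chopping a file into fixed-size blocks can be sketched in a few lines of Python. This is a toy illustration of the blocking concept only, not the actual HDFS implementation, and the block size is scaled down from the 64 MB or 128 MB defaults real HDFS uses:

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Toy HDFS-style blocking: chop a byte string into fixed-size
    blocks (real HDFS defaults to 64 MB or 128 MB per block)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# A 10-byte "file" with a 4-byte block size yields blocks of 4, 4, and 2 bytes;
# in a real cluster each block would be replicated across several nodes.
blocks = split_into_blocks(b"0123456789", block_size=4)
print([len(b) for b in blocks])  # → [4, 4, 2]
```

Each block can then be processed independently on whichever node holds it, which is what lets the map phase below work on small subsets in parallel.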

MapReduce is a software framework and programming model for processing and retrieving, in parallel, the massive amounts of data stored in a Hadoop system. MapReduce libraries have been written in a variety of programming languages, so Hadoop can work with all of them. In addition, MapReduce can handle both structured and unstructured data.

MapReduce works in two steps. The first step, "map", splits the data into smaller subsets and assigns those subsets to different nodes in the cluster. A node may perform this operation again, producing a hierarchical tree structure that divides the data into ever-smaller subsets. The data is processed on these nodes, and the results are returned to the master node. In the second step, "reduce", the master collects all of the returned data and combines it into some form of reusable output. The MapReduce framework manages all of these parallel tasks and the system as a whole, and it forms the heart of Hadoop.
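The two steps above can be sketched with the classic word-count example in plain Python. This is a single-process simulation of the map, shuffle, and reduce phases for illustration, not a real Hadoop job; in a cluster, the map and reduce calls would run on different nodes and the framework would perform the shuffle:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document.
    In Hadoop, each node would run this on its local blocks."""
    pairs = []
    for doc in documents:
        for word in doc.split():
            pairs.append((word, 1))
    return pairs

def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine each key's values into a single result."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data", "big clusters big data"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts)  # → {'big': 3, 'data': 2, 'clusters': 1}
```

Because each (word, 1) pair is independent, the map phase parallelizes trivially, and because the shuffle groups by key, each reduce call needs only one word's values, which is exactly what lets Hadoop spread the work across a cluster.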

By fusing these technologies, massive data can easily be stored, processed, and analyzed in fractions of a second. Over the past few years, Hadoop's big-data ecosystem has proven very successful and looks promising. With the release of Hadoop 2.0, it uses a new job-processing framework called YARN. YARN stands for Yet Another Resource Negotiator; it is the application-scheduling module that manages the cluster's computing resources. YARN enables multiple data-processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, to work on information stored in a single platform, creating new approaches to analysis.

Hadoop is a powerful tool: as of 2012, more than 25% of companies were using Hadoop to manage their data, up over 10 percent year-on-year. Here are some reasons why businesses use it:

Low cost;

High computational power;

High scalability;

Storage flexibility;

Data protection.

Hadoop is used in almost every industry, from retail to government to finance. The following infographic, produced by Solix, provides a more in-depth description of Hadoop and some interesting predictions.


Original link: http://www.bigdata-startups.com/BigData-startup/what-is-hadoop-and-five-reasons-organisations-use-hadoop-infographic/
