Long: Hadoop Principles, Application Scenarios, and Core Ideas

Source: Internet
Author: User
Long is the founder of the Easyhadoop community and the former R&D manager of the Storm audio platform, and was among the first in China to pass the Cloudera Certified Developer for Apache Hadoop (CCDH) examination. He is also the founder and chief architect of Red Elephant Cloud Teng, has given big data talks at the China CIO Annual Meeting, the Aliyun Congress, and the Beijing University CIO Forum, and is regarded as a big data and Hadoop expert. He was the first speaker at this big data salon.


Hadoop Usage Principle


The Hadoop market is developing fast, and even banks and telecom operators have begun to try it. Long analyzed Hadoop mainly from the following three aspects:


Hadoop's principles and working mechanism


What has been proven, and what has yet to be tested and explored


Real-world use cases


Drawing on practices from the Easyhadoop community and Redhadoop (a start-up), Long described the tight links between Hadoop, big data, and cloud computing:


1. The birth of new data services: large companies such as Baidu, Tencent, and Aliyun use a platform like Hadoop to build larger data platforms, collect data for analysis, and then push the results out through other channels. This is the concept of data services.


2. Cloud computing brings competitiveness: in essence, this comes from the openness of data. Compared with traditional databases, an open data platform makes individualized analysis easier, and Hadoop delivers that.


Hadoop vs. Old Platforms

The core of big data technology is divided into two parts: virtualization technology and Hadoop-like technology. The two are opposites: virtualization is more about concentrating resources in one large machine, whereas Hadoop pools resources of all kinds across many machines. Non-Hadoop platforms are mostly core business systems, typified by IOE (IBM, Oracle, EMC). The following compares the pros and cons of the two kinds of system:


Mainframe: stable, built from high-quality components, with very strong I/O capability. It can manage more disk and data resources, and it also has the advantage in CPU count. However, transfers between machines are limited, and storage and compute share the same bandwidth. Moving data between machines generates a large amount of disk I/O, which creates disk bottlenecks, and the shared bandwidth becomes a problem as well. Poor utilization of multiple CPUs is exposed at the same time. In general, I/O becomes the bottleneck of the whole system.


Hadoop: distributed. Files are split into pieces stored on different nodes, and the computation is moved to the node that holds the data, so the nodes perform I/O in parallel. This is why many disks need to be attached. The number of MapReduce tasks is tied to the number of CPU cores, so the more cores there are, the more map tasks can be configured to run in parallel and the faster the map phase completes. Moving the computation instead of moving the data, in order to get higher I/O throughput, is the essence of big data.
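The idea of moving the computation to the data can be illustrated with a minimal Python sketch. It is not Hadoop's scheduler: the node names, the block layout, and the 128 MB block size are all assumptions made for the example. It only compares how much data crosses the network when every block is shipped to one machine versus when each task runs on the node that already stores its block.

```python
from collections import defaultdict

# Hypothetical layout: which node stores which block of one file.
BLOCK_LOCATIONS = {
    "block-0": "node-a",
    "block-1": "node-b",
    "block-2": "node-c",
    "block-3": "node-a",
}
BLOCK_SIZE_MB = 128  # assumed block size, for illustration only


def network_mb(assignment, block_locations):
    """MB crossing the network: blocks processed away from where they are stored."""
    return sum(
        BLOCK_SIZE_MB
        for block, node in assignment.items()
        if block_locations[block] != node
    )


def tasks_per_node(assignment):
    """Group blocks by the node that will process them."""
    plan = defaultdict(list)
    for block, node in sorted(assignment.items()):
        plan[node].append(block)
    return dict(plan)


# Strategy 1: ship every block to a single central machine.
central = {block: "node-a" for block in BLOCK_LOCATIONS}
# Strategy 2: run each task on the node that already holds the block.
local = dict(BLOCK_LOCATIONS)

central_mb = network_mb(central, BLOCK_LOCATIONS)
local_mb = network_mb(local, BLOCK_LOCATIONS)
local_plan = tasks_per_node(local)

print("central:", central_mb, "MB moved")
print("local:  ", local_mb, "MB moved", local_plan)
```

In the local plan the three nodes read their own disks at the same time, which is the parallel I/O the paragraph above refers to.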


In this part of the talk, Long used examples to analyze the MapReduce operating mechanism in more detail, and also explained the role and function of HBase.
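The examples from the talk are not reproduced in this article. As a stand-in, the following is a generic word-count sketch in plain Python that mimics the three MapReduce stages (map, shuffle, reduce) in a single process. The function names are illustrative and are not part of any Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count([
    "hadoop moves computation",
    "computation not data",
    "Hadoop scales",
])
print(counts)
```

In a real cluster each call to the map function would run as a separate task near its input split, and the shuffle would move the grouped pairs across the network to the reducers.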


Hadoop Application Scenario


Long believes that the main applications of Hadoop today are archiving, search engines (Hadoop's original line of work), and data warehouses, with organizations using different Hadoop components to implement their own use cases. Besides these three scenarios there is a less common one, stream processing, which stems from the ability of Hadoop 2.0 to be combined with other frameworks. In his view, Hadoop will evolve toward online data processing in the future.


Hadoop Core Idea


A Hadoop platform can open up internal data, so that everyone can take part in reporting and data development, and it makes enterprise-wide data sharing possible. In particular, Hadoop's queues, resource pools, and task scheduler mechanisms let the whole organization share multiple resources, rather than using them in layer-by-layer isolation as with earlier databases. Finally, Long also explained several practices drawn from real deployments.
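As a rough illustration of sharing one resource pool through queues, the sketch below splits a cluster's task slots among queues in proportion to configured shares. It is a simplified model, not the behavior of any specific Hadoop scheduler, and the queue names and share values are hypothetical.

```python
def allocate_slots(total_slots, queue_shares):
    """Split cluster task slots among queues in proportion to their shares."""
    total_share = sum(queue_shares.values())
    allocation = {
        name: total_slots * share // total_share
        for name, share in queue_shares.items()
    }
    # Hand any slots lost to integer rounding to the largest queues first.
    leftover = total_slots - sum(allocation.values())
    largest_first = sorted(queue_shares, key=queue_shares.get, reverse=True)
    for name in largest_first[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical queues for three teams sharing one 100-slot cluster.
allocation = allocate_slots(100, {"reports": 50, "etl": 30, "adhoc": 20})
print(allocation)
```

Every team works against the same cluster and the same data, and only the share of compute differs per queue.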