[Hadoop] Practical Scenario: Alibaba

Source: Internet
Author: User

http://blog.csdn.net/u010415792/article/details/9151475


Taobao and Alipay began using Hadoop in 2009 for offline processing of massive data, such as log analysis, content, and structured data. Hadoop was chosen mainly for its scalability: the deployment has grown from the original 300-400 nodes to 2-3 clusters today, with a single cluster exceeding 3,000 nodes. The Alipay cluster has up to 700 machines and uses HBase to store personal consumption records as key-value data.

Alibaba has made the following changes to the Hadoop source code:

Addressed the NameNode single point of failure
Improved security
Improved the stability of HBase
Contributed improvements back to the Hadoop community

The overall architecture of Alibaba's data processing is as follows:
The architecture is divided into five layers: the data source layer, computing layer, storage layer, query layer, and product layer.

Data source layer: This holds the databases of the main Taobao site (users, shops, merchandise, transactions, and so on) as well as behavioral logs such as user browsing and search. This data is the raw lifeblood of the data products.

Computing layer: Data generated in the data source layer is transferred by Taobao's data transmission components (DataX, DBSync, and TimeTunnel) to the Hadoop cluster known as "Ladder", which is the main component of the computing layer. On "Ladder", about 40,000 jobs per day run different MapReduce computations over 1.5 PB of raw data, according to product requirements. For data with strict timeliness requirements, computing on "Ladder" is relatively inefficient, so a real-time stream computing platform called "Galaxy" handles it instead. "Galaxy" is also a distributed system: it receives real-time messages from TimeTunnel, performs the calculations in memory, and flushes the results to NoSQL storage as quickly as possible for front-end products to call.

Storage layer: A dedicated storage layer is designed for front-end products. It contains MyFox, a distributed relational database cluster based on MySQL, and Prom (i.e., Prometheus), a NoSQL storage cluster based on HBase.

[Figure: structure of MyFox]

[Figure: structure of Prom]

Query layer: Glider.

Product layer: Data Cube, Quantum Hengdao, and others.
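
The "Galaxy" path described in the computing layer (receive messages, aggregate in memory, flush to NoSQL storage) can be illustrated with a minimal Python sketch. This is not Galaxy's actual implementation, which the source does not describe. The class name InMemoryAggregator, the flush_every threshold, the per-shop counting, and the plain dict standing in for the NoSQL store are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict


class InMemoryAggregator:
    """Hypothetical sketch: count events per key in memory, flush in batches."""

    def __init__(self, flush_every: int, sink: Callable[[Dict[str, int]], None]):
        self.flush_every = flush_every  # assumed policy: flush after N messages
        self.sink = sink                # stand-in for a write to NoSQL storage
        self.counts: Dict[str, int] = defaultdict(int)
        self.pending = 0                # messages received since the last flush

    def on_message(self, key: str, value: int = 1) -> None:
        # Aggregate in memory; nothing touches the store until a flush.
        self.counts[key] += value
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        # Push accumulated deltas to the sink, then reset in-memory state.
        if not self.counts:
            return
        self.sink(dict(self.counts))
        self.counts.clear()
        self.pending = 0


# A plain dict stands in for the NoSQL store (an assumption for this sketch).
store: Dict[str, int] = {}


def write_to_store(batch: Dict[str, int]) -> None:
    # Merge each flushed delta into the stored running total.
    for key, delta in batch.items():
        store[key] = store.get(key, 0) + delta


agg = InMemoryAggregator(flush_every=3, sink=write_to_store)
for shop_id in ["shop-1", "shop-2", "shop-1", "shop-1"]:
    agg.on_message(shop_id)
agg.flush()  # flush whatever remains below the threshold
print(store)  # {'shop-1': 3, 'shop-2': 1}
```

Flushing in small batches trades a little latency for fewer writes to the store. A real system would also need to handle failures between flushes, which this sketch ignores.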
