Building a Data Warehouse in a Big Data Environment


Radish (Robbie_qi)

A recent study by 1010data, a big data company in the United States, presents the concept of a new generation of data warehouse in its product white paper (Next-Generation Data Discovery). Compared with first-generation data warehouses, the new generation has the following characteristics:

Users can analyze and query any question: the analysis system provides a more user-friendly operating experience and finer data granularity;

Analysis efficiency and horizontal scalability: even with large data volumes, the analysis process must remain efficient;

Data mashup and data sharing: integrated analysis of internal and external data, and the monetization of data.
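As a rough illustration of data mashup, the sketch below joins a hypothetical internal data set (monthly sales) with a hypothetical external one (a consumer-confidence index) on a shared key, then measures how closely the two series move together. The data, the field layout, and the choice of Pearson correlation are assumptions made for illustration; they are not taken from the 1010data white paper.

```python
from math import sqrt


def join_by_month(internal, external):
    """Inner-join two {month: value} dicts on month, keeping only months present in both."""
    months = sorted(internal.keys() & external.keys())
    return [(m, internal[m], external[m]) for m in months]


def pearson(xs, ys):
    """Pearson correlation; assumes at least two points and non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical internal data: monthly sales from the enterprise's own systems.
internal_sales = {"2015-01": 100, "2015-02": 80, "2015-03": 120}
# Hypothetical external data: a macro indicator obtained from outside the enterprise.
external_index = {"2015-01": 1.0, "2015-02": 0.8, "2015-03": 1.2, "2015-04": 1.1}
```

Calling `join_by_month(internal_sales, external_index)` keeps only the three months both sources cover, and correlating the joined columns shows whether the external factor helps explain the internal numbers, something an internal-only warehouse cannot check.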

The report also stresses the importance of self-service analysis, moving data analysis out of the IT support department. This is somewhat similar to the first point but emphasizes the system's ease of use. To further illustrate its point of view, the white paper compares first-generation data warehouses with next-generation ones.

Overall, I agree with the characteristics of the new generation of data warehouses (ease of use, efficiency, scalability, data sharing, and so on), but I find it hard to agree with the comparison, especially on speed and scalability. Data volumes in traditional data warehouses can also be very large; for a telecom operator, for example, a single data set can be enormous. When building such a data warehouse, processing speed and scalability already have to be considered. Rather than the currently very popular Hadoop, such projects may use technologies such as distributed MySQL and parallel computing, which improve processing speed and solve the problem of scaling out hardware.
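The following sketch shows, in very simplified form, the idea behind the distributed MySQL and parallel computing approach: records are hash-partitioned by key across shards, each shard aggregates its own rows, and the partial results are merged. The in-memory lists standing in for MySQL shards, the record layout, and all function names are hypothetical. Python threads here only simulate separate database nodes and do not provide true CPU parallelism.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def shard_for(key, num_shards):
    """Stable hash so the same key always lands on the same (hypothetical) shard."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition(records, num_shards):
    """Split (subscriber_id, minutes) records into per-shard lists."""
    shards = [[] for _ in range(num_shards)]
    for subscriber_id, minutes in records:
        shards[shard_for(subscriber_id, num_shards)].append((subscriber_id, minutes))
    return shards


def local_aggregate(shard):
    """Work done 'on each node': total minutes per subscriber, like a per-shard GROUP BY."""
    totals = Counter()
    for subscriber_id, minutes in shard:
        totals[subscriber_id] += minutes
    return totals


def parallel_total_minutes(records, num_shards=4):
    """Scatter the aggregation across shards, then gather and merge the partial results."""
    shards = partition(records, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(local_aggregate, shards))
    # Each subscriber lives on exactly one shard, so merging the Counters just unions them.
    result = Counter()
    for partial in partials:
        result.update(partial)
    return dict(result)
```

For example, `parallel_total_minutes([("alice", 3), ("bob", 5), ("alice", 2), ("carol", 1)])` yields 5 minutes for alice, 5 for bob, and 1 for carol. Adding capacity corresponds to raising `num_shards`, which changes where rows live but not the answer.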

I personally believe that when building a data warehouse in the big data era, the key problem to solve is connecting the enterprise's internal data with external data to achieve "full data" mining and application, which is the essence of big data. Full data analysis matters for the following reasons:

1) Problems can be located more comprehensively, and solutions proposed accordingly. A traditional data warehouse focuses only on connecting the enterprise's isolated internal business systems and accessing internal data, so it can only reveal the internal factors behind a problem. But the causes of a problem are often complex: besides the enterprise itself, external macroeconomic and social factors are also essential to the analysis, and traditional data warehouses cannot handle them.

2) Predictions of the future can be more accurate. The big data era places more emphasis on data-driven forecasting, using data mining algorithms to support decisions, and the accuracy of these algorithms depends on the diversity and accuracy of the variables that affect the prediction. Take the familiar example of video recommendation, which recommends the content a user is most likely to be interested in based on viewing history. The hit rate of a recommendation algorithm depends largely on which variables affecting user interest you can find, such as viewing history, user category, and popular videos. If you collect only your own users' viewing history and lack the viewing habits of audiences outside your platform, your recommendation algorithm is flawed, especially when your user base is small. An example is picking up, as early as possible, the signal that everyone is searching for "Wumei".
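A toy version of the recommendation example: candidate videos are scored by blending an internal signal (view counts on the platform's own service) with an external signal (a normalized trend score gathered from outside sources). The linear weighting, the parameter names, and the data are hypothetical; production recommenders are far more involved. With the external weight set to zero, a newly trending title that has no internal views yet is never ranked highly, which is the blind spot the paragraph describes.

```python
def recommend(user_history, internal_views, external_trend, k=3, external_weight=0.5):
    """Rank videos the user has not watched by a blend of internal and external signals.

    internal_views: {video: view count on our own platform}        (hypothetical)
    external_trend: {video: trend score from outside sources, 0..1} (hypothetical)
    """
    # Normalize internal counts to 0..1; fall back to 1 to avoid dividing by zero.
    max_internal = max(internal_views.values(), default=0) or 1
    candidates = (internal_views.keys() | external_trend.keys()) - set(user_history)

    def score(video):
        internal = internal_views.get(video, 0) / max_internal
        external = external_trend.get(video, 0.0)
        return (1 - external_weight) * internal + external_weight * external

    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(candidates, key=lambda v: (-score(v), v))[:k]
```

With `internal_views = {"drama_a": 50, "variety_b": 40}` and `external_trend = {"wumei": 1.0, "drama_a": 0.1}`, the internal-only setting (`external_weight=0.0`) returns `["drama_a", "variety_b"]`, while the default blend surfaces the externally trending title: `["drama_a", "wumei"]`.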

Monday, April 13, 2015
