Radish (: Robbie_qi)
I recently studied a product white paper from 1010data, a big data company in the United States, which presents the concept of a new generation of data warehouse (next-generation data discovery). Compared with the first-generation data warehouse, it has the following characteristics:
• Users can analyze and query any question; that is, the analysis system should offer a more user-friendly operating experience and finer data granularity;
• Analysis efficiency and horizontal scalability: even with large data volumes, the analysis process must remain efficient;
• Data mashup and data sharing, emphasizing integrated analysis of internal and external data, and the monetization of data.
The report also emphasizes the importance of self-service analysis, so that data analysis no longer depends on the IT support department. This is somewhat similar to the first point, but puts the emphasis on the ease of use of the system. To further illustrate its point of view, the white paper compares the first generation of data warehouses with the next generation, as follows:
In general, I agree with the goals of the new generation of data warehouse: ease of use, efficiency, scalability, data sharing, and so on. It is difficult for me to agree with the comparison, however, especially on the two points of speed and scalability. A traditional data warehouse can also hold very large volumes of data. For a telecommunications operator, for example, the data volume of a single table may be very large, so processing speed and expansion had to be considered when the data warehouse was built. Such projects did not use Hadoop, which is very popular at the current stage, but they could use distributed MySQL, parallel computing, and similar techniques, which improve processing speed and solve the problem of adding devices.
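The idea of spreading a large table over several database nodes and aggregating in parallel can be sketched roughly as follows. This is only an illustration under stated assumptions, not how any particular operator built its warehouse: in-memory lists stand in for MySQL nodes, and `NUM_SHARDS`, `shard_for`, `partition`, `aggregate_shard`, and `total_minutes` are hypothetical names introduced here.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical number of database nodes


def shard_for(subscriber_id, num_shards=NUM_SHARDS):
    """Map a key to a shard with a stable hash, so a subscriber always lands on the same node."""
    return zlib.crc32(subscriber_id.encode("utf-8")) % num_shards


def partition(records, num_shards=NUM_SHARDS):
    """Distribute (subscriber_id, minutes) records over the shards."""
    shards = [[] for _ in range(num_shards)]
    for subscriber_id, minutes in records:
        shards[shard_for(subscriber_id, num_shards)].append((subscriber_id, minutes))
    return shards


def aggregate_shard(shard):
    """Per-node work: sum minutes per subscriber within one shard."""
    totals = Counter()
    for subscriber_id, minutes in shard:
        totals[subscriber_id] += minutes
    return totals


def total_minutes(records, num_shards=NUM_SHARDS):
    """Aggregate every shard in parallel, then merge the partial results."""
    shards = partition(records, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(aggregate_shard, shards))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)
```

Adding capacity in this scheme means adding shards. A plain modulo hash then forces most keys to move, which is one reason real deployments often prefer range partitioning or consistent hashing.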
I personally believe that, when building a data warehouse in the big data era, the key problem to solve is connecting data inside the enterprise with data outside it, so that mining and applications can draw on the "full data". This is the essence of big data. The case for full-data analysis rests on the following considerations:
1) Problems can be located more comprehensively and solutions proposed. A traditional data warehouse focuses only on connecting the isolated business systems inside the enterprise, so it obtains only internal data and can reveal only the internal factors behind a problem. The causes of a problem are often complex, however. Besides the enterprise itself, external macro factors and social factors are also an essential part of the analysis, and traditional data warehouses are powerless here.
2) Predictions about the future can be more accurate. The big data era places more emphasis on forecasting, that is, using data mining algorithms to support decisions, and the accuracy of such an algorithm depends on the diversity and accuracy of the variables that affect the prediction. Take the familiar example of video recommendation, which recommends the content a user is most likely to be interested in based on viewing history. The hit rate of the recommendation algorithm depends largely on how many of the variables that affect user interest you can find, including viewing history, user classification, popular videos, and so on. If you collect only your own users' viewing history and lack the viewing habits of the external audience, your recommendation algorithm is flawed, especially when your user base is small. It would, for instance, be slow to pick up the signal that everyone is searching for "Wumei".
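The recommendation example can be made concrete with a small sketch that blends an internal signal (the user's own viewing history) with an external one (how popular a video is among the wider audience). Everything here is an assumption made for illustration: the `recommend` function, the `prior_weight` parameter, the scoring formula, and the popularity scores are all hypothetical and do not come from any real recommendation system.

```python
from collections import Counter


def recommend(history, catalog, external_popularity, top_n=3, prior_weight=5.0):
    """Rank unwatched videos by blending internal and external signals.

    history: video ids the user has watched (internal data)
    catalog: dict mapping video id -> category
    external_popularity: dict mapping video id -> score in [0, 1] (external data)
    prior_weight: hypothetical constant; larger values trust external data longer
    """
    watched = set(history)
    category_counts = Counter(catalog[v] for v in history if v in catalog)
    n = sum(category_counts.values())
    # With little history the internal weight is near 0, so external popularity dominates.
    internal_weight = n / (n + prior_weight) if n else 0.0

    scores = {}
    for video, category in catalog.items():
        if video in watched:
            continue
        internal = category_counts[category] / n if n else 0.0
        external = external_popularity.get(video, 0.0)
        scores[video] = internal_weight * internal + (1 - internal_weight) * external

    # Highest score first; ties broken by video id so the result is deterministic.
    return sorted(scores, key=lambda v: (-scores[v], v))[:top_n]
```

With an empty history the ranking falls back entirely on external popularity, which is the situation described above: a small user base cannot detect a trending title from its own logs alone.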
Monday, April 13, 2015
The Construction of a Data Warehouse in the Big Data Environment