Some Problems Facing OLAP Analysis of Big Data
Published: 2012.05.16 09:25  Source: Sadie Net
OLAP analysis requires a large amount of data grouping and many joins between tables. Neither NoSQL stores nor traditional databases are strong at this, so it is often necessary to use a database optimized specifically for BI. For example, most BI-optimized databases use techniques such as columnar or hybrid storage, compression, lazy loading, precomputed statistics on storage blocks, and fragment (partitioned) indexes.
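To make "precomputed statistics on storage blocks" concrete, here is a minimal sketch in Python. It is not taken from any particular database: the `Block` class, the block size, and the sample `amounts` column are all assumptions made for illustration. The idea is that each block of a column carries its min, max, and sum, so a filtered aggregate can skip blocks that cannot match and reuse the stored sum for blocks that match entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """One storage block of a single column, with precomputed statistics."""
    values: List[int]
    min_value: int
    max_value: int
    total: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and precompute min/max/sum."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk), sum(chunk)))
    return blocks


def sum_where_at_least(blocks: List[Block], threshold: int) -> Tuple[int, int]:
    """Sum all values >= threshold.

    Returns (sum, number of blocks whose values had to be scanned).
    """
    result = 0
    scanned = 0
    for block in blocks:
        if block.max_value < threshold:
            continue  # no value in this block can qualify: skip it
        if block.min_value >= threshold:
            result += block.total  # every value qualifies: reuse stored sum
            continue
        scanned += 1  # mixed block: fall back to reading the values
        result += sum(v for v in block.values if v >= threshold)
    return result, scanned


# Hypothetical column of order amounts, stored in blocks of four values.
amounts = [5, 7, 3, 9, 120, 150, 130, 110, 8, 200, 4, 6]
blocks = build_blocks(amounts, block_size=4)
total, scanned = sum_where_at_least(blocks, threshold=100)
print(total, scanned)
```

In this sample only one of the three blocks has to be read value by value; the other two are answered from their statistics alone.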
OLAP analysis on the Hadoop platform has the same problem. Facebook developed the RCFile data format for Hive, which applies some of the optimization techniques above to achieve better data analysis performance, as shown in Figure 2.
However, on the Hadoop platform, simply using Hive to imitate SQL is not enough for data analysis. First, although Hive optimizes the translation of HiveQL into MapReduce, it is still inefficient. Multidimensional analysis still has to join the fact table with the dimension tables, and performance drops significantly when there are many dimensions. Second, RCFile's hybrid storage model effectively constrains the data format: the format is designed for a specific kind of analysis, and once the business model being analyzed changes, the cost of converting large volumes of data to a new format is extremely high. Finally, HiveQL is still very unfriendly to OLAP business analysts, whereas dimensions and metrics are the analytical language that business people work with directly.
The biggest problem with current OLAP is this: a flexible business inevitably means that the business model changes often, and whenever the business dimensions and metrics change, technical staff have to redefine the entire cube (the multidimensional data cube). Business staff can only perform multidimensional analysis on that cube, which prevents them from quickly changing the angle from which they analyze a problem and turns the so-called BI system into a rigid daily reporting system.
Using Hadoop for multidimensional analysis first addresses the problem described above, that dimensions are hard to change. Hadoop data is unstructured, and the collected data itself already contains a large amount of redundant information. A lot of redundant dimension information can also be folded into the fact table, so the angle of analysis can be changed flexibly within those redundant dimensions. Second, thanks to the strong parallel processing capability of Hadoop MapReduce, overhead does not increase significantly no matter how many dimensions an OLAP analysis adds. In other words, Hadoop can support one huge cube containing every dimension you can think of or anticipate, and each multidimensional analysis can involve hundreds of dimensions without a significant impact on analysis performance.
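The idea of folding redundant dimension information into the fact table can be sketched as follows. This is an illustrative Python example, not the article's implementation: the `users` dimension table, the `orders` fact rows, and the helper names are assumptions. Once each fact row carries its dimension attributes, grouping by a different dimension only means naming a different column, with no join at query time.

```python
from collections import defaultdict

# Hypothetical dimension table keyed by user id.
users = {
    1: {"gender": "F", "education": "bachelor", "city": "Beijing"},
    2: {"gender": "M", "education": "master", "city": "Shanghai"},
}

# Hypothetical fact rows that reference the dimension table by key.
orders = [
    {"user_id": 1, "amount": 100},
    {"user_id": 2, "amount": 250},
    {"user_id": 1, "amount": 50},
]


def denormalize(facts, dimension_table, key):
    """Copy the dimension attributes into every fact row (done once, at load time)."""
    wide_rows = []
    for fact in facts:
        row = dict(fact)
        row.update(dimension_table[fact[key]])
        wide_rows.append(row)
    return wide_rows


def total_by(rows, dimension, metric):
    """Sum a metric grouped by any dimension column present in the wide rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[metric]
    return dict(totals)


wide_orders = denormalize(orders, users, "user_id")

# Changing the angle of analysis is just a different column name.
by_gender = total_by(wide_orders, "gender", "amount")
by_city = total_by(wide_orders, "city", "amount")
print(by_gender, by_city)
```

The trade-off is the one the text names: the wide rows store redundant data, in exchange for not having to join at analysis time.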
Therefore, our big data analysis architecture, backed by this huge cube, hands dimensions and metrics directly to business people. They define the dimensions and metrics themselves, the system translates those business dimensions and metrics directly into MapReduce jobs, and the reports are generated at the end. It can be understood simply as a conversion tool from a user-defined "MDX" (Multidimensional Expressions, a multidimensional cube query language) to MapReduce. At the same time, the OLAP analysis and the presentation of report results remain compatible with traditional BI and reporting products.
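A rough sketch of what "translating dimensions and metrics into MapReduce operations" could look like is given below. It runs in a single Python process rather than on Hadoop, and the query format, `compile_query`, `run_mapreduce`, and the sample records are all hypothetical; the article does not describe its actual tool at this level of detail. The dimensions become the map output key, and each metric becomes an aggregate applied in the reduce step.

```python
from collections import defaultdict

# Aggregates a metric may request; "count" simply counts the grouped rows.
AGGREGATES = {"sum": sum, "max": max, "min": min, "count": len}


def compile_query(dimensions, metrics):
    """Turn a dimensions/metrics specification into a (mapper, reducer) pair.

    dimensions: list of field names that form the grouping key.
    metrics: dict of output name -> (field name, aggregate name).
    """
    def mapper(record):
        # Map step: emit the record under the key built from its dimensions.
        yield tuple(record[d] for d in dimensions), record

    def reducer(key, records):
        # Reduce step: compute every requested metric for one key.
        row = dict(zip(dimensions, key))
        for name, (field, aggregate) in metrics.items():
            row[name] = AGGREGATES[aggregate]([r[field] for r in records])
        return row

    return mapper, reducer


def run_mapreduce(records, mapper, reducer):
    """In-process stand-in for the map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, groups[key]) for key in sorted(groups)]


records = [
    {"region": "north", "gender": "F", "income": 40},
    {"region": "north", "gender": "M", "income": 60},
    {"region": "south", "gender": "F", "income": 55},
    {"region": "north", "gender": "F", "income": 70},
]

mapper, reducer = compile_query(
    dimensions=["region", "gender"],
    metrics={"total_income": ("income", "sum"), "people": ("income", "count")},
)
report = run_mapreduce(records, mapper, reducer)
for row in report:
    print(row)
```

A business user would only supply the `dimensions` and `metrics` arguments; changing the analysis means changing that specification, not redefining a cube.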
For a field such as annual income, users can define the dimension themselves. Users can also define custom dimensions across columns, for example by combining gender and educational background into a single dimension. Because Hadoop data is unstructured, dimensions can be split and recombined freely according to business requirements.
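User-defined dimensions of this kind can be sketched as plain functions over a record. The example below is an assumption-laden illustration in Python: the income band boundaries, the field names, and the sample `people` records are invented, and the article does not say how its tool represents custom dimensions. One function derives a dimension from annual income, and another combines gender and education into a single dimension.

```python
from collections import Counter


def income_band(record, bounds=(50_000, 100_000)):
    """Derived dimension: bucket annual income into user-chosen bands."""
    low, high = bounds
    income = record["annual_income"]
    if income < low:
        return "low"
    if income < high:
        return "middle"
    return "high"


def gender_education(record):
    """Combined dimension: two columns merged into one."""
    return f'{record["gender"]}/{record["education"]}'


def count_by(records, dimension_fn):
    """Count records per value of any dimension function."""
    return Counter(dimension_fn(r) for r in records)


people = [
    {"gender": "F", "education": "bachelor", "annual_income": 45_000},
    {"gender": "M", "education": "master", "annual_income": 120_000},
    {"gender": "F", "education": "bachelor", "annual_income": 80_000},
    {"gender": "M", "education": "bachelor", "annual_income": 60_000},
]

by_band = count_by(people, income_band)
by_profile = count_by(people, gender_education)
print(by_band, by_profile)
```

Because the dimension is computed from the raw record at analysis time, redefining the bands or combining different columns does not require reloading or reformatting the stored data.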