Three major contradictions in BI applications (reposted)

Source: Internet
Author: User
Contents
  • Conflict 1: The business department's understanding of data vs. the data department's understanding of requirements
  • Conflict 2: Constantly changing business needs vs. the complex process of generating data
  • Conflict 3: Efficient real-time querying and processing vs. modeling massive data

Because of a recent job change, I have been busy with many things: handing over my work, going through resignation procedures, and planning for my new role. I am still in this transition, so there may not be much new content to share in the near future, and my next few articles will mostly summarize my previous work. However, I expect to bring you more content related to data analysis later, because the company I am joining is a dynamic, creative, and challenging place and, most importantly, it values and understands data.

As the title suggests, this article is again about BI. I actually wrote it when I first joined my current company. Looking back now, although we have kept trying to reconcile these contradictions, to be honest none of them has been completely resolved. I believe any company that builds its own BI system will run into these problems to some degree, and one or two of them may be bothering you right now. I describe my solutions here; their feasibility and effectiveness still need to be verified.

Conflict 1: The business department's understanding of data vs. the data department's understanding of requirements

I put this one first because it directly affects how effective data is: if this contradiction is not handled well, the value that data creates is greatly reduced. The cause is that the business department does not understand the whole process of data acquisition, processing, and computation, and therefore forms its own interpretation of what the data means and what it is for. Meanwhile, the data department does not truly understand the business needs: it does not know where the data will actually be used or which aspect of the product it is meant to monitor or evaluate, so it cannot provide the most effective data.

Solution: establish an interface between the business department and the data department. This interface consists of a standard process, detailed documentation, reasonable data presentation, and, most importantly, people who can connect the business and the data.

The first element is a standardized data requirement process. Requirements are generally raised by the business department; the data department obtains and computes the data and returns the results. In this process, the business department should provide not only the data rules but also a detailed description of the purpose, indicator definitions, intended use, and value of the data. The data department should provide not only the final data but also an explanation of how each indicator is obtained and calculated. The ultimate goal is for both parties to reach a shared understanding.

The second element is detailed documentation, namely the two types of documents that this process inevitably produces: the data requirement document and the data interpretation document. (The latter is an important part of the metadata in a data warehouse. I have long wanted to write an article about data warehouse metadata and hope to post it soon.) Their content is essentially what is described in the process above.
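The two documents can be thought of as structured records. The following is a minimal Python sketch, not the author's actual template: the field names (`purpose`, `definition`, `usage`, `value`, `source`, `calculation`) are assumptions mapped from the items listed in the requirement process, the sample entries are made up, and `unexplained_indicators` is a hypothetical helper that flags requested indicators which still lack an interpretation entry.

```python
from dataclasses import dataclass, field


@dataclass
class IndicatorRequirement:
    """One indicator as requested by the business department (hypothetical schema)."""
    name: str
    purpose: str      # why the business needs it
    definition: str   # what the indicator means in business terms
    usage: str        # where and how it will be used
    value: str        # which decision or evaluation it supports


@dataclass
class IndicatorInterpretation:
    """How the data department obtains and computes one indicator (hypothetical schema)."""
    name: str
    source: str       # where the raw data comes from
    calculation: str  # calculation method, e.g. a formula or an SQL expression


@dataclass
class DataRequirementDocument:
    requester: str
    indicators: list[IndicatorRequirement] = field(default_factory=list)


@dataclass
class DataInterpretationDocument:
    indicators: dict[str, IndicatorInterpretation] = field(default_factory=dict)


def unexplained_indicators(req: DataRequirementDocument,
                           interp: DataInterpretationDocument) -> list[str]:
    """Return requested indicators that have no interpretation entry yet."""
    return [i.name for i in req.indicators if i.name not in interp.indicators]


# Made-up example entries.
req = DataRequirementDocument("marketing", [
    IndicatorRequirement("UV", "track reach", "distinct visitors per day",
                         "daily dashboard", "evaluate campaign reach"),
    IndicatorRequirement("conversion_rate", "track efficiency", "buyers / visitors",
                         "weekly review", "compare landing pages"),
])
interp = DataInterpretationDocument({
    "UV": IndicatorInterpretation("UV", "clickstream log",
                                  "COUNT(DISTINCT visitor_id) per day"),
})
print(unexplained_indicators(req, interp))
```

An empty result is only a necessary condition for agreement: it shows every requested indicator has been explained, not that both sides understand it the same way.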

The third element is reasonable data presentation, which comes down to one principle: let everyone see the data they want to see and understand it intuitively. In reports, Excel sheets, and other displays, each indicator should link directly to its data interpretation document, and the data should be shown in the most intuitive form, combining various charts to make it easy to understand.
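One way to enforce "every indicator links to its interpretation document" is to make the report generator refuse to render an indicator that has no documentation entry. This is a minimal sketch under assumptions: `render_report` is a hypothetical helper, the report is plain text rather than Excel or charts, and the glossary values (`interp-doc#pv` etc.) stand in for wherever the interpretation documents actually live.

```python
def render_report(rows, columns, glossary):
    """Render rows as a plain-text table whose headers point to interpretation docs.

    rows: list of dicts keyed by indicator name
    columns: indicator names in display order
    glossary: indicator name -> location of its interpretation document (hypothetical)
    """
    missing = [c for c in columns if c not in glossary]
    if missing:
        # Refuse to publish an indicator nobody has explained.
        raise ValueError(f"no interpretation document for: {missing}")
    lines = [" | ".join(f"{c} [{n}]" for n, c in enumerate(columns, start=1))]
    for row in rows:
        lines.append(" | ".join(str(row[c]) for c in columns))
    # Footnotes tie each column header to its interpretation document.
    for n, c in enumerate(columns, start=1):
        lines.append(f"[{n}] {c}: see {glossary[c]}")
    return "\n".join(lines)


glossary = {"pv": "interp-doc#pv", "uv": "interp-doc#uv"}
rows = [{"pv": 1200, "uv": 300}]  # made-up figures
print(render_report(rows, ["pv", "uv"], glossary))
```

Failing loudly at render time keeps undocumented indicators from reaching readers, which is cheaper than explaining them after they have been misread.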

The last and most important element is people who connect the business and the data. Such a person should be familiar with the product's strategic objectives and business processes as well as with how data is acquired and computed. They do not need advanced skills in ETL processing, data organization, or optimization, but they must be able to obtain and compute all kinds of data on their own.

Conflict 2: Constantly changing business needs vs. the complex process of generating data

Business needs change constantly, especially in the fast-growing Internet environment. We often receive new requirements from business departments every day, or learn that the calculation logic of an indicator changed a few days ago. These situations often put the data department in a difficult position. On the one hand, some indicators cannot be calculated because the necessary data was never collected; on the other hand, changing an indicator's calculation logic may require changes to the entire complicated data processing pipeline, which is a headache.

  Solution: integrated, complete underlying data and fast, flexible data acquisition methods.

As I mentioned in my article about data warehouse architecture, a data warehouse should store as much of the detailed underlying data as possible, including raw clickstream logs, ODS data from the front-end databases, and data from other sources. I do not recommend building the data warehouse as a multidimensional model designed around specific requirements, because requirements always change and a multidimensional model lacks the flexibility to cope with those changes. If the underlying data is kept, however, you can meet ever-changing needs from a stable base most of the time: almost every indicator is calculated from the underlying data, so having it is equivalent to being able to satisfy most data needs.
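To illustrate why keeping the detail pays off, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for the warehouse. The `clicks` table, its columns, and the sample rows are made up for illustration. When the definition of daily UV changes (here, hypothetically, to exclude bot traffic), only the query changes, because the detailed rows are still available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Detailed clickstream kept at the finest grain (hypothetical schema).
conn.execute("""
    CREATE TABLE clicks (
        visit_date TEXT,
        visitor_id TEXT,
        page       TEXT,
        is_bot     INTEGER
    )
""")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?, ?)",
    [  # made-up sample rows
        ("2010-05-01", "a", "/home", 0),
        ("2010-05-01", "a", "/item/1", 0),
        ("2010-05-01", "b", "/home", 0),
        ("2010-05-01", "crawler", "/home", 1),
        ("2010-05-02", "b", "/item/2", 0),
    ],
)

# Original definition of daily UV: all distinct visitors.
uv_v1 = conn.execute("""
    SELECT visit_date, COUNT(DISTINCT visitor_id)
    FROM clicks
    GROUP BY visit_date ORDER BY visit_date
""").fetchall()

# Revised definition: exclude bots. Only the query changes;
# nothing upstream has to be rebuilt because the detail was kept.
uv_v2 = conn.execute("""
    SELECT visit_date, COUNT(DISTINCT visitor_id)
    FROM clicks
    WHERE is_bot = 0
    GROUP BY visit_date ORDER BY visit_date
""").fetchall()

print(uv_v1)  # [('2010-05-01', 3), ('2010-05-02', 1)]
print(uv_v2)  # [('2010-05-01', 2), ('2010-05-02', 1)]
```

Had only a pre-aggregated daily UV been stored, the bot filter could not be applied retroactively; the detail is what makes the change cheap.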

The other problem is responding quickly to changing demands. One method is to build multidimensional models for different subjects (on top of the underlying data layer, of course). Because a multidimensional model supports observing and analyzing data from multiple perspectives and at multiple levels, it can satisfy a wide range of data needs. In addition, with the underlying data organized and managed in an integrated environment, a standard query language such as SQL, with its powerful aggregation, sorting, and grouping capabilities, speeds up data acquisition and computation.
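The "multiple perspectives" idea can be made concrete by aggregating a measure over every combination of dimensions, which is what SQL's `GROUP BY CUBE` (available in Oracle, among others) computes. The sketch below does the same in plain Python so it runs anywhere; the fact rows, dimension names, and the `build_cube` helper are hypothetical and only illustrate the technique.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows derived from the detailed data layer.
facts = [
    {"date": "2010-05-01", "channel": "search", "region": "east", "orders": 3},
    {"date": "2010-05-01", "channel": "ads", "region": "west", "orders": 2},
    {"date": "2010-05-02", "channel": "search", "region": "west", "orders": 4},
]
DIMENSIONS = ("date", "channel", "region")


def build_cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`, like GROUP BY CUBE.

    Keys are tuples of (dimension, value) pairs in `dims` order;
    the empty tuple holds the grand total.
    """
    cube = defaultdict(int)
    for row in rows:
        for k in range(len(dims) + 1):
            for subset in combinations(dims, k):
                key = tuple((d, row[d]) for d in subset)
                cube[key] += row[measure]
    return dict(cube)


cube = build_cube(facts, DIMENSIONS, "orders")
print(cube[()])                                            # 9, grand total
print(cube[(("channel", "search"),)])                      # 7, orders from search
print(cube[(("date", "2010-05-01"), ("region", "west"))])  # 2
```

Every new question along these dimensions becomes a lookup rather than a new pipeline, at the cost of storing one aggregate per dimension combination.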

Conflict 3: Efficient real-time querying and processing vs. modeling massive data

This is really a trade-off: how to make data presentation, acquisition, and querying efficient enough for the people who need the data while still providing a sufficiently rich set of indicators. If too few indicators are provided, or the data granularity is too coarse, daily monitoring and analysis needs cannot be met. Conversely, if too many indicators are calculated every day, or the data is too detailed, the load on the servers rises noticeably and query response times suffer accordingly.

  Solution: grasp the core data and build a rational multidimensional model.

Efficient processing and querying of massive data in a data warehouse is a deep subject in its own right, covering the optimization of the warehouse structure, ETL, and OLAP (for the basic features of OLAP, see my previous article, which refers to Oracle's optimizations in this area). Here I will not discuss the technical implementation, only the application.

Core data is, in short, the website's goals and KPIs. Both senior management and front-line staff watch these figures constantly, so the most important principle is to make queries on them fast and responsive. The simplest way is to compute these indicators separately: instead of putting them into a multidimensional model, store simple daily aggregates in a summary table for report presentation.
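A minimal sketch of the separate summary table, again using `sqlite3` as a stand-in. The `orders` and `kpi_daily_summary` tables, their columns, the sample rows, and the `refresh_daily_summary` helper are assumptions for illustration; in practice the refresh would run once per day in the ETL schedule, and reports would read only the small summary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical detailed order table.
conn.execute("CREATE TABLE orders (order_date TEXT, user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [  # made-up sample rows
        ("2010-05-01", "u1", 100.0),
        ("2010-05-01", "u2", 50.0),
        ("2010-05-01", "u1", 25.0),
        ("2010-05-02", "u3", 80.0),
    ],
)
# One row per day holding only the core KPIs: no dimensions, tiny to query.
conn.execute("""
    CREATE TABLE kpi_daily_summary (
        stat_date   TEXT PRIMARY KEY,
        order_count INTEGER,
        buyer_count INTEGER,
        revenue     REAL
    )
""")


def refresh_daily_summary(conn, stat_date):
    """Recompute one day's KPIs from the detail; safe to rerun for the same day."""
    conn.execute("DELETE FROM kpi_daily_summary WHERE stat_date = ?", (stat_date,))
    conn.execute("""
        INSERT INTO kpi_daily_summary
        SELECT order_date, COUNT(*), COUNT(DISTINCT user_id), SUM(amount)
        FROM orders
        WHERE order_date = ?
        GROUP BY order_date
    """, (stat_date,))
    conn.commit()


for day in ("2010-05-01", "2010-05-02"):
    refresh_daily_summary(conn, day)

# The report query touches only the summary table.
summary = conn.execute(
    "SELECT * FROM kpi_daily_summary ORDER BY stat_date").fetchall()
print(summary)  # [('2010-05-01', 3, 2, 175.0), ('2010-05-02', 1, 1, 80.0)]
```

Deleting before inserting makes the daily job idempotent, so a failed or repeated run does not double-count.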

The other part is building a rational multidimensional model. Speaking of rationality, I have to complain that data consumers initially raise all kinds of demands without restraint, sometimes hundreds of indicators, yet once the statistics are produced, few people actually use or analyze them (probably because they are dazzled by the sheer number). I mentioned similar issues in my article on real-time data statistics. Each dimension added to a multidimensional model, or each level added to a dimension, multiplies the size of the cube. For example, adding a dimension with 100 members can multiply the cube's data by 100, and refining the time dimension from days to hours makes it 24 times larger. For multidimensional models that are already extremely large, this is a disaster. Therefore, the principle when designing a multidimensional model is to provide only the dimensions and indicators needed in practice, and to choose the granularity of each dimension's levels carefully.
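The multiplication described above can be checked with a few lines of arithmetic. This sketch computes the upper bound on finest-grain cells in a dense cube, which is the product of the dimension cardinalities; the dimensions and their sizes are made up for illustration, and real cubes are usually sparse, so actual growth is often smaller.

```python
from math import prod


def max_cube_cells(cardinalities):
    """Upper bound on finest-grain cells: product of dimension cardinalities."""
    return prod(cardinalities.values())


# Hypothetical base model: one year of days, 10 channels, 30 regions.
base = {"day": 365, "channel": 10, "region": 30}
# Adding one dimension with 100 members.
with_new_dim = {**base, "product_category": 100}
# Refining time from days to hours (365 * 24 hourly members).
hourly = {"hour": 365 * 24, "channel": 10, "region": 30}

print(max_cube_cells(base))          # 109500
print(max_cube_cells(with_new_dim))  # 10950000, i.e. 100x
print(max_cube_cells(hourly))        # 2628000, i.e. 24x
```

If the cube also stores every aggregate level (an "All" member per dimension), the bound becomes the product of (cardinality + 1), so each extra dimension or finer level still multiplies the total.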

Those are the three major problems I have encountered. I have covered a lot at once, so thank you for reading patiently. My previous work also involved many technical topics, mainly Oracle and PL/SQL, but I am not very strong in that area, and this blog focuses on website data analysis, so I hesitate to write much of a summary of them. If you would like to discuss these topics, I can share a few articles; please leave a message with your suggestions.

