Source: http://www.cnblogs.com/jiesin/archive/2008/06/23/1227694.html
This post sets out my understanding of BI and describes it again from its definition, basic technology, terminology, example applications, and extensions, as a way to consolidate that understanding.
I. Definition of BI
BI is the abbreviation of business intelligence. It is a collection of technologies that help enterprises make better use of data to improve decision quality, and it is a process of extracting information and knowledge from large amounts of data. Simply put, it is the progression from business, to data, to data value. It can be understood as follows:
Figure (1)
It is not hard to see that the traditional transaction system completes the process from business to data, while what BI does is make that data generate value. This process of generating value is the business intelligence analysis process.
Implementing the business intelligence analysis process requires a complex set of technologies, including ETL, DW, OLAP, and DM (data mining). The basic process can be described as follows:
Figure (2)
The process is simply this: an ETL tool extracts the data that has accumulated in the transaction systems into a subject-oriented data warehouse. OLAP then generates cubes or reports and presents them to users through a portal, and users can apply classification, clustering, description, and visualization to the data to support business decision-making.
Note:
BI cannot produce decisions; it uses the data processed by the BI process to support decisions. So what is the "intelligence" in BI? (Clarifying this concept helps when applying BI.) What BI finally presents to users is still a report or a chart. Unlike traditional static reports or charts, however, BI overturns the way reports and charts are provided and read: the data set it generates is like a toy puzzle cube, and users can quickly rotate it into different combinations of reports or charts. This keeps operations simple for users analyzing data, keeps the reports or charts visual, and preserves the user's train of thought.
I think this is why everyone is keen on BI.
II. The birth of BI
With the advance of information technology, traditional business transaction systems have made great strides and brought about business informatization. Every business transaction is recorded in a database, and the accumulated business records are measured in terabytes. You may ask: what is the use of so much data, which occupies a lot of storage and incurs storage costs but is rarely accessed? The answer is clear: keeping this historical data is of great significance, because it can be mined for business patterns and used to support decision-making.
A typical case is the story of "diapers and beer". Diapers and beer seem to be two unrelated things. However, someone found that among supermarket shoppers on Fridays, 30% to 40% of the young fathers who bought diapers also bought beer. It turns out that when a young father buys diapers on Friday he also buys beer for himself, because Friday is when TV stations broadcast football matches. Supermarket owners who sold diapers and beer together achieved great success.
This story is a legend of maximizing commercial value through data mining. It shows that by processing massive amounts of data we can find potential associations between two seemingly unrelated things, and that commercializing these associations can yield unexpected new business or new business models.
In the end, the question is how to mine the value of data that occupies so much storage space, so that the data turns from a consumer of costs into a driver of profits. A new data analysis technology was born to complete the process from "data" to "data value", and it was given a resounding and mysterious name: "BI" (Business Intelligence).
III. Basic Technology
Business Intelligence (BI) is a new technology that uses data warehousing, online analysis, data mining, and other technologies to process and analyze data. It aims to provide decision-making support for enterprise decision makers. This is more or less the official definition of BI, and it is also the consistent goal of most BI practitioners. What does BI technology involve? From figure (2), we can easily see that its core technologies are ETL, DW, and OLAP. Put more simply, they can be grouped into "data processing technology" and "data presentation technology".
Why should we add a "Data Warehouse" between operational databases and OLAP?
In a word, computer resources and performance are to blame. Operational databases exist primarily to respond quickly to business operations, while OLAP consumes a large amount of hardware resources. While OLAP queries are running against an operational database, it is difficult for business operations to get a quick response, and smooth running of the business cannot be guaranteed. Following the logic of business, data, and data value, there is no OLAP to speak of without the business. In addition, an enterprise generally has multiple applications, each with its own operational database, so the data is scattered and accessing it in place is extremely inefficient. The most effective way to reconcile these resource and performance concerns is to integrate the data into the data warehouse first and have OLAP applications retrieve data from the data warehouse in a unified manner, which resolves the conflict between quick business response and OLAP.
With this extra layer in between, however, OLAP (whether ROLAP or MOLAP) generally no longer sees real-time data. This does not hurt BI applications: 90% of BI applications do not require real-time data, and some data delay is acceptable, which is characteristic of decision support systems. The lag is the time taken by the data extraction tools and OLAP processing.
IV. Data Processing
(1) ODS (Operational Data Store) is an optional part of the data warehouse architecture. The ODS has some features of the data warehouse and some features of the OLTP system. It holds data that is subject-oriented, integrated, current or near-current, and constantly changing.
In the system architecture with ODS, ODS is designed with the following features:
1) Data transition layer between the business system and data warehouse.
If the business data sources are complex, an ODS is usually used to collect the data to be processed. Data sources include:
A. A wide range of business databases. The business transaction systems use different types of data stores, such as DB2, Informix, Oracle, SQL Server, and text files.
B. Different application systems in different geographic locations.
C. Subscribed data sources.
D. Non-traditional database data restored in batches.
... and so on. The ODS stores data extracted directly from the business systems. Its data structures and the logical relationships between its data are basically consistent with those of the business systems.
2) Save the current or near-current detailed data for query or ETL error checking.
3) Data is stored cyclically. Data stored in the ODS is temporary and must be cleared before each ETL run.
(2) ETL: the process of moving data from an operational business database (DB) to a data warehouse (DW) is called ETL, which stands for extract, transform, and load.
Extraction: read data from the various original business systems.
Transformation: according to pre-designed rules, convert and cleanse the extracted data and resolve redundant and ambiguous data, so that the originally heterogeneous data formats are unified.
Load: import the transformed data into the data warehouse, either incrementally or in full.
Technically, ETL mainly involves incremental extraction, transformation, scheduling, and monitoring.
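To make the difference between a full load and an incremental load concrete, here is a minimal sketch in Python. It assumes each source row carries an `updated_at` timestamp that can serve as a watermark; that column, the function names, and the sample rows are all hypothetical and only illustrate the idea, not any particular ETL tool.

```python
from datetime import datetime

def full_load(source_rows, warehouse):
    """Full load: discard the warehouse copy and reload every source row."""
    warehouse.clear()
    warehouse.extend(source_rows)

def incremental_load(source_rows, warehouse, watermark):
    """Incremental load: append only rows changed after `watermark`.

    Returns the new watermark to remember for the next run.
    """
    changed = [row for row in source_rows if row["updated_at"] > watermark]
    warehouse.extend(changed)
    return max((row["updated_at"] for row in changed), default=watermark)

# Hypothetical source rows; `updated_at` is an assumed change-tracking column.
source = [
    {"order_id": 1, "amount": 20.0, "updated_at": datetime(2008, 6, 1)},
    {"order_id": 2, "amount": 35.0, "updated_at": datetime(2008, 6, 2)},
]
warehouse = []
watermark = incremental_load(source, warehouse, datetime(2008, 1, 1))

# A later run picks up only the row added since the last watermark.
source.append({"order_id": 3, "amount": 12.5, "updated_at": datetime(2008, 6, 3)})
watermark = incremental_load(source, warehouse, watermark)
```

This sketch only appends. A real incremental load would also have to handle rows that were updated or deleted in the source, which is where most of the difficulty lies.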
A simple example illustrates ETL.
The following tables show the item sales records from four regions:
Figure (3)
Whatever method or tool is used, transforming the data in the four tables above into the structure of the table below and loading it is an ETL process.
Figure (4)
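Since the figures are not reproduced here, the consolidation can be sketched in Python with made-up data. The four regional layouts, the column names, and the target structure below are all assumptions chosen for illustration; they are not the tables from the original figures.

```python
# Hypothetical extracts from four regional systems (layouts are assumptions).
REGION_TABLES = {
    "north": [{"item": "Pen", "qty": 10, "amount": 25.0}],
    "south": [{"product": "pen ", "quantity": "4", "total": "10.00"}],
    "east": [{"ITEM_NAME": "PENCIL", "QTY": 7, "AMT": "3.50"}],
    "west": [{"item": "pencil", "qty": 3, "amount": 1.5}],
}

# Per-region mapping from source column names to the unified target columns.
COLUMN_MAPS = {
    "north": {"item": "item", "qty": "quantity", "amount": "amount"},
    "south": {"product": "item", "quantity": "quantity", "total": "amount"},
    "east": {"ITEM_NAME": "item", "QTY": "quantity", "AMT": "amount"},
    "west": {"item": "item", "qty": "quantity", "amount": "amount"},
}

def transform(region, row):
    """Rename columns, normalise types and text, and tag the row with its region."""
    renamed = {COLUMN_MAPS[region][col]: value for col, value in row.items()}
    return {
        "region": region,
        "item": str(renamed["item"]).strip().lower(),
        "quantity": int(renamed["quantity"]),
        "amount": float(renamed["amount"]),
    }

def run_etl(region_tables):
    """Extract each regional table, transform its rows, and load them into one list."""
    unified = []
    for region, rows in region_tables.items():  # extract
        for row in rows:
            unified.append(transform(region, row))  # transform, then load
    return unified

sales = run_etl(REGION_TABLES)
```

The point of the sketch is that the four sources disagree on column names, text case, and value types, and the transform step is what brings them to a single structure with an added `region` column.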
(3) DW (data warehouse): the official definition of a data warehouse is a subject-oriented, integrated, relatively stable (non-volatile), and time-variant collection of data that supports management decisions.
Data warehouse features:
1) Subject-oriented.
2) Integrated.
3) Non-volatile.
4) Time-variant.
The differences between databases and data warehouses are as follows:
Figure (5)
(4) OLAP (on-line analytical processing) is a way of organizing data for analysis in BI, and its direct products are reports or cubes. It is a kind of software technology that enables analysts, managers, or executives to access information quickly, consistently, and interactively from multiple perspectives in order to gain a deeper understanding of the data.
Speaking of OLAP, we can't help but think of OLTP (online transaction processing). The differences between OLTP and OLAP are compared below:
Figure (6)
Enough theory. Let's take a look at how data in a data table is represented in a cube.
To view the sales data of a single location, an ordinary 2-D flat data table fully meets the requirements, as shown in:
Figure (7)
However, if you want to analyze the data across several locations, you can add a dimension to the 2-D flat data to represent location, as shown in:
Figure (8)
In terms of concept, the data can also be expressed in the form of a three-dimensional data cube, as shown in:
Figure (9)
Suppose another dimension is added to represent the manufacturer. How should we represent the data? As shown in the following figure, we can treat the data as a 4-D cube, viewed as a series of 3-D cubes.
Figure (10)
Similarly, an N-D cube can be represented as a sequence of (N-1)-D cubes. This is the basic principle of OLAP. As for the specific algorithms used to compute and manage each "cube", there is too much to cover here...
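The cube idea can be illustrated with a short Python sketch. It is not how an OLAP engine stores or computes cubes; it only shows, on hypothetical fact rows with four assumed dimensions, how fixing one dimension leaves an (N-1)-D view and how aggregating over fewer dimensions produces a smaller cube.

```python
from collections import defaultdict

# Assumed dimension names; the last field of each fact row is the sales measure.
DIMENSIONS = ("quarter", "item", "location", "supplier")

# Hypothetical fact rows, one sales figure per combination of dimension values.
FACTS = [
    ("Q1", "phone", "Beijing", "S1", 100),
    ("Q1", "phone", "Shanghai", "S1", 80),
    ("Q1", "laptop", "Beijing", "S2", 50),
    ("Q2", "phone", "Beijing", "S1", 120),
    ("Q2", "laptop", "Shanghai", "S2", 70),
]

def build_cube(facts, keep):
    """Sum sales over every dimension not named in `keep` (a roll-up)."""
    positions = [DIMENSIONS.index(name) for name in keep]
    cube = defaultdict(int)
    for row in facts:
        cube[tuple(row[i] for i in positions)] += row[-1]
    return dict(cube)

def slice_facts(facts, dimension, value):
    """Fix one dimension to a single value; the remaining rows vary only
    along the other N-1 dimensions."""
    position = DIMENSIONS.index(dimension)
    return [row for row in facts if row[position] == value]

by_quarter_item = build_cube(FACTS, ("quarter", "item"))  # a 2-D view
by_location = build_cube(FACTS, ("location",))            # a 1-D view
beijing_only = slice_facts(FACTS, "location", "Beijing")  # one 3-D slice of the 4-D data
```

Calling `slice_facts` once per location yields the "sequence of (N-1)-D cubes" described above, one slice for each value of the fixed dimension.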
Note:
The data cube is a metaphor for multi-dimensional data storage. The actual physical storage of such data may differ from its logical representation, and a cube is not limited to 3-D: it can be n-dimensional.
V. Data presentation
Data query is the simplest BI application, and reports are the most direct product of BI. According to the data connection, processing, and usage, the application modes can be roughly divided into four types: formatted reports, online analysis, data visualization, and data mining.
1. Formatted report: a set of formatted data, such as a cross tabulation.
2. Online analysis: a multi-dimensional data set, such as a cube.
3. Data visualization: information is displayed graphically in as many forms as possible, such as bar charts and dashboards, so that decision makers can quickly grasp the knowledge contained in it.
4. Data mining: extract potential and valuable knowledge (models or rules) from a large amount of data. Analysis methods include:
· Classification
· Estimation
· Prediction
· Affinity grouping or association rules
· Clustering
· Description and visualization
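As a small illustration of affinity grouping, the sketch below computes the two standard association-rule measures, support and confidence, in Python. The baskets are invented for the example and merely echo the diapers-and-beer story; the numbers say nothing about real shoppers.

```python
# Hypothetical shopping baskets (invented data for illustration only).
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Among baskets containing `antecedent`, the fraction that also
    contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Rule under test: diapers -> beer
rule_support = support(BASKETS, {"diapers", "beer"})
rule_confidence = confidence(BASKETS, {"diapers"}, {"beer"})
```

With this data, two of five baskets contain both items, and two of the three diaper baskets also contain beer. Association-rule mining searches a large transaction set for rules whose support and confidence exceed chosen thresholds.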
Data mining analyzes historical data to predict customer behavior. In fact, customers themselves may not know what they will do next. The results of data mining are therefore not as mysterious as people think, and they cannot be completely correct. Customer behavior is related to the social environment, so data mining is also affected by the social background.
VI. Common BI vendors and products
ETL: Informatica, SQL Server Integration Services
DW: IBM DB2, Oracle, Sybase IQ, NCR Teradata, etc.
OLAP: Cognos, Business Objects, MicroStrategy, Hyperion, IBM
Data Mining: IBM, SAS, SPSS
Nowadays, many database vendors have begun to bundle their BI development components with their own database products. They all have their eyes on this lucrative market and are competing fiercely for it.
VII. BI in China
China has 5,000 years of cultural history, and its rich documentary tradition makes everyday reports very intricate. They are interlinked and nested inside and out, their formats are unusual, their rules are odd, and their concentration of data is famous worldwide, which has defeated countless reporting tools. The concept of BI was introduced from Europe and America, and most of the existing tools come from European and American vendors. China has perhaps the most complex reports in the world, and its report design style is clearly different from that of those countries: reports produced by BI tools tend to use one report to describe one problem, while reports in China tend to pack as many problems as possible into a single report. This directly increases the difficulty of applying BI tools.