As early as the late 1990s, a computer magazine named business intelligence one of the most influential IT technologies of the coming years. Although the IT industry as a whole has been sluggish in recent years, R&D and applications related to business intelligence (BI) are on the rise, and hundreds of IT companies are entering this emerging field. BI applications have even become a new "highlight" of the IT field. What is business intelligence? What technologies support such promising applications? The answer is as follows.
Business intelligence is not a basic technology or a product technology. It is an application technology formed by applying data warehousing, online analytical processing (OLAP), data mining, and related technologies to commercial use.
A business intelligence system converts raw business data into information for enterprise decision-making. Compared with general information systems, it excels at processing massive data volumes, analyzing data, and presenting information.
Architecture of a business intelligence system
A business intelligence system consists of four major stages: data preprocessing, data warehouse construction, data analysis, and data presentation. Data preprocessing is the first step in integrating an enterprise's raw data. It includes three processes: extraction, transformation, and loading (ETL). Building a data warehouse is the basis for processing massive data. Data analysis is the key to the system's intelligence and generally relies on two technologies: online analytical processing (OLAP) and data mining. OLAP not only summarizes and aggregates data, but also provides analysis operations such as slicing, dicing, drill-down, roll-up, and pivoting (rotation), so that users can easily perform multidimensional analysis on massive data. The goal of data mining is to uncover the knowledge hidden in the data, build analysis models through association analysis, clustering, and classification, and predict the enterprise's future trends and upcoming problems. As data volumes and analysis methods grow, data presentation ensures that analysis results can be visualized. Data warehousing, OLAP, and data mining are generally regarded as the three major components of business intelligence.
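The preprocessing stage can be illustrated with a minimal sketch of extraction, transformation, and loading. Everything specific here is an assumption made for illustration: the two source layouts, the field names, the gender code mapping, and the in-memory SQLite table standing in for the warehouse.

```python
import sqlite3

# Hypothetical raw records from two source systems with inconsistent
# field names, codes, and units (assumed for illustration only).
BILLING_ROWS = [
    {"cust": "C001", "sex": "M", "amount_cents": 12050},
    {"cust": "C002", "sex": "F", "amount_cents": 8000},
]
CRM_ROWS = [
    {"customer_id": "C003", "gender": "male", "amount_yuan": 45.5},
]

GENDER_CODES = {"M": "male", "F": "female", "male": "male", "female": "female"}


def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    for row in BILLING_ROWS:
        yield "billing", row
    for row in CRM_ROWS:
        yield "crm", row


def transform(source, row):
    """Transform: map each source layout onto one unified schema."""
    if source == "billing":
        return (row["cust"], GENDER_CODES[row["sex"]], row["amount_cents"] / 100)
    return (row["customer_id"], GENDER_CODES[row["gender"]], row["amount_yuan"])


def load(conn, rows):
    """Load: write the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(customer_id TEXT, gender TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps against an in-memory database and return it."""
    conn = sqlite3.connect(":memory:")
    load(conn, [transform(source, row) for source, row in extract()])
    return conn


if __name__ == "__main__":
    warehouse = run_etl()
    for record in warehouse.execute("SELECT * FROM payments ORDER BY customer_id"):
        print(record)
```

A real pipeline would read from the business systems themselves and handle rejected rows; the sketch only shows how the three steps divide the work.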
Data warehouse: the foundation of business intelligence
For an enterprise, the most important question is how to organize the historical data accumulated in its various business processing systems effectively, and how to give business personnel at all levels a unified view of information in a flexible way, so that information is truly shared across the entire enterprise. Data warehouse technology meets this requirement. The data warehouse is the foundation of a business intelligence system. Without a data warehouse, or without integrated enterprise data, data analysis becomes water without a source.
A data warehouse has four important features:
1. A data warehouse is subject-oriented. Traditional operational systems are organized around the company's applications, whereas a data warehouse is organized around subject areas. For a telecom company, for example, the applications may be service order handling, billing, and customer service, while the subject areas may be customers, packages, payments, and overdue payments.
2. A data warehouse is integrated. Data is integrated as it moves from the application-oriented operational environment into the analysis-oriented data warehouse. Because different application systems are inconsistent in their encodings, naming conventions, physical attributes, and units of measure, these inconsistencies must be eliminated when data enters the data warehouse.
3. A data warehouse is non-volatile. Data in a data warehouse is typically loaded in bulk and then accessed. It is not updated in the general sense.
4. A data warehouse is time-variant. Its data varies with time in three respects:
1) The time horizon of data in a data warehouse is much longer than in an operational system. An operational system generally covers 60 to 90 days, while a data warehouse usually covers 5 to 10 years.
2) An operational database contains "current value" data. Its accuracy is valid at the moment of access, and current values can be updated. Data in a data warehouse is only a series of complex snapshots, each taken at a certain point in time.
3) The key structure of operational data may or may not contain a time element, such as year, month, or day. The key structure of a data warehouse always contains a time element.
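The non-volatile and time-variant features can be sketched by contrasting an operational store, which overwrites the current value, with a warehouse store, which only appends snapshots under a key that includes a date. The customer ID, balances, and dates below are hypothetical, and plain dictionaries stand in for both systems.

```python
from datetime import date

# Operational store: one "current value" per customer, updated in place.
operational_balance = {}

# Warehouse store: append-only snapshots whose key always includes a date.
warehouse_snapshots = {}


def update_operational(customer_id, balance):
    """Overwrite the current value, as an operational system would."""
    operational_balance[customer_id] = balance


def take_snapshot(snapshot_date):
    """Copy every current value into the warehouse under (customer, date)."""
    for customer_id, balance in operational_balance.items():
        key = (customer_id, snapshot_date)
        if key in warehouse_snapshots:
            # Loaded history is never updated in place.
            raise ValueError(f"snapshot {key} is already loaded")
        warehouse_snapshots[key] = balance


def balance_history(customer_id):
    """Return the customer's snapshots as (date, balance) pairs in date order."""
    return sorted(
        (day, balance)
        for (cid, day), balance in warehouse_snapshots.items()
        if cid == customer_id
    )


update_operational("C001", 100.0)
take_snapshot(date(2024, 1, 31))
update_operational("C001", 40.0)  # the previous value is gone from the operational store
take_snapshot(date(2024, 2, 29))

print(operational_balance["C001"])  # only the current value remains
print(balance_history("C001"))      # the warehouse keeps both snapshots
```

The date in the key is what lets the warehouse answer questions about the past that the operational store can no longer answer.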
OLAP: a powerful tool for analyzing massive data
OLAP is a powerful analysis tool for terabyte-scale data. It allows managers to browse and analyze massive data flexibly. Built on the concept of dimensions, OLAP provides multidimensional and cross-dimensional analysis operations such as slicing, dicing, drill-down, roll-up, and pivoting (rotation). Compared with ordinary static reports, OLAP better meets the needs of decision makers and analysts who analyze data warehouse data.
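The operations named above can be sketched on a tiny cube held in a dictionary. The dimensions (region, product, month) and the sales figures are hypothetical, and a real OLAP engine would use precomputed aggregates rather than scanning every cell.

```python
from collections import defaultdict

DIMENSIONS = ("region", "product", "month")

# Hypothetical fact cells: (region, product, month) -> sales amount.
CUBE = {
    ("north", "voice", "2024-01"): 100,
    ("north", "voice", "2024-02"): 120,
    ("north", "data", "2024-01"): 80,
    ("south", "voice", "2024-01"): 60,
    ("south", "data", "2024-02"): 90,
}


def slice_cube(cube, dimension, value):
    """Slice: fix one dimension to a single value."""
    axis = DIMENSIONS.index(dimension)
    return {key: v for key, v in cube.items() if key[axis] == value}


def dice_cube(cube, allowed):
    """Dice: keep cells whose coordinates fall in the allowed value sets."""
    return {
        key: v
        for key, v in cube.items()
        if all(key[DIMENSIONS.index(d)] in values for d, values in allowed.items())
    }


def roll_up(cube, keep):
    """Roll up: sum away every dimension not listed in `keep`."""
    axes = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[tuple(key[a] for a in axes)] += v
    return dict(totals)


def pivot(cube, row_dim, col_dim):
    """Pivot (rotate): lay out two dimensions as rows and columns."""
    table = defaultdict(dict)
    for (row, col), v in roll_up(cube, (row_dim, col_dim)).items():
        table[row][col] = v
    return dict(table)


if __name__ == "__main__":
    print(roll_up(CUBE, ("region",)))            # coarse view: totals per region
    print(roll_up(CUBE, ("region", "product")))  # drill-down: add the product level
    print(pivot(CUBE, "product", "region"))      # same totals, axes swapped
```

Drill-down is shown here as the reverse of roll-up: the same function is called with one more dimension kept.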
Unlike traditional online transaction processing (OLTP) systems, OLAP systems are expected to follow 12 rules, proposed by E. F. Codd:
1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client/server architecture
6. Generic dimensionality (all dimensions are treated equally)
7. Dynamic sparse matrix handling
8. Multi-user support
9. Unrestricted cross-dimensional operations
10. Intuitive data manipulation
11. Flexible reporting
12. Unlimited dimensions and aggregation levels
Although technological progress has since moved beyond some of these rules, they remain the basis of OLAP technology.
OLAP system architectures fall into three types: ROLAP (Relational OLAP), based on relational databases; MOLAP (Multidimensional OLAP), based on multidimensional databases; and HOLAP (Hybrid OLAP), based on hybrid data organization. The first two are the most common. ROLAP is an OLAP implementation built on a relational database: it uses the relational database as its core and relational structures to represent and store multidimensional data. ROLAP splits the multidimensional structure into two kinds of tables: fact tables, which store the measures and dimension keys, and dimension tables, with at least one table per dimension to store descriptive information such as dimension levels and member categories. MOLAP is an OLAP implementation built on multidimensional data organization: it uses multidimensional data as its core and stores data in multidimensional arrays. MOLAP queries combine index lookup with direct addressing, which is much faster than ROLAP's table index lookups and table joins.
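The two storage approaches can be contrasted with a minimal sketch: a fact table joined to dimension tables for ROLAP, and a dense array addressed by member position for MOLAP. The table names, dimension members, and amounts are hypothetical, SQLite stands in for the relational database, and the sketch does not measure performance.

```python
import sqlite3

REGIONS = ["north", "south"]
PRODUCTS = ["voice", "data"]
SALES = [
    ("north", "voice", 220),
    ("north", "data", 80),
    ("south", "voice", 60),
    ("south", "data", 90),
]


def build_rolap():
    """ROLAP: a fact table of keys and measures plus one table per dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales  (region_id INTEGER, product_id INTEGER, amount INTEGER);
        """
    )
    conn.executemany("INSERT INTO dim_region VALUES (?, ?)", list(enumerate(REGIONS)))
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", list(enumerate(PRODUCTS)))
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(REGIONS.index(r), PRODUCTS.index(p), amount) for r, p, amount in SALES],
    )
    return conn


def rolap_lookup(conn, region, product):
    """Answer a cell query by joining the fact table to its dimension tables."""
    row = conn.execute(
        """
        SELECT SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_region  AS r ON r.region_id  = f.region_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        WHERE r.name = ? AND p.name = ?
        """,
        (region, product),
    ).fetchone()
    return row[0]


def build_molap():
    """MOLAP: a dense array indexed by dimension member positions."""
    array = [[0] * len(PRODUCTS) for _ in REGIONS]
    for r, p, amount in SALES:
        array[REGIONS.index(r)][PRODUCTS.index(p)] = amount
    return array


def molap_lookup(array, region, product):
    """Answer a cell query by direct addressing into the array."""
    return array[REGIONS.index(region)][PRODUCTS.index(product)]


if __name__ == "__main__":
    print(rolap_lookup(build_rolap(), "north", "voice"))
    print(molap_lookup(build_molap(), "north", "voice"))
```

Both lookups return the same cell; the difference is that the relational version resolves it through two joins, while the array version computes two positions and reads the value directly.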
Data mining: the source of insights
Static and dynamic reports and queries present an enterprise's historical and current information. Data mining is different: it searches databases intelligently for patterns and distills useful information from massive data volumes. It is fair to say that data mining is the main means by which an enterprise gains insight through a business intelligence system.
Data mining is the process of extracting, from massive, incomplete, noisy, fuzzy, and random data, the potentially useful information and knowledge that is hidden in it and not known in advance.
Data mining technology can be divided into descriptive data mining and predictive data mining. Descriptive data mining includes data summarization, clustering, and association analysis. Predictive data mining includes classification, regression, and time series analysis.
1. Data summarization: derived from statistical analysis. Its purpose is to condense the data and provide a compact description. Traditional statistical measures such as the sum, mean, and variance are all effective, and histograms, pie charts, and other graphical methods can be used to present them. In a broad sense, multidimensional analysis also belongs to this category.
2. Clustering: divides the entire database into groups so that differences between groups are pronounced while the data within each group is as similar as possible. This method is commonly used for customer segmentation. Before segmentation begins, it is not known how many categories the users should be divided into, so cluster analysis is used to identify groups of customers with similar characteristics, such as similar spending patterns or similar ages. Marketing plans can then be developed for each customer group.
3. Association analysis: finds correlations between values in the database. The two common techniques are association rules and sequential patterns. Association rules look for correlations between different items within the same event. Sequential patterns are similar, except that they look for temporal correlations between events, as in the analysis of stock price rises and falls.
4. Classification: the purpose is to construct a classification function or classification model (also known as a classifier) that maps data items in the database to a specific category. Constructing a classifier requires a training sample dataset as input. A training set consists of a set of database records or tuples, each of which is a feature vector made up of the values of the relevant fields (also known as attributes or features). Each training sample also carries a category label. A sample can be written as (v1, v2, ..., vn; c), where vi is a field value and c is the category.
5. Regression: predicts the values of other variables from variables whose values are known. Regression generally uses standard statistical techniques such as linear and nonlinear regression. The same model can often be used for both regression and classification. Common algorithms include logistic regression, decision trees, and neural networks.
6. Time series analysis: uses past values of a variable to predict its future values.
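As one concrete instance of the techniques listed above, association rules are usually judged by two measures: support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The following is a minimal sketch that enumerates one-item rules by brute force; the transactions and thresholds are hypothetical, and practical miners use algorithms such as Apriori instead of checking every pair.

```python
from itertools import combinations

# Hypothetical transactions: each set is the items in one purchase event.
TRANSACTIONS = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"phone", "charger"},
    {"case", "charger"},
    {"phone", "case", "screen film"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of antecedent transactions that also contain the consequent."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0
    return count(antecedent | consequent, transactions) / base


def pair_rules(transactions, min_support, min_confidence):
    """Enumerate one-item -> one-item rules that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            rule_confidence = confidence({lhs}, {rhs}, transactions)
            if rule_confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, rule_confidence))
    return rules


if __name__ == "__main__":
    for lhs, rhs, s, c in pair_rules(TRANSACTIONS, min_support=0.5, min_confidence=0.7):
        print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With these sample transactions, only the phone and case pair clears both thresholds, so the sketch reports a rule in each direction between those two items.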
Another important aspect of data mining is its methodology. General transaction processing systems, and even some simple business intelligence systems that only provide reporting functions, require only a small amount of engineering maintenance once they are built. The use of data mining in a business intelligence system is often very different. Data mining is an iterative process of repeated and adjusted steps, such as business understanding, data understanding, modeling, and evaluation, and a deployed model is not static: it must be updated and rebuilt as appropriate. A business intelligence project therefore does not aim for one-time construction. Instead, it favors a consulting service that is closely tied to the enterprise's business and can enhance its competitiveness. Analysts who are familiar with both the business and the analysis methods play a vital role in applying the system. This also shows why BI is a higher-level, more strategic application that follows enterprise MIS.
Of course, data mining and business intelligence should be viewed objectively. In a broad sense, data mining is a knowledge discovery technology that builds on traditional data analysis methods and integrates database, AI, and other technologies. It will inevitably benefit enterprise information analysis, and its supporting role in business decision-making is clear. However, data mining is only a technology and a method, and it is not omnipotent. A business intelligence system provides an enterprise with an environment for business analysis and some analysis tools. How to adapt these to the enterprise's actual business, and how to extract from massive volumes of business data the knowledge that helps the enterprise compete in the market, is largely not determined by the system itself. The true source of enterprise insight is therefore the successful application and practice of business intelligence systems and data mining technologies.