Data intelligence (bi-Database Design)

Source: Internet
Author: User

 

Data intelligence (BI) is becoming more and more important to enterprises, and enterprises now pay more and more attention to it. It can timely and accurately reflect the current sales situation of enterprises,
Through this historical data analysis, we can speculate on future development and provide a powerful data basis for the management.

Concepts of Data Mining:

Is from a large number of, incomplete, noisy. Fuzzy and random data is used to extract potentially useful information and knowledge hidden in it that people do not know beforehand. Data mining can be seen as a data search process. It does not have to make assumptions or ask questions in advance, but still can find unexpected but interesting information, this information indicates the relationship and pattern of data elements. It can explore the pattern of a data key and find the most valuable information and knowledge ). Guides business behavior or assists in scientific research. The study targets large and ultra-large data sets.

Data mining methods:
1. Association Rule Mining Technology
2. Classification Technology
3. Prediction Technology
4. Neural Networks and Support Vector Machines
5. Cluster Analysis
6. Time Series Data Mining
7. Abnormal Data Mining

Currently, data mining is more suitable for banks and e-credit. Medium and Small Enterprises may not find the benefits of data mining.
Currently, statistics and queries on major data tables are not deeply mined to find valuable information.

After several years of accumulation, most medium and large enterprises and institutions have established well-developed basic information systems such as CRM, ERP, and OA. All these systems have the same features: by performing operations by business personnel or users, you can add, modify, or delete databases. The above system can be called OLTP (online transaction process), which means that after the system has been running for a period of time, it will certainly help the company collect a large amount of historical data. However, for non-professionals, the large amount of data scattered and independent in the database is just a book of days that cannot be understood. Non-technical personnel need abstract information that can be understood, understood, and benefited from. At this time, how to convert data into information so that non-professional personnel (including managers) can fully grasp and use the information and assist decision-making is the main solution of business intelligence.

How can we transform the existing data in the database into the information required by non-professional personnel? This is what we need to do. Bi can be divided into the following layers based on different stages:

1. Report system ------ it can be called Bi, which is the low-end implementation of Bi
2. Data Analysis-Implementation of mid-range bi
3. Data Mining ------ implementation of high-end bi

At present, most foreign enterprises have entered the Middle-end Bi, and some enterprises have started to enter the high-end Bi, while most of Chinese enterprises are still in the report stage.
Although it is the initial stage of Bi, data reports cannot be replaced. Traditional report systems are quite mature in technology, and you are familiar with Excel, Crystal Reports, and reporting services.
And so on have been widely used. As data increases and demand increases, traditional report systems face more and more challenges.

1) Too much data and too little information
2) It is difficult to analyze and understand various combinations of data, and the customized reports are too rigid.
3) It is difficult to mine potential rules. The report system lists data information on the surface.
4) It is difficult to trace the history and form an isolated island of data. The old data is often backed up by the business system, making it difficult to perform macro analysis and long-term historical analysis.

For the above reasons, the report system is no longer able to meet the increasing business needs. At this time, there must be new technologies to make up for the shortcomings.

The purpose of the data analysis and data mining system is to bring us more value for decision support, rather than replacing data reports. Report systems still have irreplaceable advantages,
It will coexist with data analysis and mining systems for a long time.

Many enterprises have already implemented Bi projects. From the problems faced by these enterprises, it is not difficult to find that, on the one hand, each department is busy preparing various statistical analysis reports all day, on the other hand, it is
The management does not have any help information to effectively support strategic decisions. Although various departments have a variety of statistical analysis reports, they have not formed an implementation and continuity based on the overall strategy.
Improved Business Analysis System

The overall architecture of the data analysis system is divided into four parts: source system, data warehouse, multi-dimensional database, and client.

· Source System: includes all existing OLTP systems. To build a Bi system, you do not need to change the existing system.

· Data Warehouse: data is centrally extracted from the source system, which may be extracted once a day or every three hours.
The data warehouse is still built on a relational database and often conforms to a model called a "star structure.

· Multi-dimensional database: the data in the data warehouse is modeled in multiple dimensions to formCubeBody structure. Each cube describes a business subject, such as sales, inventory, or finance.

· Client: Good client software can present the information in multi-dimensional cubes to users in a rich and colorful manner.

A data warehouse is usually a topic-oriented, integrated, time-changing, but relatively stable data set of information. It is used to support the management and decision-making process.

Theme:

It refers to the key aspects that users are concerned about when using data warehouses for decision-making, such as revenue, customers, and sales channels. The so-called theme-oriented, the information in the data warehouse is organized by topic, rather than organized according to business functions as in the business support system.

Integration:

The information in the index data warehouse is not simply extracted from various business systems, but is processed, sorted, and aggregated in a series of processes, therefore, the information in the data warehouse is consistent global information about the entire enterprise.

Time-based changes:

The information in the index data warehouse does not only reflect the current state of the enterprise, but records information from the past time point to the current stage. With this information, you can make a quantitative analysis and prediction of the enterprise's development history and future trends.

The information itself is relatively stable:

A data warehouse is usually retained for a long time after it enters the Data Warehouse. That is, there are usually a large number of query operations in the data warehouse, but few modification and deletion operations are performed, generally, you only need to load and refresh regularly. Unlike operational databases, data is usually updated in real time, and changes occur in a timely manner as needed.

Now that we know some logical concepts of the database warehouse, let's take a look at how to design the physical structure of the database warehouse.

The physical model of the Data Warehouse is divided into star and snow pattern. The so-called star is that there is only one topic in the model, and other tables store some characteristics of the topic. For example, goods
In the theme warehouse of thing sales volume, each sale record is a fact table, while the time, salesperson, and commodity are dimensions, which are associated with the fact table and are organized as star. But if it is more detailed
From this point of view, there are two themes, one is the sale of goods and the other is the product itself. It is a snowflake pattern.
In actual projects, a large amount of data is generated due to the complexity of the business of the operating system. Therefore, the majority of data is organized in Snowflake.

How can we design fact tables and dimension tables around the subject? There are also rules to follow.

Dividing line between a fact table and a dimension table

Fact tables are used to store the main content of a topic. Taking the daily workload as an example, the workload may have the following attributes: working date, personnel, working hours, overtime hours, nature of work, Field Service, work content, reviewer. So what is the main content? It is easy to see the length of work, overtime is the backbone, that is, the basic content of the workload theme, so the working date, staff, nature of work, whether it is in the field, is the work content the primary information? After carefully analyzing the features, we will find that dates, personnel, nature, and field services can all be categorized. For example, dates have year-month-day layers, and personnel have parent-subordinate relationships, work Attendance and normal work attendance records are also two types of work attendance records, but working hours and working hours do not have such meaning. Therefore, attributes that can be classified are listed separately as dimension tables, and the reference relationship between facts and dimensions is maintained in fact tables.

In general, the design of fact tables is based on the ability to correctly record historical information, and the design of dimension tables is based on the ability to aggregate theme content from an appropriate perspective. Fact tables generally have no primary keys. The quality of data is fully determined by the business system.

this article from the csdn blog, reprinted please indicate the source: http://blog.csdn.net/wyzxg/archive/2009/11/23/4858077.aspx

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.