Original address: Http://blog.csdn.net/sgtzzc/archive/2009/10/10/4649770.aspx
Objective
Yesterday, the Forum's SQL Server big version of the new BI section, we have a heated discussion, because of the customer environment has been using SQL 2000, did not pay attention to this piece of things, recently just to do shopping basket analysis, so to collect some information online. In order to keep "unison", also write some or turn something out, and everyone to learn and share.
This is a good bi primer, and it should be helpful for friends who don't know what bi is, why bi, and what bi can do. It is a pity that the pictures in the article can not be found (the text refers to the part of the reference picture please ignore), to find a bit on the internet did not find, if there is a collection of this part of the picture of friends or see this part of the picture of friends, welcome to share a bit.
"Repost content"
(1)boss, what do you want with so much data? If you are the owner of a retail company. Your company is very advanced, has realized the business informatization, every sales document is kept in the database, accumulated over the years, has been saved more than 10 sales data, hundreds of millions of sales records. At this point if I ask you: "Anyway, three years ago the data kept in vain to occupy the place, the cost of storage costs, simply delete them all, so do not buy hard disk can accommodate new data, how?" "Will you take my advice calmly?" So, boss, what do you want with so much data? Yes, like me, you've also been vaguely aware of the value of data, which is why we can't let go of historical data, just like any modern enterprise, or even like any traditional ticket, such as a century-old store devout preserved the ancient data, because we have intuition, our intuition tells us: these data useful! But this is just a hunch about how to dig up the value of the data that takes up a lot of storage space and turn that data from cost-consuming to a profit promoter? There seems to be a lack of some link in the middle.(2) Business Intelligence-connecting data with decision makers BI (Business Intelligence) is a new technology that uses technologies such as data warehousing, online analytics, and data mining to process and analyze data to provide decision support to enterprise decision makers. let's effigy three times: Decision support, decision Support, decision support! Bi is a factory: >> Bi's raw material is a huge amount of data; >> BI products are information and knowledge that is processed from data; >> bi Push these products to business decision makers; >> decision makers use the products of BI factories to make the right decisions and promote the development of enterprises; This is business Intelligence-----------connect data and decision makers, and change data to value. The two categories of BI applications are information-class applications and knowledge-class applications, which are characterized by the following table:
information class BI Apply |
refers to data queries , report charts , processed from raw data , Multidimensional Analysis , data visualization , and more, the common feature of these applications is the transformation of data into information acceptable to decision makers, to be presented to decision makers. For example, processing bank transaction data into bank financial statements. The |
is responsible for providing information only and does not proactively parse the data. For example, the Bank's financial reporting tools do not have a deep analysis of the relationship between customer churn and bank interest rates, but can only rely on decision-makers to combine information, through people's thinking, to draw knowledge. |
knowledge class BI Apply |
refers to using data mining techniques and tools to uncover the hidden relationships in the data, Using computers to process data directly into knowledge, to the decision-makers. |
will proactively explore data associations to uncover hidden knowledge that policymakers can't quickly discover. and present it in an understandable form to the decision makers. |
(3) Overview of BI primary Application Mode--data query (querying) data query is the simplest BI application, belonging to MIS system heritage, although the background is more old-fashioned, but it is still the most direct way for decision-makers to obtain information. Today, the data query interface has been completely rid of the traditional SQL command line, a large number of drop-down menus, input boxes, list boxes and other elements or even the mouse drag-and-drop interface to the background of the dry labor of SQL statements packaged into a flirtatious data acquisition system, and the essence still does not leave the data query several major elements: >> What to look for >> where to look >> filters >> presentation methods At present, the more popular data query applications abroad have completely freed up the flexibility of data query, as shown in the image on the right is the Cognos reportnet data query interface Query Studio, which allows users to define data query features through a browser-only interface, with a drag-and-drop operation, and to present data in a variety of ways, such as reports and charts.
(4) Overview of BI Primary application Mode--report (Reporting) report is one of the most enthusiastic bi applications in China, which is inseparable from the historical position of the report in Chinese enterprises and institutions. The report in our country is famous for its strange format, data concentration, odd rules and so on, which has made countless foreign reporting tools and BI tools wringing. The two main elements of a report are data and formatting, and if there is no format, the report app is almost identical to the data query app. It can be said that the report is the query out of the data in the specified format. The report application includes two modules: report presentation and report making. Report presentation is to let the decision-maker see the report, and allow the decision-maker to select the report data through the conditional definition, such as selecting the report year, Department, organization, etc. report making report-oriented developers, the format definition flexibility, data mapping flexibility, the richness of computing methods, etc. all affect the quality of the BI report application. To clarify, Microsoft Excel is not considered a BI reporting tool because Excel does not have the ability to connect to a data source, at best a Spread Sheet. But Excel's powerful formatting features make report makers bow down, and even then, almost all BI vendors provide plug-ins for Microsoft Excel, through plugins, Excel can connect to BI data sources, turn into BI reporting tools, ugly duckling to Swan.
5) Overview of the BI Advanced application Mode-Online analytical processing,olap OLAP, or online analytics processing, is a new way of data observation for BI One of the core technologies. We know that data is stored in a data table in a database, such as a store's sales data stored in a table of data such as the following:
Sales Time |
Sales Locations |
Products |
Sales Quantity |
Sales Amount |
2004-11-1 |
Beijing |
Soap |
10 |
342.00 |
2004-11-6 |
Guangzhou |
Orange |
30 |
123.00 |
2004-12-3 |
Beijing |
Banana |
20 |
12.00 |
2004-12-13 |
Shanghai |
Orange |
50 |
189.00 |
2005-1-8 |
Beijing |
Soap |
10 |
342.00 |
2005-1-23 |
Shanghai |
Toothbrush |
30 |
150.00 |
2005-2-4 |
Guangzhou |
Toothbrush |
20 |
100.00 |
What policymakers want to know is often macro information such as distribution, percentage, trends, such as the following questions:
>> the quantity of sales in Beijing area is changing trend of time?
>> which product sold in 2005 was the biggest increase in sales in 2004?
>> 2004 the proportion of sales in each product distribution? ......
In the face of this requirement, a large SUM operation must be performed with SQL statements, and SQL SUM is required for each result of a problem. Faced with the above 7 records, we can easily come to the results, but when we face millions or even hundreds of millions of records, such as mobile phone call data, each time the SQL SUM will take a lot of time to calculate, the decision-makers often on the first day to ask the analysis requirements, wait until the next day to get the results of the calculation, This method of analysis is "offline analysis" and is inefficient.
In order to improve the efficiency of data analysis, OLAP technology completely breaks down the way of data browsing in recorded units, and separates the data into "Dimension (Dimension)" and "Measure (Measure)":
The >> dimension is the angle of observation data, such as "Sales Time", "Sales Location", "product" in the example above;
The >> metric is a specific quantity value, such as "Sales Quantity" and "Sales Amount" in the example above;
In this way, we can convert the sheet data list above into a data cube (cube) with three dimensions:
The process of probing the data is to determine a point in the cube and then observe the measure of that point:
Of course, the data cube is not limited to three dimensions, where three dimensions are used to illustrate the problem, just because the limit shown through the graph is three dimensions.
Dimensions can be divided into layers, such as time can be summarized from day to month and year, the product can be summarized up as food and daily necessities, the location can be summarized up to North and South China, users can drill down at any level of the dimension (drill down) and upward summary (roll up):
In this way, we can get rid of the SQL SUM of the speed constraints, quickly locate the different conditions of the details of the data, you can quickly get a level of summary data. OLAP technology for decision-makers to provide multi-angle, multi-level, high-efficiency data exploration method, the decision-maker's thinking is no longer constrained by a fixed drop-down menu, query conditions, but by the decision-maker's thinking to lead the data acquisition, arbitrary combination analysis and analysis of the objectives, this break the traditional interactive analysis and high efficiency to make OLAP Become the core application of the BI system.
(*) Fourth spray: BI Advanced Application Mode-data visualization and data mining
(6) Overview of BI application Patterns-Data visualization (visualization)
Data visualization applications are committed to presenting information in as many forms as possible, with the aim of enabling decision makers to quickly gain knowledge in information, such as trends, distributions, and densities, through a graphical representation. It is worth mentioning that the MapInfo company as a representative of the GIS software vendors, is also currently working to integrate BI applications. MapInfo pioneered the location Intelligence concept, relying on geographic information systems, to show the attribute values of various regions, such as population density, industrial output, number of hospital per capita, and so on, this visual application part and BI data visualization application coincident, and form a strong complement, Sometimes you can match each other in a project. As shown in the Cognos Visualizer product, this guy presents data and information in a rich form of near-grandstanding, including maps, pie charts, waterfall charts, and nearly 50 kinds of presentation graphics, and offers two-and three-dimensional representations. All graphical elements are active, for example, users can tap into a province on the map to drill down to information about each city in the province, which is a significant difference between BI and normal image generation software.
(7) BIApplication Mode Overview-Data mining (Mining Data Mining is the most advanced BI application because it can replace part of the brain function. data Mining belongs to the special case of Knowledge discovery (Knowledge Discovery) in structured data. The purpose of data mining is to analyze large amount of data by computer, find out the law and knowledge hidden in data, and show it to users in an understandable way. The three main elements of data mining are: >> technologies and algorithms: The data mining techniques commonly used today include -- Automatic category Detection (auto Cluster Detection) Decision Tree (decision Trees) Neural Network (neural Networks) >> Data: Because data mining is a process of digging unknown in known, therefore requires a lot of data accumulation as a data source, data accumulation The greater the amount of , the data mining tool will have more reference points. >> Predictive Models: This is the business logic that will require data mining by computer simulation, which is also the main task of data mining. Compared with information bi applications, the knowledge-based BI application represented by data mining is not mature yet, but from another point of view, the development of data mining space is still very large, is the key direction of BI development in the future, SAS,SPSS and other knowledge class bi The application company image is growing tall, quietly occupy the new profit growth point.
(8) BI Base-Data Warehouse before starting this topic, let's take a look at the official definition of the Data Warehouse: Data Warehouse is a topic-oriented ( Subject oriented), integrated (integrate), relatively stable (non-volatile), data collection reflecting historical change (time Variant) to support management decisions. The above is the official definition of the Data warehouse. "Operational database" such as the Bank accounting system database, every business operation (such as you save 5 yuan), will be immediately recorded in this database, over time, full belly accumulation is fragmented data, this dry dirty live dirty not cannot database called "Operational database", oriented to business operations. "Data Warehouse" for decision support, analytical data processing, different from the operational database; In addition, the data warehouse is an effective integration of multiple heterogeneous data sources, the integration followed by the subject of reorganization, and contains historical data, and the data stored in the Data warehouse is generally no longer modified. Operation database, data Warehouse and database relationship, like C:, D: And the relationship between the hard disk, the database is the hard disk, the operation of the database is C:, the Data Warehouse is D:, the operational database and data warehouse are stored in the database, but the table structure of the design pattern and use of different.
So why add such a layer of "data warehousing" between the operational database and the BI?
First, because the operational database is busy day and night, in order to quickly respond to the business as the main goal, there is no energy to serve the data needs of the BI side, and bi side of the data demand is usually a summary type, a select sum (XX) group by XX can make the operation of the database resource-intensive, Business processing can not keep up, trouble is big, such as you saved 5000 yuan, found that 10 minutes after the money has not arrived, how to feel? Must be the bank's leader looking at the pie chart?
Second, because there are a number of applications in the enterprise, corresponding to a number of operational databases, such as human Resources Library, financial library, sales document library, inventory and so on, BI in order to provide a panoramic view of the data, it is necessary to integrate these scattered data, for example, in order to achieve a converged sales and inventory information OLAP analysis, BI tools must be able to efficiently access data from two of databases, and the most efficient way to do this is to consolidate the data into the Data warehouse first, and the BI application to take the data from the warehouse in a unified manner.
Integrating data from a decentralized, operational database into a data warehouse is a brainiac that has spawned the market for data integration software. Instead of simply overlaying the tables together, this integration must extract the dimensions of each operational database, set the common dimensions to a common dimension, and then unify the database tables that contain the specific metrics into several Zhang (the term "fact table", Truth Tables) as a subject. The Data Warehouse table structure is established by the dimension-measure model, and then the data extraction is converted. The subsequent extraction is generally when the operational database load is relatively small (such as the wee hours), the new data are incrementally extracted, so that the data in the Data Warehouse will form accumulation.
Most BI applications do not require real-time data, such as decision-makers, only need to see Weekly weekly week, 95% of BI applications do not require real-time, allowing data to be 1 hours to 1 months lag, which is the application characteristics of decision support system, This lag interval is the time when the data extraction tool is working. Of course, the BI application will often also contain a very small demand for real-time data, only for these special needs, the BI querying software directly connected to the business database, but must limit the load, prohibit the complex query.
The current database products are specifically optimized for the data warehouse, for example, when installing a high version of MySQL, the installation sequence will ask whether you want the database instance as transaction-oriented, or decision support, the former is the operational database, The latter is the Data Warehouse ( decision Support , and then effigy again), for both forms, the database will provide targeted optimization.
(9) bi lace Bi's knowledge about this, write some lace as a conclusion. Bi key: BI cannot process unstructured data and can only handle digital information, but in the enterprise there are a lot of unstructured data like text, streaming media, pictures, etc., which also contain a lot of value, but faced with these data, the current BI tools are powerless. The more plausible is IBM Intelligent Miner for Text, but it seems to be very weak in dealing with Chinese. BI Vendors and Products:
First, let's meet the foreign big shots! Data warehousing, there are IBM db2,oracle,sybase iq,ncr Teradata, and so on, the BI application, there are cognos,business objects,microstrategy,hyperion,ibm and so on Data mining, there are ibm,sas,spss and so on. Big Mac Microsoft has also inserted a leg in the BI field, launching the SQL Server Analysis Server, Reporting Services, and other bi-related products to preempt the hill!
We tend to only look at the foreign bi big boys and ignore the gradual rise of the bi New army, now domestic more famous bi has the Power-bi, Gyeongsangnam Bluequery and run dry report, especially worth mentioning is the POWER-BI is a standardized bi, In the domestic already has a certain market share. China's BI market development:
Time period |
Domestic BI Application Situation |
2002 ago |
A large number of BI software is considered to be able to extract data from multiple data sources of the report work, eyeful full of reports. At first, the company's sales in the promotion of products are introduced to the user: "We are the strongest in the field of BI ..." effect is not good; later those sales finally found the trick, come up and say: "We can do any report!" Then the order continued. |
2002-2003 |
The value of OLAP has finally been found by some eye, some competitive pressure enterprises in order to improve competitiveness, the urgent need to dig the value from the historical data, quickly discovered the advantages of OLAP, then sales finally need not say "we can do any report". But State organs, monopoly enterprises, is still a statement, and think BI is a report. |
2004 |
With more and more successful BI project implementation, OLAP finally can see the daylight, then the domestic formation of data query + report display +OLAP analysis of reasonable BI application structure. Some data visualization requirements are often raised by the user, in some highly competitive, large data volume of enterprises, has emerged data mining applications. |
2005 |
Information provision has been unable to meet the requirements of many enterprises, especially banks, communications, securities and other competitive, risk-intensive industries, a large number of emerging data mining needs, BI applications finally formed information + knowledge of the whole. |
The challenges that BI tools face in China:
* Complex Sample: China is the world's most complex reporting country. China's sample design is different from the West, Western reports tend to use only one report to explain a problem, and China's report tends to focus on a single report as many problems as possible, which leads directly to the complex format of Chinese reports and strange style.
* Big Data Volume: China is the most populous country in the world. China Mobile, for example, only one province of China, the number of users, the equivalent of a European population of a medium-sized country, is a real huge amount of data! Foreign database, Data Warehouse and BI application software have been tested by large data capacity in China. For the United States, it may be possible for a customer analysis application to produce results in two seconds, but it is not a two-second problem in the amount of data in China.
* Data Write-back: China is the most exotic country in the world for BI systems. Originally BI system is to faithfully reproduce the source data as the principle, but this principle in China encountered a problem, many leaders have put forward the need for data modification, "the figures in the report is not good-looking, you will be able to change ah, and sometimes need to adjust Ah, so that senior leaders look at the good!" "said one leader. BI products that currently meet this requirement are only available in Microsoft and MicroStrategy two. Microsoft is thoroughly understand to the Chinese market.
Getting Started with BI Classic (reprint)