What Are BI and OLAP?

Source: Internet
Author: User
   
(1) What do you want to do with so much data, boss?  
Suppose you are the boss of a retail company.

Your company is quite advanced and has fully computerized its business. Every sales document is stored in a database. Over time you have accumulated more than 10 years of sales data: hundreds of millions of sales records.

Now suppose I say to you: "The data from more than three years ago just takes up space and costs money to store. Why not simply delete it? Then there will be room for new data without buying more hard disks."

Would you accept this suggestion without hesitation?

So what do you actually need all this data for, boss?

Yes, like me, you have a vague sense that the data has value. That is why we do not simply throw away historical data. Every modern enterprise, and even old-fashioned shops that keep every paper receipt, faithfully preserves old data, because intuition tells us that the data is useful!

But this is only intuition. How can we unlock the value of data that occupies so much storage space, and turn it from a cost into a source of profit?

Something seems to be missing in between.

 
(2) Business Intelligence: connecting data and decision makers
Business Intelligence (BI) is a new technology that uses data warehousing, online analytical processing, data mining, and other technologies to process and analyze data. Its goal is to provide enterprise decision makers with decision support.

Let's shout it three times: decision support, decision support, decision support!

BI is a factory:

> The raw material of BI is massive amounts of data;

> BI's products are the information and knowledge produced by processing that data;

> BI delivers these products to enterprise decision makers;

> Decision makers use the BI factory's products to make the right decisions and drive the development of the enterprise.

This is business intelligence: connecting data with decision makers and turning data into value.

BI applications fall into two classes, information applications and knowledge applications, with the following features:

Information BI applications: data query, reports and charts, multidimensional analysis, and data visualization. What these applications have in common is that they convert data into information that decision makers can absorb and present it to them. For example, bank transaction data is processed into a bank financial statement.
They only supply information and do not actively analyze the data. For example, a bank's financial statement tool cannot analyze in depth how customer churn relates to interest rates; it relies on decision makers to combine the information and derive knowledge through their own thinking.
Knowledge BI applications: use data mining techniques and tools to discover hidden relationships in the data, letting computers turn data directly into knowledge and present it to decision makers. They actively explore associations in the data, uncover hidden knowledge that the human brain could not quickly discover, and present it to decision makers in an understandable form.

 
(3) BI basic application modes: Data Query (Querying). Category: Information Application; Stars: *____

Data query is the simplest BI application. It is inherited from MIS (management information systems), and although it is old-fashioned, it is still the most direct way for decision makers to obtain information.

Today's data query interfaces have moved far beyond the traditional SQL command line. Drop-down menus, input boxes, list boxes, and even drag-and-drop interfaces wrap the SQL statements running in the background into a slick data access system. In essence, however, every data query still comes down to the same few elements:

> What to query

> Where to query

> Filter conditions

> How to display the results
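A minimal sketch of how a query front end might assemble SQL from these four elements, using Python's built-in sqlite3 module. The table `sales`, its columns, and the helper `build_query` are hypothetical. Table and column names are assumed to come from a fixed menu (they are interpolated into the SQL text), while filter values are passed as bound parameters; row ordering stands in for the "how to display" element.

```python
import sqlite3


def build_query(columns, table, filters, order_by=None):
    """Assemble a SELECT statement from the four elements of a data query.

    columns  -> what to query
    table    -> where to query
    filters  -> filter conditions as {column: value}, combined with AND
    order_by -> a simple stand-in for how the results are displayed
    Identifiers are assumed to come from a trusted menu; values are bound as
    parameters so user input never becomes part of the SQL text.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        clauses = []
        for column, value in filters.items():
            clauses.append(f"{column} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    if order_by:
        sql += f" ORDER BY {order_by}"
    return sql, params


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, location TEXT, product TEXT, "
    "quantity INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("2004-11-01", "Beijing", "Soap", 10, 342.00),
        ("2004-11-06", "Guangzhou", "Orange", 30, 123.00),
        ("2004-12-03", "Beijing", "Bananas", 20, 12.00),
    ],
)

# The choices a user would make in drop-down menus.
sql, params = build_query(
    ["sale_date", "product", "amount"], "sales", {"location": "Beijing"}, order_by="sale_date"
)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

However elaborate the interface, it only decides which identifiers and values end up in a statement like this one.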

Today, the popular data query applications from foreign vendors have fully unlocked the flexibility of data query. For example, Query Studio in Cognos ReportNet, shown in the figure at right, lets users work in a pure browser interface, define query elements by dragging and dropping with the mouse, and display the data in multiple forms such as reports and charts.

 
(4) BI basic application modes: Reports. Category: Information Application; Stars: **___
Reports are among the most popular BI applications in China, which is closely tied to the historical importance of reports in Chinese enterprises and public institutions. Chinese reports are known for their distinctive formats, dense data, and odd rules, and they once gave countless foreign reporting tools and BI tools a headache.

The two main elements of a report are data and format. Without format, a report application is almost the same as a data query application: a report displays queried data in a specified format.

A report application consists of two modules: report presentation and report production. Report presentation lets decision makers view reports and select the report data through conditions such as year, department, and organization. Report production is for report developers; the flexibility of format definition, the flexibility of data mapping, and the richness of calculation methods all affect the quality of a BI report application.
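A minimal sketch of the split between data and format, in plain Python with hypothetical names (`SALES`, `fetch_report_data`, `render_report`) and made-up rows. The data half selects rows by the conditions a decision maker picks (year and department here); the format half is a fixed layout that a report developer would define once.

```python
# Hypothetical source rows: (year, department, product, amount).
SALES = [
    (2004, "Retail", "Orange", 123.00),
    (2004, "Retail", "Bananas", 12.00),
    (2004, "Wholesale", "Soap", 342.00),
    (2005, "Retail", "Orange", 189.00),
]


def fetch_report_data(year, department):
    """Data half of the report: rows matching the conditions the viewer picked."""
    return [row for row in SALES if row[0] == year and row[1] == department]


def render_report(rows, title):
    """Format half of the report: a fixed layout with a header and a total line."""
    lines = [title, f"{'Product':<12}{'Amount':>10}"]
    total = 0.0
    for _year, _dept, product, amount in rows:
        lines.append(f"{product:<12}{amount:>10.2f}")
        total += amount
    lines.append(f"{'Total':<12}{total:>10.2f}")
    return "\n".join(lines)


# Report presentation: the decision maker only chooses the conditions.
report = render_report(fetch_report_data(2004, "Retail"), "Sales report 2004: Retail")
print(report)
```

Dropping `render_report` and printing the raw rows would reduce this to a plain data query, which is the point made above.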

To be clear, Microsoft Excel is not a BI report tool, because Excel cannot connect to the data source; at best it is a spreadsheet. However, Excel's powerful formatting features win over report makers. Later, almost all BI vendors provided plug-ins for Microsoft Excel that connect it to BI data sources and turn it into a BI report tool: the ugly duckling becomes a swan.


The figure shows a typical report with Chinese characteristics. Note that its horizontal header nests columns at different depths; the industry calls this an "unbalanced report". The data in its cells is computed in different ways from different fields of different database tables. Reports like this killed off the first wave of BI software that rushed into the Chinese market, and eventually forced vendors to devise many pre-sales workarounds, such as drawing the header as a bitmap, computing the table contents in database stored procedures, and calling Excel through COM from a program to stitch the nested pieces together.
 
(5) BI advanced application modes: Online Analytical Processing (OLAP). Category: Information Application; Stars: *****
OLAP (Online Analytical Processing) brings BI a brand new way of looking at data and is one of BI's core technologies.

As we know, data is stored in databases as tables. For example, a store's sales data might be stored in a table like this:

Sales time   Sales location   Product      Sales quantity   Sales amount
2004-11-1    Beijing          Soap         10               342.00
2004-11-6    Guangzhou        Orange       30               123.00
2004-12-3    Beijing          Bananas      20               12.00
2004-12-13   Shanghai         Orange       50               189.00
2005-1-8     Beijing          Soap         10               342.00
2005-1-23    Shanghai         Toothbrush   30               150.00
             Guangzhou        Toothbrush   20               100.00

Decision makers often want macro-level information such as distributions, proportions, and trends, for example:

> How do sales in Beijing change over time?

> Which product's sales grew the most in 2005 compared with 2004?

> How were product sales distributed in 2004? ...

To answer such questions, many sum operations must be performed with SQL statements, and every new question requires another SQL sum. With the seven records above we can produce results easily, but with millions or even hundreds of millions of records, such as a mobile operator's call records, each SQL sum takes a long time to compute. Decision makers often raise an analysis request on one day and must wait until the next day for the results. This kind of analysis is "offline analysis", and it is inefficient.
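To make the cost concrete, here is a minimal sketch of two of the questions above written as SQL sums over the sample table, using Python's built-in sqlite3 module. The table and column names are hypothetical; dates are zero-padded so string prefixes group correctly, and the record with no date is stored as NULL. Each question is its own `SUM ... GROUP BY` query that scans the matching records again, which is exactly what becomes slow at hundreds of millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, location TEXT, product TEXT, "
    "quantity INTEGER, amount REAL)"
)
# The seven records from the table above (the undated record has a NULL date).
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("2004-11-01", "Beijing", "Soap", 10, 342.00),
        ("2004-11-06", "Guangzhou", "Orange", 30, 123.00),
        ("2004-12-03", "Beijing", "Bananas", 20, 12.00),
        ("2004-12-13", "Shanghai", "Orange", 50, 189.00),
        ("2005-01-08", "Beijing", "Soap", 10, 342.00),
        ("2005-01-23", "Shanghai", "Toothbrush", 30, 150.00),
        (None, "Guangzhou", "Toothbrush", 20, 100.00),
    ],
)

# Question: how were product sales distributed in 2004? (one full aggregation)
by_product_2004 = conn.execute(
    "SELECT product, SUM(amount) FROM sales "
    "WHERE sale_date LIKE '2004-%' GROUP BY product ORDER BY product"
).fetchall()

# Question: how do Beijing's sales change month by month? (another full aggregation)
beijing_by_month = conn.execute(
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) FROM sales "
    "WHERE location = 'Beijing' GROUP BY month ORDER BY month"
).fetchall()

print(by_product_2004)
print(beijing_by_month)
```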

To make analysis efficient, OLAP abandons the record-by-record way of browsing data and separates the data into dimensions and measures:

> A dimension is an angle from which the data is observed, such as "sales time", "sales location", and "product" in the example above;

> A measure is a specific numeric value, such as "sales quantity" and "sales amount" in the example above.

In this way, we can convert the flat data table above into a cube with three dimensions:

   

Exploring the data then means locating a point in the cube and observing the measure values at that point:

 

 

Of course, a data cube is not limited to three dimensions. We use three here only because a picture cannot show more than three.
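A minimal sketch of the conversion just described, in plain Python with hypothetical names (`build_cube`, `slice_cube`, `DIMENSIONS`). Each cell is keyed by its coordinates along the dimensions and holds the summed measures; nothing in the code assumes exactly three dimensions. Real OLAP servers use far more compact storage, so this only illustrates the dimension/measure model, not how any product stores it. The undated record is left out because it has no position on the time dimension.

```python
# Dimension names for the example; any number of dimensions works the same way.
DIMENSIONS = ("time", "location", "product")

# (time, location, product, quantity, amount) from the dated rows of the table above.
RECORDS = [
    ("2004-11-01", "Beijing", "Soap", 10, 342.00),
    ("2004-11-06", "Guangzhou", "Orange", 30, 123.00),
    ("2004-12-03", "Beijing", "Bananas", 20, 12.00),
    ("2004-12-13", "Shanghai", "Orange", 50, 189.00),
    ("2005-01-08", "Beijing", "Soap", 10, 342.00),
    ("2005-01-23", "Shanghai", "Toothbrush", 30, 150.00),
]


def build_cube(rows, n_dims):
    """Turn flat rows into cube cells.

    The first n_dims fields of each row are its coordinates; the remaining
    fields are additive measures, summed when several rows share a cell.
    """
    cube = {}
    for row in rows:
        coord, measures = tuple(row[:n_dims]), tuple(row[n_dims:])
        if coord in cube:
            cube[coord] = tuple(a + b for a, b in zip(cube[coord], measures))
        else:
            cube[coord] = measures
    return cube


def slice_cube(cube, dims, **fixed):
    """Keep the cells whose coordinates match every fixed dimension value."""
    position = {name: i for i, name in enumerate(dims)}
    return {
        coord: measures
        for coord, measures in cube.items()
        if all(coord[position[name]] == value for name, value in fixed.items())
    }


cube = build_cube(RECORDS, len(DIMENSIONS))
# Locating one point in the cube and reading its measures (quantity, amount).
print(cube[("2004-12-13", "Shanghai", "Orange")])
# Fixing one dimension gives a slice: all Beijing cells.
print(slice_cube(cube, DIMENSIONS, location="Beijing"))
```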

Dimensions can be organized into levels. For example, time can be rolled up from day to month to year, products can be grouped into food and daily necessities, and locations can be grouped into North China and South China. You can drill down or roll up at any level of a dimension:

 

In this way, we are freed from the speed limits of SQL sums: we can quickly locate detail data that meets different conditions and quickly obtain summary data at any level. OLAP gives decision makers a multi-angle, multi-level, efficient way to explore data. Their thinking is no longer confined to fixed drop-down menus and query conditions; instead, the decision maker's own line of thought drives the data retrieval, combining analysis perspectives and analysis targets at will. This interactive, highly efficient analysis breaks with the traditional approach and makes OLAP the core application of a BI system.

The figure shows the OLAP analysis interface of Cognos PowerPlay. You simply drag the dimensions and measures you are interested in to the corresponding positions to get charts and reports:
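One way to picture how summary data at any level becomes quick to obtain: pay the summation cost once, in advance, for each combination of levels, so that later questions are lookups. The sketch below assumes a hypothetical time hierarchy (day, month, year on ISO date strings) and product hierarchy (product, category); the names `CATEGORY`, `TIME_LEVELS`, `roll_up`, and `PRECOMPUTED` are illustrative, and real OLAP engines choose which aggregates to store in their own, more sophisticated ways.

```python
from collections import defaultdict

# Hypothetical hierarchy for the product dimension: product -> category.
CATEGORY = {
    "Soap": "Daily necessities",
    "Toothbrush": "Daily necessities",
    "Orange": "Food",
    "Bananas": "Food",
}

# Levels of the time dimension: day -> month -> year, on "YYYY-MM-DD" strings.
TIME_LEVELS = {
    "day": lambda day: day,
    "month": lambda day: day[:7],
    "year": lambda day: day[:4],
}

# (day, product, amount) from the dated rows of the sales table.
SALES = [
    ("2004-11-01", "Soap", 342.00),
    ("2004-11-06", "Orange", 123.00),
    ("2004-12-03", "Bananas", 12.00),
    ("2004-12-13", "Orange", 189.00),
    ("2005-01-08", "Soap", 342.00),
    ("2005-01-23", "Toothbrush", 150.00),
]


def roll_up(rows, time_level, by_category):
    """Sum the amount measure at the chosen level of each dimension."""
    totals = defaultdict(float)
    for day, product, amount in rows:
        product_member = CATEGORY[product] if by_category else product
        totals[(TIME_LEVELS[time_level](day), product_member)] += amount
    return dict(totals)


# Pay the summation cost once, up front, for every combination of levels...
PRECOMPUTED = {
    (level, by_category): roll_up(SALES, level, by_category)
    for level in TIME_LEVELS
    for by_category in (False, True)
}

# ...so that answering a question is a lookup rather than a new scan.
print(PRECOMPUTED[("year", True)])                        # rolled up: year x category
print(PRECOMPUTED[("month", True)][("2004-12", "Food")])  # drilled down to month
```

Drilling down is simply switching to a finer key, for example from `("year", True)` to `("month", True)`.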

   

 
(*) Part 4: BI advanced application modes: data visualization and data mining
(6) BI application modes: Data Visualization. Category: Information Application; Stars: ****_

A picture is worth a thousand words.

Data visualization applications aim to present information in as many forms as possible, so that decision makers can quickly grasp the knowledge contained in the information, such as trends, distributions, and density, through graphics and other visual representations.

It is worth mentioning that GIS software vendors, represented by MapInfo, are also working to integrate BI applications. MapInfo first proposed the concept of location intelligence, which uses a geographic information system to display attribute values of different regions, such as population density, industrial output, and the number of hospitals per capita. This kind of visualization partly overlaps with BI data visualization and complements it powerfully; sometimes the two are used together in a project.

The figure above shows the Cognos Visualizer product. It presents data and information in a wide array of forms, including maps, pie charts, and waterfall charts, in both two- and three-dimensional display modes. All graphic elements are interactive: for example, you can click a province on the map to drill down into each city in that province. This interactivity is a significant difference between BI and ordinary chart-generation software.

 
(7) BI application modes: Data Mining. Category: Knowledge Application; Stars: *****
Data mining is the most advanced BI application because it can take over some of the functions of the human brain.

Data Mining is a special case of knowledge discovery in structured data.

The purpose of data mining is to use computers to analyze large amounts of data, find the patterns and knowledge hidden in it, and present them to users in an understandable way.

The three main elements of data mining are:

> Technologies and algorithms: common data mining techniques currently include:
automatic cluster detection (Auto Cluster Detection)
decision trees (Decision Trees)
neural networks (Neural Networks)

> Data: because data mining is a process of discovering the unknown from the known,
a large amount of accumulated data is needed as the data source.
The larger the volume, the more reference points the data mining tool has.

> Prediction model: the business logic that data mining targets, simulated by the computer;
building it is the main task of data mining.
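Automatic cluster detection is the first technique listed. Below is a minimal sketch of one widely used clustering algorithm, k-means, chosen here only for illustration (the text does not say which algorithm any product uses). It groups hypothetical customers described by (visits per month, average spend per visit). The initial centroids are passed in explicitly so the result is reproducible; practical tools pick them automatically and usually rescale the features first.

```python
import math


def kmeans(points, initial_centroids, max_iterations=100):
    """Plain k-means clustering.

    Repeats two steps until the centroids stop moving: assign every point to
    its nearest centroid, then move each centroid to the mean of its points.
    """
    centroids = [tuple(c) for c in initial_centroids]

    def nearest(point):
        # Index of the closest centroid by Euclidean distance.
        return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point)].append(point)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(values) / len(members) for values in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, [nearest(point) for point in points]


# Hypothetical customers: (visits per month, average spend per visit).
customers = [(2, 20.0), (3, 25.0), (2, 22.0), (12, 30.0), (15, 28.0), (14, 35.0)]
centroids, labels = kmeans(customers, [customers[0], customers[3]])
for group in range(len(centroids)):
    members = [c for c, label in zip(customers, labels) if label == group]
    print(f"group {group}: centre {centroids[group]}, members {members}")
```

The program finds the groups; interpreting them (here, occasional versus frequent shoppers) is still the decision maker's job.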

Compared with information BI applications, knowledge BI applications represented by data mining are not yet mature. Seen from another angle, though, data mining still has much room to grow, and it is the key direction for future BI development. Knowledge BI vendors such as SAS and SPSS are steadily gaining visibility and quietly capturing new sources of profit growth.


Here the well-known IBM Intelligent Miner is analyzing customers' purchasing behavior. It can analyze large amounts of customer data, automatically divide customers into several groups (automatic cluster detection), and display the purchasing characteristics of each group. Decision makers can then clearly see the purchasing habits of different customers and develop promotion or advertising plans.

If these functions were implemented with information BI applications alone, decision makers would have to perform a great deal of OLAP analysis and data queries guided by experience, and might still fail to discover the hidden patterns in the data. For a bank with 4 million customers, for example, doing this without a data mining tool would wear people out.

 
(8) The foundation of BI: Data Warehouse. Category: Mother Earth; Stars: _____

Before getting into the topic, let's look at the official definition of a data warehouse:

A data warehouse is a subject-oriented, integrated, relatively stable (non-volatile), and time-variant collection of data used to support management decisions.

An "operational database" is like the database of a bank's bookkeeping system. Every business transaction (for example, you deposit 5 yuan) is recorded in this database immediately, so the accumulated data is fragmented. This hard-working database that does the dirty work is called an "operational database", and it is oriented to business operations.

A "data warehouse", unlike an operational database, is used for decision support and is oriented to analytical data processing. In addition, a data warehouse integrates multiple heterogeneous data sources. After integration, the data is reorganized by subject and includes historical data, and the data stored in a data warehouse is generally not modified.

The relationship between operational databases, data warehouses, and databases is like the relationship between drive C:, drive D:, and the hard disk. The database is the hard disk, the operational database is C:, and the data warehouse is D:. Both operational databases and data warehouses live in databases, but their table structures are designed and used differently.

So why add a "data warehouse" between operational databases and BI?

First, the operational database is busy day and night. Its goal is to respond quickly to business transactions, and it has no capacity left for BI's data needs. Moreover, BI usually needs aggregated data, and a single SELECT SUM(xx) ... GROUP BY xx can consume so many resources on an operational database that business processing cannot keep up, which causes serious trouble. Imagine you deposit 5000 yuan and ten minutes later the money still has not reached your account. What would you think? That the bank's executives are busy looking at pie charts?

Second, enterprises generally run multiple applications backed by multiple operational databases, such as human resources, finance, sales documents, and inventory. To provide a panoramic view of the data, BI must combine this scattered data. For example, to run OLAP analysis that combines sales and inventory information, BI tools must be able to obtain data efficiently from both databases. The most efficient approach is to integrate the data into a data warehouse first and have BI applications retrieve data from the warehouse in a unified way.
Integrating data from distributed operational databases into a data warehouse is a major discipline in its own right, and it gave rise to the market for data integration software. Integration does not simply stack tables together. You must extract the dimensions of each operational database and turn the common ones into shared dimensions; merge the tables that hold the specific measure values into a few large tables organized by subject (called "fact tables"); build the data warehouse table structure according to the dimension-measure model; and then extract and transform the data. Subsequent extractions are generally incremental, pulling only new data while the operational database load is relatively light (for example, early in the morning), so the data in the data warehouse accumulates over time.
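A minimal sketch of these extract-transform-load steps in plain Python. Rows from two hypothetical store systems that code the same products differently are mapped onto one shared product dimension, turned into fact-table rows (dimension keys plus measures), and extracted incrementally using a high-water-mark timestamp, so each nightly run picks up only records newer than the previous run. All names (`PRODUCT_DIM`, `load`, the row fields) are hypothetical; real data integration tools keep watermarks and dimension mappings in their own control tables.

```python
from datetime import datetime

# Shared (conformed) product dimension: each source system's own product code
# maps to one warehouse-wide surrogate key.
PRODUCT_DIM = {"SOAP-01": 1, "ORG-7": 2, "soap": 1, "orange": 2}


def extract_incremental(source_rows, last_loaded):
    """Extract only rows newer than the previous run's high-water mark."""
    return [row for row in source_rows if row["updated_at"] > last_loaded]


def transform(row, source_name):
    """Map one source row onto fact-table dimension keys and measures."""
    return {
        "date_key": row["updated_at"].strftime("%Y%m%d"),
        "product_key": PRODUCT_DIM[row["product_code"]],
        "source": source_name,
        "quantity": row["qty"],
        "amount": row["amount"],
    }


def load(fact_table, source_rows, source_name, last_loaded):
    """Append new facts and return the new high-water mark for this source."""
    new_rows = extract_incremental(source_rows, last_loaded)
    fact_table.extend(transform(row, source_name) for row in new_rows)
    return max((row["updated_at"] for row in new_rows), default=last_loaded)


store_a = [
    {"updated_at": datetime(2005, 1, 8, 10, 0), "product_code": "SOAP-01", "qty": 10, "amount": 342.00},
    {"updated_at": datetime(2005, 1, 9, 9, 30), "product_code": "ORG-7", "qty": 30, "amount": 123.00},
]
store_b = [
    {"updated_at": datetime(2005, 1, 9, 11, 0), "product_code": "soap", "qty": 5, "amount": 171.00},
]

fact_sales = []
# Store A was last loaded on the night of Jan 8, so only its Jan 9 row is new.
mark_a = load(fact_sales, store_a, "store_a", datetime(2005, 1, 8, 23, 59))
mark_b = load(fact_sales, store_b, "store_b", datetime(2005, 1, 1))
print(fact_sales)
print(mark_a, mark_b)
```

Running `load` again with the returned watermark adds nothing, which is what lets the nightly job run repeatedly without duplicating facts.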
Most BI applications do not need real-time data. For example, decision makers only need to see last week's weekly report every Monday. 95% of BI applications do not need to be real-time; the data may lag by anywhere from one hour to one month. This is a characteristic of decision support systems, and the lag interval is when the data extraction tools do their work. Of course, a small number of BI applications do need real-time data. In that case, you can connect the BI query software directly to the business database to meet these special requirements, but you must restrict the load and prohibit complex queries.

Today, database products provide special optimizations for data warehouses. For example, when installing a recent version of MySQL, you are asked whether the database instance should be transaction-oriented or decision-support-oriented. The former is an operational database and the latter is a data warehouse (decision support again), and the database applies targeted optimizations for each.

 
(9) BI tidbits

That covers the basics of BI. To conclude, here are a few tidbits.

BI's limitation: BI cannot process unstructured data; it can only handle numeric information. Yet enterprises hold large amounts of unstructured data such as text, streaming media, and images, which also has great value, and current BI tools are powerless in the face of it. IBM Intelligent Miner for Text is the more capable option, but it seems weak at processing Chinese.

BI vendors and products: let's get to know the big players! Data warehouses: IBM DB2, Oracle, Sybase IQ, NCR Teradata, and others. BI applications: Cognos, Business Objects, MicroStrategy, Hyperion, IBM, and others. Data mining: IBM, SAS, SPSS, and others. Microsoft has also entered the BI field with SQL Server Analysis Services, Reporting Services, and other BI products, aiming to seize the high ground.

China's bi market development:

 
Time period   BI applications in China
Before 2002   Much BI software was regarded as a reporting tool that could extract data from multiple sources; there were reports everywhere. At first, salespeople introduced the product by saying: "We are the strongest in the BI field..." This did not work well. Later, the salespeople finally found the trick and said: "We can do any report!" And the orders kept coming.
2002-2003     The value of OLAP was finally noticed by some discerning eyes. Companies in competitive markets urgently needed to mine value from historical data to improve their competitiveness, and they quickly discovered the advantages of OLAP. By then, salespeople no longer had to say "We can do any report." Government agencies and state-owned enterprises, however, still focused on reports and still thought BI was just reports.
2004          As more and more BI projects succeeded, a sensible BI application structure of data query, report presentation, and OLAP analysis finally took shape in China. Users also frequently raised data visualization requirements, and data mining applications appeared in some highly competitive enterprises with large data volumes.
2005          Providing information alone could no longer meet the needs of many enterprises, especially in highly competitive, high-risk industries such as banking, telecommunications, and securities. Many data mining requirements emerged, and BI applications finally took the complete form of information plus knowledge.

Difficulties encountered by BI tools in China:

* Complex reports: China has the most complex reports in the world. The design philosophy behind Chinese reports differs from that in the West: a Western report tends to address only one question, while a Chinese report tends to pack as many questions as possible into a single report, which directly leads to the complicated formats and odd styles of Chinese reports.

 
  Western report (built with the Eclipse BIRT project)
 
  Chinese report

* Large data volume: China is the most populous country in the world. Taking China Mobile as an example, the number of subscribers in a single Chinese province equals the population of a medium-sized European country: truly massive data! Foreign databases, data warehouses, and BI application software have all been put to the test by China's data volumes. An analysis that finishes in two seconds for a U.S. customer is simply out of the question within two seconds at China's data volumes.

* Data write-back: China has the world's most unusual BI requirement. A BI system is supposed to reproduce the source data faithfully, but this principle ran into trouble in China, where many leaders asked to be able to modify the data. "If the numbers in the report don't look good, we must be able to change them. Sometimes they need adjusting before the higher-ups look at them!" said one leader. Currently, only Microsoft's and MicroStrategy's BI products meet this requirement. Microsoft clearly knows the Chinese market well.

In the spirit of Lei Feng, an open-source BI platform: with the contribution and support of Actuate, a well-known BI vendor, Eclipse launched its BIRT project around October 2004 and entered the BI field. Actuate is less well known than other BI vendors, yet Cognos, MicroStrategy, and other vendors on the market actually use kernels bought from Actuate in key modules.

At present, the BIRT project has released version 1.0, which provides some BI reporting and charting modules. It is still far from a commercial BI platform, but BIRT has been under development for less than six months, so this is good progress. We hope BIRT will soon add OLAP, data visualization, data mining, and other modules, to make the world a better place.

 
