A classic introduction to BI

Source: Internet
Author: User
Tags format definition ibm db2

(1) What do you want to do with so much data, boss?
Suppose you are the boss of a retail company.
Your company is very advanced and has fully computerized its business. Every sales document is stored in a database. Over time it has accumulated more than 10 years of sales data and hundreds of millions of sales records.
Now suppose I say to you: "The data from more than three years ago just sits there taking up space and costing money to store. Why not simply delete it, so that new data can be stored without buying more hard disks?"
Would you accept this suggestion without hesitation?
So what do you want to do with so much data, boss?
Yes, just like me, you have vaguely sensed the value of data. This is why we cannot bring ourselves to throw historical data away. Any modern enterprise, and even any traditional, old-fashioned shop, faithfully keeps its old records, because intuition tells us that the data is useful!
But this is only intuition. How can we mine the value of data that occupies so much storage space, and turn it from a cost consumer into a profit driver?
Some link seems to be missing in the middle.
(2) Business Intelligence - connecting data and decision makers
Business Intelligence (BI) is a new technology that uses data warehousing, online analytical processing, data mining, and related techniques to process and analyze data. Its aim is to provide decision support to enterprise decision makers.
Let's shout it three times: decision support, decision support, decision support!
BI is a factory:
> The raw material of BI is massive data;
> The products of BI are the information and knowledge obtained by processing that data;
> BI delivers these products to enterprise decision makers;
> Decision makers use the products of the BI factory to make correct decisions and promote the development of the enterprise.
This is business intelligence: it connects data with decision makers and turns data into value.
BI applications are classified into information applications and knowledge applications. Their features are as follows:

Information BI applications

These are applications such as data query, reports and charts, multidimensional analysis, and data visualization, all built by processing raw data. What they have in common is that they convert data into information that decision makers can take in, and present it to them.
For example, bank transaction data is processed into a bank financial statement.

They are only responsible for providing information and do not analyze data on their own initiative.
For example, a bank's financial reporting tool cannot analyze in depth the relationship between customer churn and the bank's interest rates. It can only rely on decision makers to combine the information and arrive at knowledge through human thinking.

Knowledge BI applications

These are applications that discover hidden relationships in data through data mining technologies and tools, using the computer to process data directly into knowledge and present it to decision makers.

They actively explore the associations within the data, uncover hidden knowledge that the decision maker's brain cannot quickly discover, and present it to the decision maker in an understandable form.

(3) BI basic application mode overview - Data Query (querying)
Data query is the simplest BI application. It is a legacy of MIS systems. Although old-fashioned, it is still the most direct way for decision makers to obtain information.
Today the data query interface has moved well beyond the traditional SQL command line. Drop-down menus, input boxes, list boxes, and even drag-and-drop interfaces wrap the SQL statements running in the background into a polished data acquisition system. In essence, though, it still comes down to the same few elements of a data query:
> What to query
> Where to query
> Filter conditions
> Display method
The data query applications currently popular abroad have fully unlocked the flexibility of data query. In the Query Studio of Cognos ReportNet, as shown in the figure on the right, users work in a pure browser interface, drag and drop with the mouse to define the data query elements, and display the data in multiple ways, such as reports and charts.
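
The four query elements listed above map directly onto the SQL that a query interface generates in the background. The following is a minimal Python sketch of that idea, not a description of how Cognos or any other product works. The `sales` table, its column names, and the `build_query` helper are hypothetical and exist only for illustration.

```python
import sqlite3


def build_query(columns, table, filters):
    """Assemble a parameterized SELECT from the query elements.

    columns: what to query; table: where to query; filters: filter
    conditions as a {column: value} dict. Identifiers are assumed to come
    from a fixed list of choices (e.g. a drop-down), never from free text.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
        params = list(filters.values())
    return sql, params


# A tiny in-memory table standing in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, location TEXT, product TEXT, "
    "quantity INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("2004-1-1", "Beijing", "Soap", 20, 52.00),
        ("2004-3-1", "Guangzhou", "Bananas", 35, 77.00),
    ],
)

sql, params = build_query(["product", "amount"], "sales", {"location": "Beijing"})
rows = conn.execute(sql, params).fetchall()

# Display method: here simply a plain-text listing.
for product, amount in rows:
    print(f"{product:<10}{amount:>8.2f}")
```

Filter values are passed as bound parameters rather than pasted into the SQL string, which is the usual way to keep user input from changing the query's structure.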

(4) BI basic application mode overview - Reports
Reports are one of the most popular BI applications in China, which has everything to do with the historical standing of reports in Chinese enterprises and institutions. Chinese reports are known for their peculiar formats, dense data, and odd rules, and they once stumped countless foreign reporting tools and BI tools.
The two main elements of a report are data and format. Without format, a report application is almost equivalent to a data query application: a report shows the queried data in a specified format.
A report application consists of two modules: report presentation and report production. Report presentation lets the decision maker view a report and select its data through conditions such as year, department, and organization. Report production is for report developers; the flexibility of its format definition and data handling and the richness of its calculation methods all affect the quality of a BI report application.
To be clear, Microsoft Excel is not in itself a BI report tool, because on its own Excel is not connected to the BI data source; at best it is a spreadsheet. However, Excel's powerful formatting won over report makers. Later, almost all BI vendors provided plug-ins for Microsoft Excel so that it can connect to the BI data source and become a BI report tool: the ugly duckling becomes a swan.


(5) BI advanced application mode overview - Online Analytical Processing (OLAP)

OLAP, that is, Online Analytical Processing, is a new way of looking at data brought about by BI and is one of the core technologies of BI.
We know that data is stored in a database as tables. For example, the sales data of a store is stored in a table like this:

Sales time   Sales location   Product   Sales quantity   Sales amount
2003-11-1    Beijing          Orange    10               342.00
2003-12-1    Guangzhou        Bananas   100              222.00
2004-1-1     Beijing          Soap      20               52.00
2004-3-1     Guangzhou        Bananas   35               77.00
2004-3-7     Beijing          Soap      20               8.00
2004-6-10    Guangzhou        Orange    10               16.00

Decision makers often want to know macro-level information such as distributions, proportions, and trends, for example:

> How does the sales volume in Beijing change over time?

> Which product had the largest increase in sales in 2005 compared with 2004?

> What was the proportional distribution of product sales in 2004? ......

Faced with such requirements, we have to run a large number of SUM operations in SQL. Every new question requires another SQL SUM. With the six records above we can easily produce results, but with millions or even hundreds of millions of records, such as a mobile carrier's call data, every SQL SUM takes a long time to compute. Decision makers often raise an analysis request one day and have to wait until the next day for the result. This kind of analysis is "offline analysis", and its efficiency is low.
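
To make the "one SQL SUM per question" pattern concrete, here is a small sketch that loads the sample sales records into an in-memory SQLite database and answers two summary questions. The table and column names are assumptions made for this illustration. Each question scans the whole table, which is exactly what becomes slow at hundreds of millions of rows.

```python
import sqlite3

# The sample records from the sales table above.
SALES = [
    ("2003-11-1", "Beijing", "Orange", 10, 342.00),
    ("2003-12-1", "Guangzhou", "Bananas", 100, 222.00),
    ("2004-1-1", "Beijing", "Soap", 20, 52.00),
    ("2004-3-1", "Guangzhou", "Bananas", 35, 77.00),
    ("2004-3-7", "Beijing", "Soap", 20, 8.00),
    ("2004-6-10", "Guangzhou", "Orange", 10, 16.00),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_date TEXT, location TEXT, product TEXT, "
    "quantity INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", SALES)

# Question 1: total sales amount per location.
by_location = conn.execute(
    "SELECT location, SUM(amount) FROM sales "
    "GROUP BY location ORDER BY location"
).fetchall()

# Question 2: total sales amount per year (the year is the first 4 characters).
by_year = conn.execute(
    "SELECT substr(sale_date, 1, 4) AS year, SUM(amount) FROM sales "
    "GROUP BY substr(sale_date, 1, 4) ORDER BY year"
).fetchall()

print(by_location)
print(by_year)
```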

To improve the efficiency of data analysis, OLAP breaks completely with the record-based way of browsing data and separates data into "dimensions" and "measures":

> Dimensions are the angles from which data is observed, such as "sales time", "sales location", and "product" in the example above;

> Measures are the specific quantities, such as "sales quantity" and "sales amount" in the example above.

In this way, we can convert the flat data list above into a cube with three dimensions.

The process of data exploration is to pick a point in the cube and then observe the measure values at that point.
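
As a rough illustration of "picking a point in the cube", the sketch below stores the sample sales records in a dictionary keyed by one value per dimension (time, location, product), with the measures as the cell contents. This is only a toy model with hypothetical names; real OLAP engines use far more elaborate storage.

```python
from collections import defaultdict

# The sample records: three dimension values followed by two measures.
SALES = [
    ("2003-11-1", "Beijing", "Orange", 10, 342.00),
    ("2003-12-1", "Guangzhou", "Bananas", 100, 222.00),
    ("2004-1-1", "Beijing", "Soap", 20, 52.00),
    ("2004-3-1", "Guangzhou", "Bananas", 35, 77.00),
    ("2004-3-7", "Beijing", "Soap", 20, 8.00),
    ("2004-6-10", "Guangzhou", "Orange", 10, 16.00),
]

# Each cell is addressed by (time, location, product) and holds the measures.
cube = defaultdict(lambda: {"quantity": 0, "amount": 0.0})
for sale_date, location, product, quantity, amount in SALES:
    cell = cube[(sale_date, location, product)]
    cell["quantity"] += quantity
    cell["amount"] += amount

# Pick one point in the cube and read its measure values.
point = cube[("2004-3-1", "Guangzhou", "Bananas")]
print(point)
```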

Of course, a data cube is not limited to three dimensions. We use three here only because three is the most that a figure can show.

Dimensions can be divided into levels. For example, time can be rolled up from day to month and year, products can be grouped into food and daily necessities, and locations can be grouped into North China and South China. You can drill down or roll up at any level of a dimension.
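
The level hierarchy can be sketched in a few lines of Python. The example below rolls the sample sales amounts up to year, region, and product category, then drills back down to month, city, and product. The `CATEGORY` and `REGION` mappings and the `roll_up` helper are assumptions made for this illustration (for instance, that Beijing belongs to North China and Guangzhou to South China).

```python
from collections import defaultdict

SALES = [
    ("2003-11-1", "Beijing", "Orange", 10, 342.00),
    ("2003-12-1", "Guangzhou", "Bananas", 100, 222.00),
    ("2004-1-1", "Beijing", "Soap", 20, 52.00),
    ("2004-3-1", "Guangzhou", "Bananas", 35, 77.00),
    ("2004-3-7", "Beijing", "Soap", 20, 8.00),
    ("2004-6-10", "Guangzhou", "Orange", 10, 16.00),
]

# Assumed hierarchies: product -> category, city -> region.
CATEGORY = {"Orange": "Food", "Bananas": "Food", "Soap": "Daily necessities"}
REGION = {"Beijing": "North China", "Guangzhou": "South China"}


def roll_up(records, time_level, use_region, use_category):
    """Sum the sales amount at the requested level of each dimension."""
    totals = defaultdict(float)
    for sale_date, location, product, _quantity, amount in records:
        year, month, _day = sale_date.split("-")
        time_key = year if time_level == "year" else f"{year}-{month}"
        loc_key = REGION[location] if use_region else location
        prod_key = CATEGORY[product] if use_category else product
        totals[(time_key, loc_key, prod_key)] += amount
    return dict(totals)


# Roll up: year x region x category.
yearly = roll_up(SALES, "year", use_region=True, use_category=True)

# Drill down: month x city x product.
monthly = roll_up(SALES, "month", use_region=False, use_category=False)

print(yearly)
print(monthly[("2004-3", "Beijing", "Soap")])
```

A real OLAP engine would typically precompute or cache such summaries instead of rescanning the detail records on every request.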

In this way, we escape the speed constraints of SQL SUM: we can quickly locate detail data that meets different conditions and quickly obtain summary data at any level. OLAP gives decision makers a multi-angle, multi-level, and efficient way to explore data. Their thinking is no longer confined to fixed drop-down menus and query conditions. Instead, the decision maker's thinking drives the data retrieval, with any combination of analysis angles and analysis targets. This interactivity and efficiency, which break with traditional analysis, make OLAP the core application of a BI system.

(*) Part four: BI advanced application modes - data visualization and data mining


(6) BI application mode overview - Data Visualization

Data visualization applications aim to present information in as many forms as possible, so that decision makers can quickly grasp the knowledge it contains, such as trends, distributions, and densities, through graphics and other visual representations. It is worth mentioning that GIS software vendors, represented by MapInfo, are also working hard to move into BI. MapInfo was the first to propose the concept of location intelligence, which relies on a geographic information system to display attribute values of regions, such as population density, industrial output, and hospitals per capita. This kind of visualization partially overlaps with BI data visualization and is a strong complement to it; sometimes the two are used together in one project.

The figure shows the Cognos Visualizer product. It presents data and information in a wide variety of forms, including maps, pie charts, and waterfall charts, and offers both two- and three-dimensional display modes. All graphic elements are interactive. For example, you can click a province on the map to drill down to each city in that province. This interactivity is a significant difference between BI and ordinary chart-generation software.

(7) BI application mode overview - Data Mining
Data mining is the most advanced BI application because it can take over part of the work of the human brain.
Data mining is the special case of knowledge discovery that works on structured data.
Its purpose is to use computers to analyze large amounts of data, find the patterns and knowledge hidden in the data, and present them to users in an understandable way.
The three main elements of data mining are:
> Technology and algorithms: commonly used data mining techniques currently include
    automatic cluster detection
    decision trees
    neural networks
> Data: data mining is a process of discovering the unknown from the known, so a large amount of data must be accumulated as the data source. The larger the volume, the more reference points the data mining tool has.
> Prediction model: a computer simulation of the business logic to be mined, which is also the main task of data mining.
Compared with information BI applications, the knowledge BI applications represented by data mining are not yet mature. Seen from another angle, this means data mining still has plenty of room to develop, and it is a key direction for BI in the future. Knowledge BI vendors such as SAS and SPSS are steadily raising their profile and quietly staking out new areas of profit growth.

Here the well-known IBM Intelligent Miner is analyzing customers' consumption behavior. It can analyze a large amount of customer data, automatically divide the customers into several groups (automatic cluster detection), and display the consumption characteristics of each group. Decision makers can then see clearly the consumption habits of different customers and develop promotion or advertising plans accordingly.
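
Automatic grouping of customers can be illustrated with k-means, one common clustering algorithm. The sketch below is a minimal illustration of the idea and makes no claim about the algorithm IBM Intelligent Miner actually uses. The customer figures (average purchase amount, visits per month) and the `kmeans` helper are hypothetical.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeated assign-and-update steps."""
    # Deterministic start: pick k evenly spaced points as initial centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum(
                    (p - q) ** 2 for p, q in zip(point, centroids[c])
                ),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster as is
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (average purchase amount, visits per month).
customers = [(20, 12), (25, 10), (22, 11), (200, 1), (220, 2), (210, 1)]

labels, centroids = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> group", label)
```

On this toy data the two groups correspond to frequent small-basket shoppers and occasional big spenders. In practice the features would usually be normalized first, since k-means is sensitive to differences in scale.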

If these functions had to be implemented with information BI applications alone, decision makers would need to perform a great deal of OLAP analysis and data querying based on experience, and still might not discover the rules hidden in the data. For a bank with 4 million customers, for example, doing this without a data mining tool would exhaust the people involved.

(8) The foundation of BI - Data Warehouse
Before getting into the topic, let's look at the official definition of a data warehouse:
A data warehouse is a subject-oriented, integrated, non-volatile (relatively stable), and time-variant collection of data used to support management decisions.
An "operational database" is like the database behind a bank's bookkeeping system. Every business operation (for example, you deposit 5 yuan) is recorded in it immediately, and the data that accumulates is fragmentary. This database, which does all the dirty and tiring work, is oriented to business operations.
A "data warehouse", by contrast, is used for decision support and is oriented to analytical data processing. In addition, a data warehouse integrates multiple heterogeneous data sources. After integration, the data is reorganized by subject and includes historical data, and data stored in the warehouse is generally not modified.
The relationship among operational databases, data warehouses, and databases is like the relationship among C:, D:, and the hard disk. The database is the hard disk, the operational database is C:, and the data warehouse is D:. Both the operational database and the data warehouse live in a database; they differ only in how their table structures are designed and what they are used for.

So why put a "data warehouse" between the operational databases and BI?

First, the operational database is busy day and night. Its job is to respond quickly to the business, and it has no capacity to spare for BI's data requirements. Moreover, BI usually needs summarized data. A single SELECT SUM(xx) ... GROUP BY xx can make the operational database consume so many resources that business processing can no longer keep up, and that is serious trouble. Suppose you deposit 5,000 yuan and the transaction still has not gone through after ten minutes. What would you think? That the bank's executives must be looking at pie charts?

Second, an enterprise generally has multiple applications backed by multiple operational databases, such as a human resources database, a financial database, a sales document database, and an inventory database. To provide a panoramic view of the data, BI must combine these scattered data. For example, to run an OLAP analysis that combines sales and inventory information, the BI tool must be able to obtain data from both databases efficiently. The most efficient approach is to integrate the data into the data warehouse first and have BI applications retrieve data from the warehouse in a uniform way.

Integrating data from scattered operational databases into a data warehouse is a discipline in itself, and it gave birth to the data integration software market. This kind of integration does not simply stack tables together. The dimensions of each operational database must be extracted, and dimensions they have in common become shared dimensions. The tables containing specific measure values are then unified by subject into a few large tables (known as fact tables), the data warehouse table structure is built according to this dimension-measure model, and the data is extracted and transformed. Subsequent extractions are generally incremental, pulling in new data when the load on the operational database is relatively low (for example, in the early morning), so that data accumulates in the data warehouse.
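
The dimension-measure structure and the incremental extraction described above can be sketched with two in-memory SQLite databases, one standing in for an operational database and one for the data warehouse. All table names, column names, and helper functions here are hypothetical, and real data integration tools handle much more (multiple sources, cleansing, change detection). Treat this as a rough outline only.

```python
import sqlite3

ops = sqlite3.connect(":memory:")  # stands in for an operational database
dw = sqlite3.connect(":memory:")   # stands in for the data warehouse

ops.execute(
    "CREATE TABLE sales_doc (id INTEGER PRIMARY KEY, sale_date TEXT, "
    "location TEXT, product TEXT, quantity INTEGER, amount REAL)"
)

# Dimension tables plus one fact table holding the measures.
dw.executescript("""
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE fact_sales (
    source_id   INTEGER PRIMARY KEY,
    sale_date   TEXT,
    location_id INTEGER REFERENCES dim_location(location_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    quantity    INTEGER,
    amount      REAL
);
""")


def dim_key(table, key_col, name):
    """Return the key of a dimension member, adding the member if it is new."""
    dw.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = dw.execute(
        f"SELECT {key_col} FROM {table} WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def incremental_load():
    """Extract only the rows newer than the last one already loaded."""
    last_id = dw.execute(
        "SELECT COALESCE(MAX(source_id), 0) FROM fact_sales"
    ).fetchone()[0]
    new_rows = ops.execute(
        "SELECT id, sale_date, location, product, quantity, amount "
        "FROM sales_doc WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    for source_id, sale_date, location, product, quantity, amount in new_rows:
        dw.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
            (
                source_id,
                sale_date,
                dim_key("dim_location", "location_id", location),
                dim_key("dim_product", "product_id", product),
                quantity,
                amount,
            ),
        )
    dw.commit()
    return len(new_rows)


INSERT_DOC = (
    "INSERT INTO sales_doc (sale_date, location, product, quantity, amount) "
    "VALUES (?, ?, ?, ?, ?)"
)
ops.executemany(INSERT_DOC, [
    ("2004-1-1", "Beijing", "Soap", 20, 52.00),
    ("2004-3-1", "Guangzhou", "Bananas", 35, 77.00),
])
first_batch = incremental_load()   # initial load

ops.execute(INSERT_DOC, ("2004-3-7", "Beijing", "Soap", 20, 8.00))
second_batch = incremental_load()  # picks up only the new row

print(first_batch, second_batch)
```

Using the source row's id as a high-water mark is just one simple way to find new data; it assumes ids only ever increase and that existing rows are never updated.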

Most BI applications do not require real-time data. For example, decision makers may only need to see last week's weekly report every Monday. Some 95% of BI applications have no real-time requirement, and the data may lag by anywhere from one hour to one month. This is a characteristic of decision support systems, and the lag interval is when the data extraction tools do their work. Of course, a very small number of BI applications do require real-time data. In that case, you simply connect the BI query software directly to the operational database for those special requirements, but you must limit the load and prohibit complex queries.

Current database products provide special optimizations for data warehouses. For example, when you install a recent version of MySQL, the setup asks whether the database instance should be transaction-oriented or decision-support-oriented. The former is an operational database and the latter is a data warehouse. The database then applies optimizations targeted at the chosen mode.

(9) BI tidbits
That is roughly all the basic knowledge about BI. Here are a few tidbits by way of conclusion.
BI's weak spot: BI cannot process unstructured data; it can only process numeric information. Yet enterprises also hold large amounts of unstructured data such as text, streaming media, and images. This data has a lot of value too, but current BI tools are powerless in the face of it. IBM Intelligent Miner for Text is fairly dependable, but it seems very weak at processing Chinese.
BI vendors and products:

First, the big foreign players. Data warehouses: IBM DB2, Oracle, Sybase IQ, NCR Teradata, and so on. BI applications: Cognos, Business Objects, MicroStrategy, Hyperion, IBM, and so on; for data mining, IBM, SAS, and SPSS. Microsoft has also entered BI and launched SQL Server Analysis Services, Reporting Services, and other BI-related products.

We tend to focus only on the overseas BI leaders and overlook the new BI forces emerging in China. Today the best-known BI products in China include Aowei's Power-BI, Shangnan's BlueQuery, and Runqian reports. Power-BI is particularly worth mentioning: it is a standardized BI product with a certain market share in China.
BI market development in China:

Time period: BI applications in China

Before 2002: A large number of BI software products were seen merely as reporting tools that could pull data from multiple data sources. Reports were everywhere. At first, when selling products, a company's salespeople would tell customers, "We are the strongest in the BI field ......" and it did not work well. Later the salespeople found the trick and said, "We can do any report!" and the orders kept coming.

2002-2003: The value of OLAP was finally discovered by some discerning eyes. Companies in highly competitive industries urgently needed to mine value from historical data to improve their competitiveness, and they quickly saw the advantages of OLAP. At this point salespeople no longer had to say "We can do any report." However, government agencies and state-owned enterprises still wanted only reports and still thought BI meant reports.

2004: As more and more BI projects were implemented successfully, a reasonable BI application structure of data query, report presentation, and OLAP analysis finally took shape in China. Users also frequently raised data visualization requirements. In some highly competitive enterprises with large data volumes, data mining applications began to emerge.

2005: Providing information alone could no longer meet the requirements of many enterprises, especially in highly competitive, high-risk industries such as banking, telecommunications, and securities. A large number of data mining needs emerged, and BI applications finally took on the complete form of information + knowledge.

Difficulties encountered by BI tools in China:

* Complex reports: China has the most complex reports in the world. The design philosophy of Chinese reports differs from that of the West. Western reports tend to address a single issue in one report, while Chinese reports tend to pack as many issues as possible into one report. This directly leads to the complicated formats and strange styles of Chinese reports.

* Large data volumes: China is the most populous country in the world. Take China Mobile: the number of subscribers in a single Chinese province is comparable to the population of a medium-sized European country, which is a truly massive data volume! Foreign databases, data warehouses, and BI application software all face the test of large data volumes in China. An analysis that takes two seconds for a customer in the United States will not finish in two seconds on Chinese data volumes.

* Data write-back: China has the world's most unusual BI requirement. A BI system is supposed to reproduce the source data faithfully, but this principle ran into trouble in China, where many leaders asked to be able to modify the data. "If the numbers in a report do not look good, we must be able to change them. Sometimes they need adjusting so that our superiors can look at them!" said one leader. Currently, only the BI products of Microsoft and MicroStrategy meet this requirement. Microsoft clearly knows the Chinese market thoroughly.
