BI basic architecture and ETL personal Understanding (ZZ)

Source: Internet
Author: User
Tags snowflake schema

Business Intelligence (BI) = Data Warehouse (DW) + OLAP + Data Mining (DM)
Business Intelligence = Data Warehouse + Online Analytics + data Mining
The purpose of BI is to help users make decision analysis, analyze the present situation from multidimensional perspective, provide reliable data base and background for decision-makers to make correct decision, and make correct guidance for the development of enterprises. However, in the domestic do bi really go into a misunderstanding, usually customers take bi as a reporting system, which is a little bit of a small use of the feeling, there is a different level of companies, often some companies holding a non-bi system to deceive customers to the BI cast a bad impression, In general, the development of BI in the past two years is still relatively smooth, there are more and more enterprises and institutions to start their own bi system, such as banking, taxation, insurance and other industries.

The usual architecture or basic architecture of BI is:
The source data->ods->dw->olap-> front-end.
Common source data types: relational databases, text data, and so on.
ODS: Operational Data Storage (Operation) main purpose is to integrate data from multiple data sources into a temporary buffer for use by the Data Warehouse. In general, ODS data will not be kept for a long period of 1 months or 3 months, and if the customer has a request for information then ODS may need to be retained, usually without
Report One advantage of ODS is that a buffer between the data warehouse and the source data is used to reduce the pressure on the source system, and we are using the user source system to operate. For example, when we load the fact table data from the source data to the data warehouse, we need to do the aggregation operation, and if there is no ODS layer, then all the pressure of the aggregation operation is done on the source system, which
Will put a lot of pressure on the customer's source system, which is a problem that is often encountered during the implementation of the project.
DW: Data Warehouse (Warehouse) is simply a database that stores fact and dimension table data.
Definition: Data warehousing (Warehouse) is a theme-oriented (Subject oriented), integrated (integrate), relatively stable (non-volatile), a collection of data reflecting historical changes (time Variant), Used to support management decisions.
Data warehouses typically use industry-leading relational databases, such as Oracle, DB2, SQL Server, and so on.
Dimension table: A table that stores data attributes in a fact table that stores the perspective of user profiling data, which provides OLAP with a data base for rotation and slicing.
Fact table: Storing historical data that passes through a certain aggregation is the center of a star or snowflake schema. Each data warehouse contains one or more fact tables.
The fact table includes both the index and the data, and the index is the foreign key of the dimension table that describes the data characteristics of the fact table, which is the data to be stored in the fact table, which is the source of the measure we usually say.

OLAP: Online Analytical processing (On-line analytical process) tools have essbase,microsoft analysis and so on.
The basic idea of OLAP is to make the enterprise decision-makers should be able to manipulate the data of the enterprise flexibly, observe the state of the enterprise from many aspects and multi-angle, and understand the change of the enterprise in multi-dimensional form. Using OLAP tools, we can make a connection between the dimension table and the fact table, and then do the aggregation operation to save the cube to achieve the objective of multi-angle analysis.
Front-end display tools: Front-End display tool is to assist users to multi-angle, custom display report form of tools, is a good OLAP tool, usually OLAP tools can only do simple data display, drilling, drilling and so on. The front end display tool can display the graph, the handle chart and so on according to the user demand, through the demonstration tool we may do some personalization
Set, permissions control and so on, commonly used tools Bo,brio,cognos,bi office, it is worth mentioning that bi Office is a domestic bi company's products, can be the domestic front-end display tool representative.

ETL Discussion:

The most difficult part of the BI development process is the ETL, which is almost 40% of the total work of the ETL system.
ETL Common tools: Data Stage, Informatic, Microsoft DTS and so on.

Do ETL Work principle:
1, to have a good understanding of the source data, this requires the business system engineers to cooperate. Not only understand the meaning of the source system tables, fields, but also the quality of the data validation.
2, with the customer to confirm the handling of dirty data (discard or default), which will directly affect the final report error rate.
3, to ensure that the data storage time is long, only to understand the length of data storage, you can better carry out the way the fact table stored (such as zoning methods, etc.)
4, timely verification of the accuracy of data, when we have done a certain historical data extraction to timely verify the accuracy of data with customers, or other systems online after the discovery of incorrect data, at this time too late.
5, to determine the scheduling, different scheduling will affect the completion of the data extraction time, such as 1 weeks of data scheduling in 1 days scheduled to be divided into 7 scheduling response time is completely different, which should be determined according to the application.
6, process monitoring and fault handling, this is essential, we monitor the ETL allow, there are any procedures can not guarantee never error, so we need to do to ensure that the failure to make up after the occurrence.

The above is in doing bi process some experience, hope to be helpful to everybody, what is wrong place please everybody point

Out and hope to communicate with you a lot.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.