The Basic Architecture of a Data Warehouse

Source: Internet
Author: User

The purpose of a data warehouse is to build an integrated data environment for analysis that provides decision support for the enterprise.

The basic architecture of a data warehouse is organized around how data flows in and out, and it can be divided into three layers: source data, the data warehouse, and data applications.


The data warehouse obtains data from each data source, and the transformation and flow of data within the warehouse can be regarded as the ETL (Extract, Transform, Load) process. ETL is the pipeline of the data warehouse: most of the effort in its day-to-day management and maintenance goes into keeping the ETL process running normally and stably.
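
A minimal sketch of the three ETL steps, assuming a hypothetical in-memory list as the source and an in-memory SQLite database as the warehouse. The table `fact_orders`, its columns, and the sample rows are illustrative only.

```python
import sqlite3

# Hypothetical raw source rows, e.g. exported from a website's order database.
RAW_ORDERS = [
    {"order_id": "1", "customer": " alice ", "amount": "19.90", "date": "2024-01-05"},
    {"order_id": "2", "customer": "BOB", "amount": "5.00", "date": "2024-01-05"},
    {"order_id": "3", "customer": "alice", "amount": "n/a", "date": "2024-01-06"},
]


def extract():
    """Extract: read records from the source (an in-memory list in this sketch)."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: normalize fields and reject rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed amounts instead of loading bad data
        customer = row["customer"].strip().lower()
        clean.append((int(row["order_id"]), customer, amount, row["date"]))
    return clean


def load(conn, rows):
    """Load: upsert cleaned rows into a warehouse detail table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run one ETL batch and return the number of rows in the target table."""
    load(conn, transform(extract()))
    return conn.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_etl(conn))  # 2: the row with amount "n/a" is rejected
```

Keying the upsert on `order_id` means rerunning the same daily batch does not duplicate rows, which is one simple way to keep a pipeline stable when a job must be retried.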


The modules of the data warehouse:

Data source: The click stream log is a main data source and provides the basic data for website analysis. The website's own database is also an important source: it records the data produced by the site's operations and the results of users' actions, and for analyzing the site's outcomes this kind of data is more accurate. Other sources include documents generated outside the site and any other types of data that are useful for company decisions.
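
To make the click stream source concrete, here is a sketch that parses a hypothetical tab-separated log whose lines hold an ISO-8601 timestamp, a visitor ID, a session ID, and the requested URL. Real web server or tracking logs use other layouts, so this field order is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ClickEvent:
    ts: datetime
    visitor_id: str
    session_id: str
    url: str


def parse_click_line(line):
    """Parse one line of the hypothetical log format; return None if malformed."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return None  # wrong number of fields
    ts, visitor_id, session_id, url = parts
    try:
        when = datetime.fromisoformat(ts)
    except ValueError:
        return None  # unparseable timestamp
    return ClickEvent(when, visitor_id, session_id, url)


def parse_log(lines):
    """Parse many lines, dropping malformed ones so one bad line cannot stop a batch."""
    return [event for event in map(parse_click_line, lines) if event is not None]


if __name__ == "__main__":
    sample = [
        "2024-01-05T10:00:00\tv1\ts1\t/home\n",
        "2024-01-05T10:00:30\tv1\ts1\t/list\n",
        "corrupted line\n",
    ]
    for event in parse_log(sample):
        print(event.visitor_id, event.session_id, event.url)
```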

Data storage: Daily ETL tasks export the source data, transform it, and load it into the data warehouse in the form of attributes. The data warehouse does not need to store all of the raw data, but it does need to store the detail data, and the imported data must be cleaned and transformed so that it is subject-oriented.

Data aggregation: Aggregated data here means simple aggregations built for specific requirements (aggregation based on multidimensional data is handled by the multidimensional data model). A simple aggregation can be a site total such as pageviews, visits, or unique visitors, or an average such as average time on page or average time on site, which can be displayed directly in reports.
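
The simple aggregations listed above can be sketched in a few lines. This assumes each event is a hypothetical `(visitor_id, session_id, timestamp_seconds, url)` tuple and adopts a common convention: time on page is the gap to the next view in the same session, so the last page of each visit contributes no duration.

```python
from collections import defaultdict


def site_summary(events):
    """Simple site-level aggregates from (visitor_id, session_id, ts_seconds, url) tuples."""
    sessions = defaultdict(list)
    for _visitor, session_id, ts, _url in events:
        sessions[session_id].append(ts)

    page_durations = []     # seconds on each page that has a following view
    session_durations = []  # seconds between first and last view of each visit
    for stamps in sessions.values():
        stamps.sort()
        page_durations.extend(b - a for a, b in zip(stamps, stamps[1:]))
        session_durations.append(stamps[-1] - stamps[0])

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return {
        "pageviews": len(events),
        "visits": len(sessions),
        "unique_visitors": len({visitor for visitor, *_ in events}),
        "avg_time_on_page": mean(page_durations),
        "avg_time_on_site": mean(session_durations),
    }


# Hypothetical sample: three visits by two visitors.
EVENTS = [
    ("v1", "s1", 0, "/home"), ("v1", "s1", 30, "/list"), ("v1", "s1", 90, "/item"),
    ("v2", "s2", 100, "/home"),
    ("v1", "s3", 500, "/home"), ("v1", "s3", 540, "/list"), ("v1", "s3", 560, "/cart"),
]

if __name__ == "__main__":
    print(site_summary(EVENTS))
```

The single-page visit `s2` counts as zero seconds in the time-on-site average; whether such bounces are included is a reporting decision rather than a fixed rule.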

Multidimensional data model: The multidimensional data model provides, for example, sales star schemas and snowflake schemas built around dimensions such as time and region, which allows cross-querying and drill-down along each time and geographic dimension. Applications of the multidimensional data model are therefore generally based on online analytical processing (OLAP), and data marts for specific user groups are built on top of the multidimensional data model.
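
A small star schema in SQLite illustrates the idea: one sales fact table surrounded by a date dimension and a region dimension, queried across both. The table names (`fact_sales`, `dim_date`, `dim_region`), columns, and rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, city TEXT, province TEXT);
CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date,
                         region_key INTEGER REFERENCES dim_region,
                         amount REAL);
"""


def build_star(conn):
    """Create the star schema and fill it with a few sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
        (20240105, "2024-01-05", "2024-01", 2024),
        (20240210, "2024-02-10", "2024-02", 2024),
    ])
    conn.executemany("INSERT INTO dim_region VALUES (?, ?, ?)", [
        (1, "Hangzhou", "Zhejiang"),
        (2, "Ningbo", "Zhejiang"),
        (3, "Guangzhou", "Guangdong"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
        (20240105, 1, 100.0), (20240105, 3, 50.0),
        (20240210, 2, 70.0), (20240210, 3, 30.0),
    ])


def sales_by_month_and_province(conn):
    """Cross-query the fact table along two dimensions: month x province."""
    return conn.execute("""
        SELECT d.month, r.province, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d   ON f.date_key = d.date_key
        JOIN dim_region r ON f.region_key = r.region_key
        GROUP BY d.month, r.province
        ORDER BY d.month, r.province
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    for row in sales_by_month_and_province(conn):
        print(row)
```

In a snowflake schema the region dimension would itself be normalized, for example by moving `province` into its own table referenced from `dim_region`; grouping by `r.city` instead of `r.province` is the corresponding drill-down.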

Business model: Data models built for specific data analysis and decision-support purposes, such as a user evaluation model, an association recommendation model, an RFM (recency, frequency, monetary) analysis model, or decision-support models such as linear programming and inventory models.
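
As one example of a business model, an RFM analysis can be sketched as follows. It assumes orders arrive as hypothetical `(customer_id, order_date, amount)` tuples and uses a simple rank-based split into three score bands; real RFM models often use quintiles or business-defined thresholds instead.

```python
from datetime import date


def rfm_table(orders, as_of):
    """Raw RFM values per customer: days since last order, order count, total spend."""
    stats = {}
    for customer, when, amount in orders:
        last, count, total = stats.get(customer, (when, 0, 0.0))
        stats[customer] = (max(last, when), count + 1, total + amount)
    return {c: ((as_of - last).days, n, m) for c, (last, n, m) in stats.items()}


def rank_scores(values, higher_is_better, bands=3):
    """Assign bands 1..bands by rank, band 1 being the worst.
    Ties keep their input order, which is a simplification."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {key: 1 + i * bands // n for i, key in enumerate(ordered)}


def rfm_scores(orders, as_of, bands=3):
    """(R, F, M) score per customer; a recent purchase (few days) scores high."""
    table = rfm_table(orders, as_of)
    r = rank_scores({c: v[0] for c, v in table.items()}, higher_is_better=False, bands=bands)
    f = rank_scores({c: v[1] for c, v in table.items()}, higher_is_better=True, bands=bands)
    m = rank_scores({c: v[2] for c, v in table.items()}, higher_is_better=True, bands=bands)
    return {c: (r[c], f[c], m[c]) for c in table}


# Hypothetical order history and analysis date.
ORDERS = [
    ("a", date(2024, 3, 1), 120.0), ("a", date(2024, 3, 20), 80.0),
    ("b", date(2024, 1, 10), 30.0),
    ("c", date(2024, 3, 25), 500.0), ("c", date(2024, 2, 1), 20.0), ("c", date(2024, 3, 28), 40.0),
]
AS_OF = date(2024, 3, 31)

if __name__ == "__main__":
    print(rfm_table(ORDERS, AS_OF))
    print(rfm_scores(ORDERS, AS_OF))
```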


Data applications of the data warehouse:

Report presentation: Reports are an essential data application for almost every data warehouse. They present aggregated data and the results of multidimensional analysis, giving users the simplest and most intuitive view of the data.

Ad hoc queries: Ad hoc queries provide a flexible way to obtain data, letting users query the data they need and export the results to external files such as Excel.
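
A sketch of an ad hoc query with export, assuming a hypothetical SQLite table `sales`. User-chosen columns are checked against a whitelist and filter values are passed as bound parameters, which keeps a flexible query interface safe from SQL injection; the result is written as CSV, a format Excel can open.

```python
import csv
import io
import sqlite3

# Columns a user may pick; anything else is rejected before SQL is built.
ALLOWED_COLUMNS = {"order_date", "region", "amount"}


def demo_db():
    """Hypothetical `sales` table standing in for a warehouse detail table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (order_date TEXT, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("2024-01-05", "east", 100.0), ("2024-01-06", "west", 40.0)],
    )
    return conn


def ad_hoc_query(conn, columns, region=None):
    """Run a user-defined query: whitelisted columns, values bound as parameters."""
    unknown = set(columns) - ALLOWED_COLUMNS
    if unknown or not columns:
        raise ValueError(f"invalid column selection: {sorted(unknown)}")
    sql = f"SELECT {', '.join(columns)} FROM sales"
    params = []
    if region is not None:
        sql += " WHERE region = ?"
        params.append(region)
    cursor = conn.execute(sql, params)
    header = [d[0] for d in cursor.description]
    return header, cursor.fetchall()


def export_csv(header, rows, out):
    """Write the result set as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    header, rows = ad_hoc_query(demo_db(), ["order_date", "amount"], region="east")
    buffer = io.StringIO()
    export_csv(header, rows, buffer)
    print(buffer.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.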

Data analysis: Analysis builds on and extends the business models. Aggregated data can also be used for trend analysis, comparative analysis, and other analyses, while the multidimensional data model provides the basis for multidimensional analysis of the data.
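
Trend and comparative analysis on aggregated data often starts with period-over-period growth and a moving average. The sketch below assumes a hypothetical list of monthly pageview totals read from an aggregate table.

```python
def period_over_period(series):
    """Growth of each period relative to the previous one; None for the first
    period and wherever the previous value is zero."""
    growth = [None]
    for prev, cur in zip(series, series[1:]):
        growth.append((cur - prev) / prev if prev else None)
    return growth


def moving_average(series, window):
    """Simple moving average over `window` consecutive periods."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


# Hypothetical monthly pageview totals.
MONTHLY_PAGEVIEWS = [100, 120, 90, 135]

if __name__ == "__main__":
    print(period_over_period(MONTHLY_PAGEVIEWS))
    print(moving_average(MONTHLY_PAGEVIEWS, 2))
```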

Data mining: Data mining can build on the business models already constructed in the data warehouse, but most of the time it works directly on the detail data. The data warehouse provides a data interface for mining tools such as SAS and SPSS.

Metadata management: Metadata would more accurately be called explanatory data, that is, data about data.

It mainly records the definitions of the models in the data warehouse, the mappings between the layers, the monitored status of the data in the warehouse, and the running status of ETL tasks. Metadata is typically stored and managed centrally in a metadata repository, whose main purpose is to keep the design, deployment, operation, and management of the data warehouse coordinated and consistent.
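
A minimal in-memory sketch of what a metadata repository records: table definitions per layer, source-to-target mappings between layers, and ETL task runs. The class, field, table, and task names are hypothetical; a real repository would persist this information in a database.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TableMeta:
    name: str
    layer: str        # e.g. "source", "detail", "aggregate"
    description: str


@dataclass
class TaskRun:
    task: str
    started: datetime
    status: str       # e.g. "success" or "failed"


@dataclass
class MetadataRepository:
    tables: dict[str, TableMeta] = field(default_factory=dict)
    lineage: list[tuple[str, str, str]] = field(default_factory=list)  # (source, target, task)
    runs: list[TaskRun] = field(default_factory=list)

    def register_table(self, meta):
        self.tables[meta.name] = meta

    def add_mapping(self, source, target, task):
        """Record that `task` moves data from `source` into `target`."""
        for name in (source, target):
            if name not in self.tables:
                raise KeyError(f"table {name!r} is not registered")
        self.lineage.append((source, target, task))

    def record_run(self, run):
        self.runs.append(run)

    def upstream(self, table):
        """Every table that feeds `table`, directly or through other layers."""
        found, pending = set(), [table]
        while pending:
            current = pending.pop()
            for source, target, _task in self.lineage:
                if target == current and source not in found:
                    found.add(source)
                    pending.append(source)
        return found

    def latest_status(self, task):
        """Status of the most recent run of `task`, or None if it never ran."""
        runs = [r for r in self.runs if r.task == task]
        return max(runs, key=lambda r: r.started).status if runs else None


if __name__ == "__main__":
    repo = MetadataRepository()
    repo.register_table(TableMeta("raw_clicks", "source", "click stream log"))
    repo.register_table(TableMeta("dw_page_views", "detail", "cleaned page views"))
    repo.add_mapping("raw_clicks", "dw_page_views", "etl_clicks")
    repo.record_run(TaskRun("etl_clicks", datetime(2024, 1, 5, 2, 0), "success"))
    print(repo.upstream("dw_page_views"), repo.latest_status("etl_clicks"))
```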

