"Reading notes-data mining concepts and technologies" data warehousing and online analytical processing (OLAP)

Source: Internet
Author: User

Earlier we looked at data and data preprocessing. Where does the data go once it has been preprocessed? Into a place called a "data warehouse".

Basic concepts of data warehousing:

    • Definition of a data warehouse: subject-oriented, integrated, time-variant, nonvolatile
    • Operational database systems vs. data warehouses: why use a data warehouse to analyze data (OLAP vs. OLTP)
    • Data warehouse architecture: three tiers: bottom tier (data warehouse server), middle tier (OLAP server), top tier (front-end tools)
    • Three data warehouse models
    1. Enterprise warehouse
    2. Data mart (for a specific department only)
    3. Virtual warehouse
    • Metadata repository: data about data

For the connections and differences among OLAP, data warehousing, and data mining, see: http://hi.baidu.com/hhhqpfnybgbfqrd/item/784f2d14b46c3106b98a1a83

http://blog.csdn.net/cuipower/article/details/342070

————————————————————————————————————————————————————————————————————————————

Data Warehouse Modeling: Data cube and OLAP

Reference: http://www.ibm.com/developerworks/cn/data/library/techarticles/dm-0803zhousb/

    • Data cube: a multidimensional data model
    • Star, snowflake, and fact constellation: schemas for multidimensional data models
    • Dimensions: the role of concept hierarchies
    • Measures: their categorization and computation

Categorization: based on the kind of aggregate function used

    1. Distributive (e.g., count, sum, min, max)
    2. Algebraic (e.g., avg)
    3. Holistic (e.g., median, mode, rank)
    • Typical OLAP operations: roll-up, drill-down, slice and dice, pivot, etc.
    • A starnet query model for querying multidimensional databases
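
The three measure categories above (distributive, algebraic, holistic) can be illustrated with a minimal Python sketch. The partitioned sales figures are made up for illustration; the point is only which aggregates can be combined from per-partition results.

```python
from statistics import median

# Hypothetical sales amounts, split into partitions (e.g., one per month).
partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [v for part in partitions for v in part]

# Distributive: sum() of the partial sums equals sum() over all the data.
partial_sums = [sum(part) for part in partitions]
total = sum(partial_sums)

# Algebraic: avg() is derived from a bounded number of distributive
# results, here sum() and count().
partial_counts = [len(part) for part in partitions]
average = sum(partial_sums) / sum(partial_counts)

# Holistic: median() has no bounded-size per-partition summary; the
# median of the partition medians is generally not the true median.
median_of_medians = median(median(part) for part in partitions)
true_median = median(all_values)

print(total, average)                  # 108 18.0
print(median_of_medians, true_median)  # 19.5 15.5
```

This is why distributive and algebraic measures suit precomputed cubes, while holistic measures usually have to be recomputed from the underlying data or approximated.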

————————————————————————————————————————————————————————————————————————————

Implementation of Data Warehouse

    • Efficient data cube computation: a data warehouse holds a huge amount of data, yet queries must be answered quickly, so efficient data cube techniques are needed

What is a data cube?

A data cube is a multidimensional matrix that lets users explore and analyze a data set from several angles at once, usually considering three factors (dimensions) at a time.
When we try to extract information from a mass of data, we need tools that help us find what is relevant and important and explore different scenarios. A report, whether printed on paper or shown on screen, is a two-dimensional representation of the data: a table of rows and columns. That is sufficient when there are only two factors to consider, but real-world analysis needs stronger tools.
A data cube is the multidimensional extension of a two-dimensional table, just as in geometry a cube is the three-dimensional extension of a square. The word "cube" suggests a three-dimensional object, and a three-dimensional data cube can be pictured as a stack of similar two-dimensional tables.
A data cube is not limited to three dimensions, however. Most online analytical processing (OLAP) systems can build data cubes with many dimensions; for example, Microsoft's SQL Server Analysis Services allows up to 64 dimensions (although a higher-dimensional entity remains hard to picture in spatial or geometric terms).
In practice, we often build data cubes with many dimensions but tend to look at only three at a time. A data cube is valuable because it can be indexed on one or more of its dimensions.
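
The idea of a cube as stacked two-dimensional tables can be sketched in a few lines of Python. The fact rows, the dimension names, and the helper functions `build_cube`, `slice_cube`, and `roll_up` are all hypothetical; a real OLAP server stores and indexes cubes very differently.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, item, city, units_sold).
facts = [
    ("Q1", "phone", "Vancouver", 10),
    ("Q1", "phone", "Toronto", 7),
    ("Q1", "laptop", "Vancouver", 4),
    ("Q2", "phone", "Vancouver", 12),
    ("Q2", "laptop", "Toronto", 5),
]
DIMS = ("quarter", "item", "city")

def build_cube(rows):
    """Base cuboid: one cell per (quarter, item, city) combination."""
    cube = defaultdict(int)
    for *coords, units in rows:
        cube[tuple(coords)] += units
    return dict(cube)

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value, leaving a 2-D table."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def roll_up(cube, dim):
    """Roll up: aggregate one dimension away (climb to its 'all' level)."""
    i = DIMS.index(dim)
    out = defaultdict(int)
    for k, v in cube.items():
        out[k[:i] + k[i + 1:]] += v
    return dict(out)

cube = build_cube(facts)
q1_table = slice_cube(cube, "quarter", "Q1")  # item x city table for Q1
by_quarter_item = roll_up(cube, "city")       # units per (quarter, item)
```

Here `q1_table` is one of the stacked two-dimensional tables, and `by_quarter_item` is the coarser cuboid obtained by summing over all cities.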

Operation:

∵ Curse of dimensionality: with many dimensions and many concept hierarchy levels, there are too many cuboids to store

∴ Partial materialization: precompute only a subset of the cuboids
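
To see why storing every cuboid quickly becomes infeasible, note that a cube with n dimensions, where dimension i has L_i concept hierarchy levels, has (L_1 + 1) x ... x (L_n + 1) cuboids; the extra 1 per dimension is the virtual top level "all". A small sketch, with the function name `total_cuboids` and the example sizes chosen only for illustration:

```python
from math import prod

def total_cuboids(levels_per_dimension):
    """Product of (L_i + 1) over all dimensions, where L_i is the number
    of hierarchy levels of dimension i, not counting the level 'all'."""
    return prod(levels + 1 for levels in levels_per_dimension)

# Without hierarchies (one level per dimension), n dimensions give 2**n cuboids.
print(total_cuboids([1, 1, 1]))  # 8

# Hypothetical cube: 10 dimensions with 4 levels each.
print(total_cuboids([4] * 10))   # 9765625
```

With nearly ten million cuboids for a modest 10-dimensional cube, materializing only a well-chosen subset is the practical option.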

    • Indexing OLAP data: bitmap index and join index
    • Efficient processing of OLAP queries: fine → coarse (a query at a coarser level can be answered from a materialized cuboid at a finer level, but not the reverse)
    • OLAP server: presents business users with multidimensional data from a data warehouse or data mart, without their having to worry about how or where the data is stored.
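
The bitmap index mentioned above can be sketched as follows. This is a minimal illustration, assuming a tiny in-memory table and using Python integers as bit vectors; the table contents and the helper names are hypothetical, and production systems use compressed bitmaps.

```python
# Hypothetical base table: one row per record, identified by position (RID).
rows = [
    {"item": "phone", "city": "Vancouver"},
    {"item": "laptop", "city": "Toronto"},
    {"item": "phone", "city": "Toronto"},
    {"item": "phone", "city": "Vancouver"},
]

def build_bitmap_index(table, attribute):
    """Map each distinct value to an int used as a bit vector:
    bit r is set when row r holds that value."""
    index = {}
    for rid, row in enumerate(table):
        index[row[attribute]] = index.get(row[attribute], 0) | (1 << rid)
    return index

def matching_rids(bitmap):
    """Decode a bit vector back into a list of row positions."""
    return [rid for rid in range(bitmap.bit_length()) if (bitmap >> rid) & 1]

item_index = build_bitmap_index(rows, "item")
city_index = build_bitmap_index(rows, "city")

# item = 'phone' AND city = 'Vancouver' becomes a single bitwise AND.
hits = item_index["phone"] & city_index["Vancouver"]
print(matching_rids(hits))  # [0, 3]
```

Because selections reduce to bit operations, bitmap indexes work best on attributes with few distinct values.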

————————————————————————————————————————————————————————————————————————————

Data generalization: attribute-oriented induction

What does data generalization mean?

Data generalization is an analytical process that abstracts a large set of task-relevant data in a database from a relatively low conceptual level to a higher one, giving a summarized overview of the data. There are two main approaches to summarizing large amounts of data efficiently and flexibly: (1) the data cube approach and (2) attribute-oriented induction.

Data cube approach: based on materialized views of the data, usually precomputed in the data warehouse

Attribute-oriented induction: a query-oriented, generalization-based, online data analysis technique

Note: There is no inherent boundary between the two

∵ Data cube technology alone is not enough to carry out concept description for every kind of large data set

∴ Attribute-oriented induction was developed for data characterization

Concept description: describes a given set of task-relevant data in concise, summarized form, presenting interesting general properties of the data. It consists of characterization and comparison.
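
The core steps of attribute-oriented induction (remove attributes that cannot be generalized, replace values by higher-level concepts, merge identical tuples while accumulating counts) can be sketched as below. The student tuples, the city-to-country hierarchy, and the age cut point are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical concept hierarchy: low-level value -> higher-level concept.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA"}

def age_range(age):
    """Generalize a numeric age into a coarse range (assumed cut point)."""
    return "under 25" if age < 25 else "25 and over"

# Hypothetical task-relevant tuples: (name, city, age).
students = [
    ("Ann", "Vancouver", 21),
    ("Bob", "Toronto", 23),
    ("Cat", "Seattle", 30),
    ("Dan", "Vancouver", 27),
]

def generalize(rows):
    """Drop 'name' (many distinct values, no hierarchy), generalize the
    remaining attributes, and merge identical tuples with a count."""
    counts = Counter()
    for _name, city, age in rows:
        counts[(CITY_TO_COUNTRY[city], age_range(age))] += 1
    return counts

summary = generalize(students)
for (country, ages), count in sorted(summary.items()):
    print(country, ages, count)
```

The resulting table of generalized tuples with counts is the kind of concise characterization that concept description aims for.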
