Data Mining Learning Guide <1>

Source: Internet
Author: User
Tags: snowflake schema

Technologies such as big data and cloud computing are now widely used by Chinese Internet giants such as Baidu and Alibaba. Data mining is a highly practical technology that plays a major role in business management, market analysis, scientific computing, and other data-intensive fields.


Why has data mining become so popular?

1. As a marketing tool, data mining can uncover potential business information. A company can predict consumers' preferences and interests from their previous purchase records and run targeted marketing that benefits both sides. The classic diapers-and-beer combination needs no retelling. As another example, a bank can infer from a customer's sudden large purchases that the customer may be buying a house or getting married, and then recommend related real estate or wedding services.

2. Data mining can provide knowledge to decision makers. Organizations often have a large amount of data but little knowledge: huge volumes sit in databases, and the question is how to use them to find latent patterns such as customer spending habits and customer categories. This advantage has already shown up in many sectors, including China Telecom, banking, and supermarkets. For example, a China Telecom company gave 10 years of national telephone data to research institutions to help develop appropriate call pricing schemes and management policies.

With the rise of big data in e-commerce, stock trading systems, and credit card transactions, data mining keeps discovering new knowledge that supports customized Customer Relationship Management (CRM).


With these practical applications in mind, what exactly is data mining?

Data mining is the discovery of useful, novel, and understandable models in massive data sets. It draws on theories and methods from several disciplines, including databases, machine learning (Bayesian classifiers, decision trees, and so on), mathematical statistics, and neural networks.

To master data mining, you need to understand its main model types and the kinds of data it works on.

1. Association rules: find attribute groups or itemsets that frequently occur together in a database. For example, beer and diapers, or badminton rackets and shuttlecocks.

2. Classification: train a classifier on existing data, then feed it new data to classify. For example, a decision tree can assess a bank customer's credit grade from the customer's recorded credit card transactions, loans, and repayment data.

3. Clustering: group a dataset so that elements within a group are highly similar and elements in different groups are dissimilar. For example, an e-commerce site can segment customers by the similar products they browse, and species can be grouped by their biological features.

4. Sequential pattern mining: find subsequences that occur frequently across multiple sequences. For example, if a seller sold you a computer, a printer or router may be recommended to you nine months later.

5. Outlier (anomaly) detection: given N points, find the K points whose values fall outside the expected range. Those K points are the outliers.
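As a minimal sketch of the first model above, association rules are usually scored with two numbers: support (how often an itemset appears) and confidence (how often the consequent appears when the antecedent does). The baskets and the function names below are invented for illustration and are not taken from any particular library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, invented for illustration only.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0
    both = count(set(antecedent) | set(consequent), transactions)
    return both / base


def frequent_pairs(transactions, min_support):
    """Map each item pair with support >= min_support to its support."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


print(frequent_pairs(transactions, min_support=0.6))
print(confidence({"diapers"}, {"beer"}, transactions))
```

Real association rule miners such as Apriori avoid enumerating every pair by pruning candidates whose subsets are already infrequent; this sketch skips that step to keep the two measures visible.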

Data mining presupposes big data: it discovers models and knowledge from massive data sets, so every model must be built on data. The variety of data types gives data mining both room to grow and new challenges. Several common data types in data mining are:

1. Tables in relational databases. As an aside, a relational database management system can answer data queries, but on its own it does not yield further knowledge.

2. Data warehouses. A data warehouse cleans and integrates the data held in databases to provide source data for data mining models.

3. Spatial data. For example, map information collected by remote sensing satellites, or PCB and integrated circuit design and inspection data.

4. Graphs, multimedia, text databases, etc.

Although data mining is a mature field, it is still worth following its latest directions, challenges, and improvements.

1. High-performance, highly portable data mining algorithms. Should the classic algorithms really stay unchanged for decades?

2. Better interaction with users. Database technology has a dedicated query language, SQL. Can data mining develop a language of its own?

3. Visualization of data mining results.

For deeper technical discussion and research, refer to international conferences and journals on data mining, such as IEEE ICDM, PKDD, and ACM venues on data mining and knowledge discovery.


Data Warehouse and OLAP Technology

A data warehouse is the object that data mining works on. For data analysis, data is first merged from massive databases and integrated into a data warehouse, then analyzed with mathematical analysis and modeling to obtain knowledge that supports decision making. A data warehouse is therefore integrated and subject-oriented, unlike a transaction-oriented database. A database is designed around transaction processing: it defines tables whose attributes each carry a specific meaning for the business operation at hand. A data warehouse integrates data from different sources so that model analysis can reveal rules or categories with some internal connection.

OLTP (online transaction processing) and OLAP (online analytical processing) are both forms of online processing, based on DBMS and DM. OLTP processes transactions in real time, such as customer registration, book registration, and putting products on the shelf; OLAP applies models to historical datasets.
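The contrast can be sketched with Python's built-in sqlite3 module. The `orders` table, its columns, and the two helper functions are hypothetical names chosen for this example; a real system would typically run the two workloads on separate stores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, month TEXT, amount REAL)"
)


def register_order(conn, customer, month, amount):
    """OLTP-style work: one small write, committed right away."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (customer, month, amount) VALUES (?, ?, ?)",
            (customer, month, amount),
        )
    return cur.lastrowid


def monthly_revenue(conn):
    """OLAP-style work: a read-only aggregate over the stored history."""
    rows = conn.execute(
        "SELECT month, SUM(amount) FROM orders GROUP BY month ORDER BY month"
    )
    return dict(rows.fetchall())


register_order(conn, "alice", "2024-01", 30.0)
register_order(conn, "bob", "2024-01", 20.0)
register_order(conn, "alice", "2024-02", 50.0)
print(monthly_revenue(conn))
```

The write path touches a single row per call, while the analytical query scans all rows and summarizes them, which is the difference in access pattern the paragraph describes.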

Having discussed the differences and connections between data warehouses and databases, how do we build a data warehouse? The following describes how to create a data cube from a table and a workbook.

A data cube is a multidimensional data model used in a data warehouse for convenient statistics and analysis. Different dimensions represent different kinds of items. Roll-up aggregates data along a dimension to a coarser level, and drill-down does the reverse, breaking a summary into finer detail.
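Roll-up and drill-down can be illustrated with a tiny in-memory cube. This is only a sketch: the dimension names, the cell values, and the `aggregate` helper are assumptions made for the example, not part of any OLAP product.

```python
from collections import defaultdict

# Dimension order for every cell key (hypothetical example dimensions).
DIMENSIONS = ("year", "quarter", "city", "product")

# Hypothetical cube cells: (year, quarter, city, product) -> units sold.
cells = {
    ("2023", "Q1", "Beijing", "laptop"): 10,
    ("2023", "Q1", "Beijing", "printer"): 4,
    ("2023", "Q2", "Beijing", "laptop"): 7,
    ("2023", "Q1", "Shanghai", "laptop"): 5,
    ("2023", "Q2", "Shanghai", "printer"): 3,
}


def aggregate(cells, keep):
    """Sum the measure over every dimension that is not listed in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


# Roll-up: collapse quarter and product, leaving one total per city.
by_city = aggregate(cells, ("city",))

# Drill-down: add the quarter dimension back to see finer detail.
by_city_quarter = aggregate(cells, ("city", "quarter"))

print(by_city)
print(by_city_quarter)
```

Keeping fewer dimensions gives a coarser summary, and keeping more gives a finer one, so a single function covers both directions.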

Conceptual modeling of a data warehouse mainly uses three schemas: the star schema, the snowflake schema, and the fact constellation. The names describe the shapes. In a star schema, dimension tables radiate from a central fact table. In a snowflake schema, the dimension tables are themselves split into further tables, so the branching continues at the tips. A fact constellation contains multiple fact tables that share dimension tables, like several stars together. Importantly, a data warehouse contains not only the items of each dimension in the schema but also the values computed over those dimensions (measures).
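A star schema can be sketched with sqlite3 as one fact table holding the measures and two dimension tables around it. All table names, column names, and rows here are hypothetical and exist only to show the shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store (
        store_id INTEGER PRIMARY KEY, city TEXT);

    -- Fact table: foreign keys to each dimension plus the measures.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units      INTEGER,
        revenue    REAL);

    INSERT INTO dim_product VALUES
        (1, 'laptop', 'computers'), (2, 'printer', 'peripherals');
    INSERT INTO dim_store VALUES (1, 'Beijing'), (2, 'Shanghai');
    INSERT INTO fact_sales VALUES
        (1, 1, 2, 2000.0), (2, 1, 1, 150.0), (1, 2, 1, 1000.0);
    """
)

# Only these dimension attributes may be used for grouping.
GROUPABLE = {"category": "p.category", "city": "s.city"}


def revenue_by(conn, dimension):
    """Total the revenue measure grouped by one dimension attribute."""
    col = GROUPABLE[dimension]  # raises KeyError for unknown names
    sql = (
        f"SELECT {col}, SUM(f.revenue) FROM fact_sales f "
        "JOIN dim_product p ON p.product_id = f.product_id "
        "JOIN dim_store s ON s.store_id = f.store_id "
        f"GROUP BY {col} ORDER BY {col}"
    )
    return dict(conn.execute(sql).fetchall())


print(revenue_by(conn, "category"))
print(revenue_by(conn, "city"))
```

Turning this into a snowflake schema would mean moving `category` into its own table referenced from `dim_product`; a fact constellation would add a second fact table, for example for shipments, that reuses the same dimension tables.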

Data preprocessing is an important part of building a data warehouse. It includes data cleaning, data transformation, and data dimensionality reduction.

Data cleaning mainly deals with undefined values, missing values, and selecting the relevant attributes; data transformation mainly means standardizing and normalizing the data; in dimensionality reduction, data for certain dimensions that the analysis does not need can be deleted.
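The three preprocessing steps can be sketched in plain Python. The choices below are assumptions made for illustration: missing values are filled with the mean, transformation is shown as min-max scaling and z-scores, and dimensionality reduction is reduced to its crudest form, dropping columns that never vary. Practical pipelines often use other strategies.

```python
from statistics import mean, pstdev


def fill_missing(values):
    """Cleaning: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def min_max(values):
    """Transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Transformation: center on the mean and divide by the std deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def drop_constant_columns(rows):
    """Reduction: drop columns whose value is identical in every row."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]


ages = fill_missing([10.0, None, 30.0])
print(ages)
print(min_max(ages))
print(z_score(ages))
print(drop_constant_columns([
    {"age": 30, "country": "CN"},
    {"age": 40, "country": "CN"},
]))
```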
