Data Mining Modeling Process

Source: Internet
Author: User

1. Defining the mining goal

2. Data sampling

Questions to consider: Which data sources are available? How can the quality of the sampled data be ensured? Is the sample representative across a sufficient range? Are the data samples appropriate? How should the data be split (training set, validation set, test set)?

Criteria for judging sampled data: 1) the data is complete, with all indicators present; 2) the information is accurate and reflects conditions in the normal state.

Sampling methods: random sampling, equidistant (systematic) sampling, stratified sampling, sampling in order from the start of the data, and sampling by category.
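As a rough illustration of three of the sampling methods listed above, plus the training/validation/test split, here is a minimal sketch using only the Python standard library. The function names, the 60/20/20 split ratio, and the toy `data` records are assumptions made for this example, not something the original text prescribes.

```python
import random
from collections import defaultdict


def random_sample(records, k, seed=0):
    """Simple random sampling without replacement."""
    return random.Random(seed).sample(records, k)


def equidistant_sample(records, step, start=0):
    """Equidistant (systematic) sampling: every `step`-th record from `start`."""
    return records[start::step]


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from each stratum defined by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))  # at least one per stratum
        sample.extend(rng.sample(group, k))
    return sample


def train_val_test_split(records, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle a copy of the records, then cut it into three parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Hypothetical data: 100 records, 80 labelled "a" and 20 labelled "b".
data = [{"id": i, "label": "a" if i < 80 else "b"} for i in range(100)]

picked = stratified_sample(data, key=lambda r: r["label"], fraction=0.1)
train, val, test = train_val_test_split(data)
```

Stratified sampling keeps the 80/20 label ratio in the sample, which a small simple random sample does not guarantee.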

3. Data exploration: The aim is to verify the quality of the data samples, which lays the foundation for the quality of the final model.

This mainly includes: outlier analysis, missing value analysis, correlation analysis, periodicity analysis, cross-validation of samples, etc.
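Three of the exploration checks named above (missing values, outliers, correlation) can be sketched in a few lines of plain Python. The box-plot (1.5 x IQR) rule used for outliers, the linear-interpolation quantile, and the `sales` values are assumptions chosen for illustration; the original text does not say which outlier criterion to use.

```python
import math


def missing_count(values):
    """Count entries recorded as None (treated here as missing)."""
    return sum(v is None for v in values)


def quantile(sorted_vals, q):
    """Quantile by linear interpolation on an already sorted list."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing ones."""
    vals = sorted(v for v in values if v is not None)
    q1, q3 = quantile(vals, 0.25), quantile(vals, 0.75)
    iqr = q3 - q1
    return [v for v in vals if v < q1 - k * iqr or v > q3 + k * iqr]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical daily sales with two missing entries and one suspicious spike.
sales = [10, 12, None, 11, 13, 95, 12, None, 11]

n_missing = missing_count(sales)
outliers = iqr_outliers(sales)
```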

4. Preprocessing

This mainly includes: data filtering, variable transformation, missing value handling, bad data handling, standardization, principal component analysis, attribute selection, and data reduction.
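Two of the preprocessing steps listed above, missing value handling and standardization, are simple enough to sketch directly. Filling gaps with the mean is only one possible strategy and is an assumption here, as are the function names; min-max scaling is included as a common alternative to z-score standardization.

```python
import math


def fill_missing_with_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def standardize(values):
    """Z-score standardization: zero mean, unit population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


filled = fill_missing_with_mean([1.0, None, 3.0])
z_scores = standardize(filled)
scaled = min_max_scale([2, 4, 6])
```

Both scaling functions divide by zero on a constant column; real preprocessing code would need to guard against that case.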

5. Pattern Discovery: Classification, clustering, association rules, or time series patterns.

6. Model Building: Which algorithm is used? What are the implementation steps?

Constructing a predictive model includes building the model, training it, validating it, and using it for prediction.
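The four stages just named (building, training, validation, prediction) can be seen end to end in a very small example. The sketch below assumes a one-variable linear regression fitted by ordinary least squares, which is merely a convenient stand-in for whatever algorithm a real project would choose; the data points are made up and lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Training: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def predict(model, xs):
    """Prediction: apply the fitted line to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data following y = 2x + 1 exactly.
train_x, train_y = [0, 1, 2, 3], [1, 3, 5, 7]
val_x, val_y = [4, 5], [9, 11]

model = fit_line(train_x, train_y)                              # training
val_error = mean_absolute_error(val_y, predict(model, val_x))   # validation
forecast = predict(model, [10])                                 # prediction
```

The validation data is kept out of `fit_line` on purpose: an error measured on the training data itself would overstate how well the model generalizes.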

7. Model Evaluation:

Questions to consider: What is the purpose of the evaluation? How should the model's performance be evaluated? Which evaluation metrics should be used?

The performance of a prediction model is usually measured by relative absolute error, mean absolute error, root mean squared error, root relative squared error, recall, and the ROC (receiver operating characteristic) curve.
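To make these metrics concrete, here is a minimal sketch of how each could be computed by hand. The definitions follow the usual textbook forms, where the relative errors compare the model against always predicting the mean; the function names and sample values are assumptions for illustration, and `roc_points` assumes binary labels (1 = positive) with distinct scores.

```python
import math


def mae(y, p):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def rmse(y, p):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def rae(y, p):
    """Relative absolute error: total error relative to predicting the mean."""
    mean = sum(y) / len(y)
    return (sum(abs(a - b) for a, b in zip(y, p))
            / sum(abs(a - mean) for a in y))


def rrse(y, p):
    """Root relative squared error."""
    mean = sum(y) / len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p))
                     / sum((a - mean) ** 2 for a in y))


def recall(labels, preds):
    """True positives divided by all actual positives."""
    tp = sum(1 for l, q in zip(labels, preds) if l == 1 and q == 1)
    fn = sum(1 for l, q in zip(labels, preds) if l == 1 and q == 0)
    return tp / (tp + fn)


def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold one score at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


actual, predicted = [1, 2, 3, 4], [1, 2, 3, 6]   # hypothetical regression output
curve = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

The error metrics suit numeric predictions, while recall and the ROC curve apply to classifiers, so a given model is normally judged with only one of the two groups.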

