1. Defining the mining objective
2. Data sampling
Questions: Which data sources are available? How can the quality of the sampled data be ensured? Is the sample representative across a sufficiently wide range? How many data samples are appropriate? How should the data be split (training set, validation set, test set)?
Criteria for judging sampled data: 1) the data is complete, with all indicators present; 2) the information is accurate and reflects values observed under normal conditions.
Sampling methods: random sampling, equidistant (systematic) sampling, stratified sampling, sequential sampling from the start of the data, classification sampling
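The sampling methods and the training/validation/test split mentioned above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the function names, the 60/20/20 split ratio, the fixed seed, and the `records` data are all assumptions made for the example, not part of the original notes.

```python
import random
from collections import defaultdict

def random_sample(rows, n, seed=0):
    """Simple random sampling: every row is equally likely to be chosen."""
    return random.Random(seed).sample(rows, n)

def systematic_sample(rows, step, start=0):
    """Equidistant (systematic) sampling: take every `step`-th row."""
    return rows[start::step]

def stratified_sample(rows, key, fraction, seed=0):
    """Stratified sampling: draw the same fraction from each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))  # at least one row per group
        sample.extend(rng.sample(members, k))
    return sample

def split_dataset(rows, train=0.6, valid=0.2, seed=0):
    """Shuffle a copy, then cut it into training, validation, and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    a = int(len(shuffled) * train)
    b = a + int(len(shuffled) * valid)
    return shuffled[:a], shuffled[a:b], shuffled[b:]

# Hypothetical records of the form (id, class label): 75 of class "A", 25 of class "B".
records = [(i, "A" if i % 4 else "B") for i in range(100)]

simple = random_sample(records, 10)
every_tenth = systematic_sample(records, 10)
by_class = stratified_sample(records, key=lambda r: r[1], fraction=0.2)
train_set, valid_set, test_set = split_dataset(records)
```

Stratified sampling keeps the class proportions of the full data set, which simple random sampling does not guarantee for small samples.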
3. Data exploration: The aim is to ensure the quality of the data samples, which lays the foundation for the quality of the final model.
This mainly includes: outlier analysis, missing value analysis, correlation analysis, periodicity analysis, cross-validation of samples, etc.
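Three of the exploration steps above (missing value analysis, outlier analysis, correlation analysis) can be sketched as follows. The notes do not say which techniques to use, so the choices here are assumptions: missing values are represented as `None`, outliers are flagged with the box-plot (1.5 x IQR) rule, and correlation is measured with the Pearson coefficient. The sample data is hypothetical.

```python
import math
import statistics

def missing_count(values):
    """Count entries recorded as None (treated here as 'missing')."""
    return sum(v is None for v in values)

def iqr_outliers(values):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical measurements; 95 is an obvious outlier.
readings = [10, 12, 11, 13, 12, 11, 95]
n_missing = missing_count([1, None, 3, None])
outliers = iqr_outliers(readings)
corr = pearson([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear relationship
```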
4. Preprocessing
This mainly includes: data filtering, variable transformation, missing value handling, bad data handling, standardization, principal component analysis, attribute selection, data reduction
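As a minimal sketch of two of the preprocessing steps listed above, missing value handling and standardization, the following uses mean imputation, min-max scaling, and z-score standardization. These particular techniques and the function names are assumptions chosen for illustration; the notes do not prescribe them. The scaling functions assume the input is not constant (otherwise they would divide by zero).

```python
import statistics

def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

filled = fill_missing_with_mean([1.0, None, 3.0])
scaled = min_max_scale([2, 4, 6])
standardized = z_score([2, 4, 6])
```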
5. Pattern Discovery: Classification, clustering, association rules, or time series patterns.
6. Model Building: Which algorithm is used? What are the implementation steps?
Building a predictive model includes model establishment, model training, model validation, and model prediction.
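The four stages named above (establishment, training, validation, prediction) can be illustrated with a deliberately small example. The model here, one-variable least-squares linear regression, is an assumption chosen for brevity, as are the class name, the hold-out split, and the synthetic data, which follows y = 2x + 1 exactly.

```python
class SimpleLinearRegression:
    """Model establishment: assume the form y = slope * x + intercept."""

    def fit(self, xs, ys):
        """Model training: estimate slope and intercept by least squares."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs):
        """Model prediction: apply the fitted line to new inputs."""
        return [self.slope * x + self.intercept for x in xs]

# Hypothetical data generated from y = 2x + 1.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]

# Hold out the last three points for validation.
train_x, valid_x = xs[:7], xs[7:]
train_y, valid_y = ys[:7], ys[7:]

model = SimpleLinearRegression().fit(train_x, train_y)

# Model validation: mean absolute error on the held-out points.
valid_pred = model.predict(valid_x)
valid_mae = sum(abs(p - y) for p, y in zip(valid_pred, valid_y)) / len(valid_y)

# Model prediction on an unseen input.
new_prediction = model.predict([20])[0]
```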
7. Model Evaluation:
Questions to consider: What is the purpose of the evaluation? How should the model's effectiveness be evaluated? Which evaluation metrics should be used?
The effectiveness of a prediction model is usually measured by relative absolute error, mean absolute error, root mean squared error, root relative squared error, recall, and the ROC (receiver operating characteristic) curve.
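Several of the metrics above can be computed directly. The notes do not give formulas, so the definitions below are assumptions: they follow one common convention in which the relative errors compare the model against a baseline that always predicts the mean of the true values. The function names and sample values are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def relative_absolute_error(y_true, y_pred):
    """Total absolute error relative to always predicting the mean."""
    mean = sum(y_true) / len(y_true)
    num = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    den = sum(abs(t - mean) for t in y_true)
    return num / den

def root_relative_squared_error(y_true, y_pred):
    """Square root of squared error relative to always predicting the mean."""
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return math.sqrt(num / den)

def recall(labels_true, labels_pred, positive=1):
    """Fraction of actual positives that the model identified."""
    tp = sum(t == positive and p == positive for t, p in zip(labels_true, labels_pred))
    fn = sum(t == positive and p != positive for t, p in zip(labels_true, labels_pred))
    return tp / (tp + fn)

y_true = [1, 2, 3, 4]
y_pred = [1, 2, 3, 6]
```

A relative error below 1 means the model does better than the mean baseline; a value above 1 means it does worse.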