1. Define the mining target
Understand the users' real needs, then determine the goal of the data mining project and the results the finished model is expected to deliver. Doing this requires learning about the relevant industry and becoming familiar with its background knowledge.
2. Data acquisition and processing
Once the mining objective is clear, extract from the business data systems the subset of sample data that is relevant to that objective.
Three criteria for sampling:
1. Relevance
2. Reliability
3. Effectiveness
In order to ensure the integrity and accuracy of the data as much as possible, we need to explore and preprocess the sampled data.
Exploration includes outlier analysis, missing value analysis, correlation analysis, periodicity analysis, and so on.
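Three of the exploration checks named above (missing values, outliers, and correlation) can be sketched with the Python standard library alone. This is a minimal illustration rather than a prescribed method: the sample values, the use of `None` to mark a missing entry, and the 1.5 x IQR outlier rule are all assumptions made for the example. Periodicity analysis is not covered here.

```python
import math
import statistics

def missing_ratio(values):
    """Share of entries that are None (treated here as missing)."""
    return sum(v is None for v in values) / len(values)

def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing entries."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in present if v < low or v > high]

def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = statistics.fmean([x for x, _ in pairs])
    my = statistics.fmean([y for _, y in pairs])
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)

# Hypothetical sample: monthly spend and visit counts for eight customers.
spend = [120.0, 135.0, None, 128.0, 900.0, 131.0, 125.0, 140.0]
visits = [4, 5, 3, 4, 6, 5, 4, 5]

print("missing ratio:", missing_ratio(spend))
print("outliers:", iqr_outliers(spend))
print("correlation:", round(pearson(spend, visits), 3))
```

In this sample one of eight spend values is missing and the value 900.0 is flagged as an outlier. Whether a flagged value is a data error or a genuine extreme case is a business question that the check alone cannot answer.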
Preprocessing includes outlier handling, missing value handling, dimensionality reduction, variable transformation, data standardization, and principal component analysis.
3. Mining modeling
1. Determine which type of data mining problem this modeling task is, for example classification, clustering, association rules, time-series patterns, or intelligent recommendation.
2. Select an appropriate algorithm and build the model.
4. Model evaluation
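In practice, selecting an algorithm often means fitting several candidates and comparing them on data they were not trained on. The sketch below assumes a classification problem scored by accuracy under k-fold cross-validation. The two candidates (a majority-class baseline and a 1-nearest-neighbour rule), the helper names, and the toy data are illustrative assumptions, not part of the original text.

```python
from collections import Counter

def majority_fit(train):
    """Baseline: always predict the most common label in the training set."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbour_fit(train):
    """1-NN: predict the label of the closest training point (squared Euclidean)."""
    def predict(x):
        _, y = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
        return y
    return predict

def cross_val_accuracy(fit, data, k=4):
    """Mean accuracy over k folds; fold i holds out every k-th row starting at i."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [row for j, row in enumerate(data) if j % k != i]
        predict = fit(train)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / k

# Hypothetical labelled sample: (features, class).
data = [
    ((1.0, 1.2), "a"), ((0.9, 1.0), "a"), ((1.1, 0.8), "a"), ((1.3, 1.1), "a"),
    ((3.0, 3.2), "b"), ((3.1, 2.9), "b"), ((2.8, 3.0), "b"), ((3.2, 3.1), "b"),
]

candidates = {
    "majority baseline": majority_fit,
    "1-nearest neighbour": nearest_neighbour_fit,
}
scores = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Accuracy suits this balanced toy example. Other problem types call for other measures, so the scoring function would change with the kind of model being compared.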
Choose the best model from the candidates produced during the modeling process. Evaluation methods differ from one kind of model to another.
5. Model Release