1. Industry data mining methodology
2. In practice, we follow this method to guide data mining implementation:
Eight-step application modeling: business understanding, indicator design, data extraction, data exploration, algorithm selection, model evaluation, model release, and model optimization.
Step One: Business understanding
Common misconception: many people believe there is no need to define problems and goals in advance. In their view, it is enough to apply data mining techniques to the data and then search and interpret the mining results, and previously unknown, useful patterns and knowledge will emerge naturally.
Process: business research, problem positioning, and business analysis.
Step Two: Indicator design
Based on the analysis of the business problem, choose an appropriate analysis method or methodology to guide the design of the model indicators, so that the indicators are systematic and comprehensive.
Some common methods of analysis
Step Three: Data extraction
Data extraction ensures the integrity and availability of the modeling data.
Data extraction: extract the data needed for modeling.
Data cleansing: handle missing values, extreme values, erroneous data, and redundant data.
Data audit: audit statistical errors, data-source errors, and the consistency of statistical definitions (caliber).
Data integration: build wide tables for data mining.
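The cleansing and integration sub-steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method prescribed here: missing values are assumed to be encoded as None and filled with the column median, extreme values are clipped to the mean plus or minus z_limit population standard deviations, redundant records are assumed to be repeats of an entity key, and the wide table is built by merging per-entity tables on a shared key. The function names, the z_limit default, and the user_id, age, and monthly_minutes fields are hypothetical.

```python
from statistics import mean, median, pstdev


def clean_column(values, z_limit=3.0):
    """Fill missing values (None) with the median, then clip extreme values
    to mean +/- z_limit population standard deviations (hypothetical rule)."""
    present = [v for v in values if v is not None]
    fill = median(present)
    filled = [fill if v is None else v for v in values]
    mu, sigma = mean(filled), pstdev(filled)
    low, high = mu - z_limit * sigma, mu + z_limit * sigma
    return [min(max(v, low), high) for v in filled]


def deduplicate(rows, key):
    """Drop redundant records, keeping the first row seen for each key."""
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def build_wide_table(key, *tables):
    """Merge per-entity tables (lists of dicts) into one wide table on `key`."""
    wide = {}
    for table in tables:
        for row in table:
            wide.setdefault(row[key], {key: row[key]}).update(row)
    return [wide[k] for k in sorted(wide)]


if __name__ == "__main__":
    # Hypothetical source tables: a user profile table and a usage table.
    profile = [
        {"user_id": 1, "age": 30},
        {"user_id": 2, "age": None},
        {"user_id": 1, "age": 30},  # redundant duplicate record
    ]
    usage = [
        {"user_id": 1, "monthly_minutes": 120},
        {"user_id": 2, "monthly_minutes": 300},
    ]
    profile = deduplicate(profile, "user_id")
    print(clean_column([10, 12, None, 11, 100], z_limit=1.5))
    print(build_wide_table("user_id", profile, usage))
```

Because the mean and standard deviation are themselves pulled toward outliers, a quantile-based (IQR) rule is a common alternative for the clipping step.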
Step Four: Data exploration
Data exploration mainly involves two tasks. First, examine and analyze the data to verify that it matches the original intent of the indicator design and its business meaning. Second, standardize the data as the modeling requires, so that different indicators are on the same scale and can be combined in mathematical operations.
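The standardization in the second task can be illustrated with two widely used rescalings, z-score standardization and min-max normalization. This is a minimal sketch: the text does not say which method it intends, and the indicators in the example (monthly spend and tenure in months) and their values are hypothetical.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Two hypothetical indicators measured in very different units.
    monthly_spend = [50, 150, 250]
    tenure_months = [2, 4, 6]
    print(min_max(monthly_spend), min_max(tenure_months))  # both [0.0, 0.5, 1.0]
    print(z_score(monthly_spend), z_score(tenure_months))
```

After either rescaling, the two indicators lie on the same scale and can be added or compared directly, which is the purpose of this standardization step.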
Step Five: Algorithm selection
Choose the algorithm according to the modeling scenario. Descriptive tasks include classification rules and cluster analysis; predictive tasks include neural networks, decision trees, time series, regression analysis, association analysis, Bayesian networks, and deviation detection; evaluative tasks include factor analysis, principal component analysis, and mathematical formulas. Combine this with the data conditions (such as discrete versus continuous values and data size) to select an appropriate algorithm.
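The selection logic in this step can be read as a lookup from scenario to candidate algorithms, narrowed by the data conditions. The sketch below is illustrative only: the scenario lists come from the text, but the target-type suitability table (SUITS), the small-data cut-off of 1,000 rows, and the rule of dropping neural networks on small data are hypothetical assumptions, not rules stated in the text.

```python
# Candidate algorithms per modeling scenario, as listed in the text.
CANDIDATES = {
    "descriptive": ["classification rules", "cluster analysis"],
    "predictive": [
        "neural network", "decision tree", "time series", "regression analysis",
        "association analysis", "Bayesian network", "deviation detection",
    ],
    "evaluative": ["factor analysis", "principal component analysis", "mathematical formula"],
}

# Hypothetical suitability by target type; algorithms not listed accept both.
SUITS = {
    "regression analysis": {"continuous"},
    "time series": {"continuous"},
    "Bayesian network": {"discrete"},
    "association analysis": {"discrete"},
}


def suggest_algorithms(scenario, target_type=None, n_rows=None, small_data_limit=1000):
    """Return candidate algorithms for a scenario, filtered by data conditions."""
    candidates = list(CANDIDATES[scenario])
    if target_type is not None:
        candidates = [a for a in candidates
                      if target_type in SUITS.get(a, {"discrete", "continuous"})]
    if n_rows is not None and n_rows < small_data_limit:
        # Hypothetical rule: treat neural networks as unsuitable for small data.
        candidates = [a for a in candidates if a != "neural network"]
    return candidates


if __name__ == "__main__":
    print(suggest_algorithms("predictive", target_type="continuous", n_rows=500))
    # ['decision tree', 'time series', 'regression analysis', 'deviation detection']
```

In practice the suitability table would be filled in from the team's own experience; the point of the sketch is only that scenario and data conditions act as two successive filters.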
Step Six: Model evaluation
Step Seven: Model release
Focus on the business problem and provide an end-to-end thematic solution that improves the effectiveness and value of the data mining application. The deliverable is a complete, end-to-end data mining solution rather than bare data mining results.
Step Eight: Model optimization
Model initial construction: model validation.
Model rise period: optimize the model based on validation results and business conditions.
Model maturity: the model reaches the required accuracy, is stable and mature, and leads business development.
Model recession: as the business develops, the model no longer fits the new business environment and is gradually retired.
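The four stages above can be tracked by monitoring model accuracy over successive evaluation periods. The following is a hypothetical sketch, not a procedure from the text: the target accuracy of 0.8, the minimum of three evaluations before leaving the initial stage, and the rule that strictly falling accuracy over the last three periods, after the target was once reached, signals recession are all illustrative assumptions.

```python
def lifecycle_stage(accuracy_history, target=0.8, min_points=3, window=3):
    """Map a series of periodic accuracy scores to a lifecycle stage.

    All thresholds are hypothetical and should be set per business scenario.
    """
    if len(accuracy_history) < min_points:
        return "initial construction"  # too few evaluations: still validating
    recent = accuracy_history[-window:]
    reached_target = max(accuracy_history) >= target
    declining = all(later < earlier for earlier, later in zip(recent, recent[1:]))
    if reached_target and recent[-1] < target and declining:
        return "recession"  # once-mature model now falling below target
    if recent[-1] >= target:
        return "maturity"  # accuracy at or above the required precision
    return "rise"  # below target: keep optimizing on validation feedback


if __name__ == "__main__":
    for history in ([0.6, 0.7], [0.6, 0.7, 0.75], [0.7, 0.82, 0.83, 0.84],
                    [0.85, 0.84, 0.8, 0.76, 0.7]):
        print(history, "->", lifecycle_stage(history))
```

A model that has fallen below target but is not steadily declining is reported as "rise" here, which treats it as a candidate for re-optimization rather than retirement.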
Data mining methodology and implementation steps