Understanding of data mining and project flow

Source: Internet
Author: User

I graduated in 2014 and joined my current company to work on data mining, which was booming at the time. To some people we seem mysterious, as if our research were very high-end; to others we are handymen who go wherever we are needed; and some have decided that all we do is talk big.

The reality: when there is a data mining project, we work on the project; when there is none, we run training, analyze system requirements, and design products. It really does look high-end, but in practice it is odd jobs and talk.

For most of these 4 years I was, in truth, rather adrift. At first I drifted on the feeling that my work was high-end: anything I said casually was something the developers had never heard of, and the concepts I talked about were relatively new. As the buzz around the term "data mining" faded, and as I went through three major life events (getting married, buying a house, having a baby), I began to look back on the past, take stock of the present, and look to the future. Only then did I start asking: what exactly am I doing in this industry? Where is it heading? What do I need to do now? Over these years my career has touched many areas of knowledge and my life experience has grown rich; I know a little about everything but understand nothing deeply enough. My mental picture of the industry, the workplace, and life urgently needs to be sorted out in writing. This is about to be my 5th year in the workplace, so I am using it to map out a plan for the next 5 years.

First of all, what is data mining? This article covers the tasks of data mining, the problems it must solve, and the data mining process. Most of the content comes from books or corporate theory, but I have verified all of it through my own industry experience and typed it out word by word. It is something I agree with wholeheartedly.

What data mining is: finding useful, previously undiscovered information in massive amounts of data
Data mining tasks: classification, prediction, association analysis, clustering
Problems data mining must address: massive data, high dimensionality, scalability, and multiple data types such as heterogeneous and complex data (the standard is improved performance, meaning both efficiency and effectiveness)
Fields involved in data mining: data mining is a highly interdisciplinary discipline and application area
{
Application (improves modeling effectiveness): statistics, AI, machine learning, and pattern recognition
Foundation (improves operational efficiency): database technology, parallel computing, distributed computing
}


Data mining process

The data mining process described below is a general-purpose process for cross-industry data mining; its six phases match CRISP-DM, the Cross-Industry Standard Process for Data Mining. Using a good methodology is half the battle. The methodology lays out the flow of a data mining project, including the goals, tasks, and key implementation points of each phase. It is highly practical and is a recognized standard in the industry.

There are two key points to remember when using the project process:

1. Data preprocessing may take up a large share of the working time in a data mining project;

2. The data mining project process is not executed once and finished; it is optimized through continuous iteration until it ultimately achieves the best results.


Business Understanding:

"Phase Target"
Identify the business problem and the data mining goals
Develop a project plan
"Work Tasks"
Business requirements research: understand the background of the business problem
Project environment assessment: identify the required resources (people, cost, data, stakeholders)
Business objective definition: clarify the business objectives and success criteria
Mining objective definition: clarify the data mining goals and success criteria
Project plan development: guide project implementation
"Key points of implementation"
Adequate requirements research and communication
Reasonable resources, constraints, and assumptions
Appropriate application scenarios for the mining results

Data understanding:

"Phase Target"
Determine the data needed for modeling
Explore the target variable required for modeling
"Work Tasks"
Data dictionary compilation: catalog internal and external data types
Parameter definition: clarify the business meaning of each data indicator (number of parameters per feature, number of periods, value range)
Mapping rule definition: set the business rules that govern how the data is used
Quality checks: ensure the data is usable
Target variable exploration: prepare for model building
"Key points of implementation"
The necessary internal and external data can be obtained
Data is consistent, complete, and accurate
The target factors are determined through preliminary analysis
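
As a rough illustration of the quality checks in this phase, the sketch below measures completeness (the share of missing values in a field) and flags values outside an expected range. It is only a minimal example in plain Python: the record layout, the field names (`age`, `monthly_spend`), and the 0 to 120 age range are hypothetical and not taken from any real project.

```python
from typing import Any


def missing_rate(rows: list[dict[str, Any]], field: str) -> float:
    """Share of rows where `field` is absent, None, or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) in (None, ""))
    return missing / len(rows)


def out_of_range(rows: list[dict[str, Any]], field: str,
                 low: float, high: float) -> list[int]:
    """Indexes of rows whose non-missing value lies outside [low, high]."""
    bad = []
    for i, row in enumerate(rows):
        value = row.get(field)
        if value in (None, ""):
            continue  # missing values are counted by missing_rate instead
        if not (low <= value <= high):
            bad.append(i)
    return bad


# Hypothetical customer records, used only for illustration.
customers = [
    {"id": 1, "age": 34, "monthly_spend": 120.0},
    {"id": 2, "age": None, "monthly_spend": 80.5},
    {"id": 3, "age": 212, "monthly_spend": 45.0},
    {"id": 4, "age": 51, "monthly_spend": None},
]

print(missing_rate(customers, "age"))          # 0.25
print(out_of_range(customers, "age", 0, 120))  # [2]
```

In a real project the same checks would normally run as SQL against the source tables, with thresholds agreed with the business side.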

Data preparation:

"Phase Target"
Create a data mart or wide table
Load data efficiently
"Work Tasks"
Data mart or wide table design
ETL script writing
Data cleansing, transformation, and loading
Data quality checks
Data normalization
"Key points of implementation"
Sound coding standards to guide script development
Accurate data mapping rules
Efficient ETL to safeguard project progress and quality
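
To make the normalization task concrete, here is a minimal sketch of two common rescaling methods, min-max scaling and z-score standardization. It assumes that "data normalization" here means rescaling numeric columns; the source does not say which method was used, and the sample values are made up.

```python
from statistics import mean, pstdev


def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: list[float]) -> list[float]:
    """Center values on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: no spread
    return [(v - mu) / sigma for v in values]


# Hypothetical monthly spend figures, used only for illustration.
spend = [10.0, 20.0, 30.0, 40.0, 50.0]

print(min_max_scale(spend))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(spend))
```

Min-max scaling keeps values in a fixed range, while z-scores are often preferred when features have very different spreads; which one fits depends on the modeling algorithm chosen later.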

Data modeling:

"Phase Target"
Choose the right technique for modeling
Achieve the data mining goals
"Work Tasks"
Technique selection: choose an appropriate modeling algorithm
Sample selection: determine the training, test, and validation samples
Model building: variable screening, model training, model testing
Model evaluation: assess whether the model meets the data mining goals
"Key points of implementation"
The right technique helps achieve the mining goals
Sample data truly reflects the business requirements
Variables explain the business phenomena effectively
Comprehensive evaluation of the model's mining performance
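
The sample selection task can be sketched as a reproducible random split into training, validation, and test sets. This is only one possible approach: the 60/20/20 proportions, the fixed seed, and the function name `split_samples` are assumptions for illustration, and a real project may need stratified or time-based splits instead.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train: float = 0.6,
                  validation: float = 0.2,
                  seed: int = 42) -> tuple[list[T], list[T], list[T]]:
    """Shuffle the samples and cut them into train, validation, and test lists.

    Whatever remains after the train and validation shares becomes the test set.
    """
    if not 0 < train < 1 or not 0 <= validation < 1 or train + validation >= 1:
        raise ValueError("train and validation shares must leave room for a test set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Hypothetical sample IDs, used only for illustration.
train_set, validation_set, test_set = split_samples(range(100))
print(len(train_set), len(validation_set), len(test_set))  # 60 20 20
```

Keeping the test set untouched until the final check is what makes the later model evaluation step meaningful.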

Model Evaluation:

"Phase Target"
Test the model in business applications
Determine whether the business goals are achieved
"Work Tasks"
Model trial: identify business scenarios, run application tests of the model, and collect feedback on the results
Effect evaluation: evaluate and analyze the trial results and judge whether the model meets the business goals
Marketing advice: extract marketing rules from the trial results and give marketing advice
"Key points of implementation"
Appropriate business scenarios for the trial
Comprehensive, rigorous effect evaluation
Targeted marketing advice
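
One common way to carry out the effect evaluation in a marketing trial is to compare the response rate of customers selected by the model with that of a control group, a ratio usually called lift. The source does not name a specific metric, so the sketch below is an assumption, and the campaign numbers are invented for illustration.

```python
def response_rate(responses: int, contacted: int) -> float:
    """Share of contacted customers who responded."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return responses / contacted


def lift(model_responses: int, model_contacted: int,
         control_responses: int, control_contacted: int) -> float:
    """Response rate of the model-selected group divided by the control group's."""
    control_rate = response_rate(control_responses, control_contacted)
    if control_rate == 0:
        raise ValueError("control group has no responses; lift is undefined")
    return response_rate(model_responses, model_contacted) / control_rate


# Hypothetical trial: 250 of 1000 model-selected customers responded,
# against 125 of 1000 randomly chosen control customers.
print(lift(250, 1000, 125, 1000))  # 2.0
```

A lift of 2.0 would mean the model-selected group responded twice as often as the control group; whether that meets the business goal depends on the success criteria set during business understanding.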

Model deployment:

"Phase Target"
Deploy the data mining results to the business environment for production use
"Work Tasks"
Deployment planning: develop deployment plans and scenarios
Monitoring and maintenance: track in real time and verify that the business objectives are achieved
Summary report: accumulate experience
"Key points of implementation"
Sound planning to ensure seamless deployment
Prompt monitoring and maintenance response to keep the model operational
Comprehensive summary and analysis to accumulate experience

The skills involved in the data mining process include business understanding, data development, and statistics and AI. It requires strong all-round ability: communication skills, business analysis, SQL, mining and modeling, and so on. The appeal of data mining is that it pushes you to broaden your knowledge. In a project you must find the best ways to communicate with people, understand the business, master the technology, and manage the whole project, which makes the role resemble that of a project manager. Later on, you can move toward project management or product management.
