See Original book section 1.5
General process for building predictive models
The problem of the daily language expression--the problem of the mathematical language restatement
Restatement of problems, extraction features, training algorithms, evaluation algorithms
Familiar with the input data structure of the different algorithms:
1. Features required to extract or combine predictions
2. Set the training target
3. Training model
4. Evaluate the performance of the model on training data
Machine learning:
Develop the entire process of a model that can actually be deployed, including the understanding of machine learning algorithms and the actual operation
Often, there are very practical reasons why some algorithms are often used to understand the reasons behind
(1) Constructing a machine learning problem
Examine the data in a dataset to determine what form of prediction needs to be made
For example, what does this data represent? How do I associate with a predictive task?
1. "Better results", measurable targets that can be optimized
2. Collection of data, expressed as a matrix of features
3. Objective: Known good data results are used for training
< <----------------problem Reconstruction
| |
Mathematical description of the problem--model training and performance evaluation--model deployment
(2) Feature extraction and feature engineering
Feature extraction: (determines which features can be used to predict the target)
The process of converting a free form of data, such as a word in a document, into a number in the form of rows and columns
Feature Engineering:
Organize and combine features to achieve a richer information process
Algorithms that provide a measure of the contribution of each feature to the final prediction result
Scoring features, identifying importance
Note: Data preparation and feature engineering estimates will account for the time it takes to develop a machine learning model 80%~90%
Usually train 100~5000 a different model, then choose the model that best matches the problem, the data set
(3) Determine the performance of the post-training model
Test sets: Set aside part of the data to test the performance of the model
Python machine learning-predictive analytics core algorithm: A general process for building predictive models