1.1 Statistical learning
Concept
Statistical learning is the discipline in which computers build probabilistic and statistical models of data and use those models to predict and analyze data. Statistical learning is also called statistical machine learning.
Characteristics
- Statistical learning takes data as its research object; it is a data-driven discipline.
- The purpose of statistical learning is to predict and analyze data.
- Statistical learning is centered on methods: a statistical learning method constructs a model and uses that model for prediction and analysis. Methods include supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, etc.
1.2 Supervised learning
Concept
Supervised learning starts from a given, finite training data set. It is assumed that the data are generated independently and identically distributed, and that the model to be learned belongs to a set of functions called the hypothesis space. Applying an evaluation criterion, an optimal model is selected from the hypothesis space, namely the model with the best predictions for the known training data and the unknown test data under that criterion. The selection of the optimal model is implemented by an algorithm.
Formalization
1.3 Three elements of statistical learning
Method = model + strategy + algorithm
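The decomposition above can be illustrated with a small sketch, using one-dimensional linear regression as a hypothetical example (the data, learning rate, and step count are assumptions, not from the source):

```python
import numpy as np

# Model: the hypothesis space is the set of linear functions f(x) = w*x + b.
def model(x, w, b):
    return w * x + b

# Strategy: select the model minimizing empirical risk under squared loss.
def empirical_risk(w, b, x, y):
    return np.mean((model(x, w, b) - y) ** 2)

# Algorithm: gradient descent on the empirical risk.
def fit(x, y, lr=0.1, steps=1000):
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = model(x, w, b) - y
        w -= lr * 2 * np.mean(err * x)   # d(risk)/dw
        b -= lr * 2 * np.mean(err)       # d(risk)/db
    return w, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1                # data generated by the "true" model w=2, b=1
w, b = fit(x, y)
print(round(w, 2), round(b, 2))
```

Changing any one element (e.g. swapping squared loss for absolute loss, or gradient descent for a closed-form solver) gives a different learning method, even with the same model.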
1.4 Model Evaluation and model selection
Training error
Training error is the average loss of the model over the training data set.
Test error
Test error is the average loss of the model over the test data set.
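Both errors are the same average-loss computation applied to different data sets. A minimal sketch, assuming 0-1 loss and a hypothetical fixed threshold classifier:

```python
# Hypothetical classifier: predict class 1 when x exceeds 0.5.
def predict(x):
    return 1 if x > 0.5 else 0

def average_loss(xs, ys):
    # 0-1 loss: 1 for a wrong prediction, 0 for a correct one.
    return sum(predict(x) != y for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1]
test_x,  test_y  = [0.2, 0.7, 0.8, 0.3], [0, 1, 0, 1]

print(average_loss(train_x, train_y))  # training error
print(average_loss(test_x, test_y))    # test error
```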
Generalization capability
In general, the ability of a learning method to predict unknown data is called its generalization ability.
Over fitting
If one blindly pursues improving predictive performance on the training data, the complexity of the selected model tends to be higher than that of the true model; this phenomenon is called overfitting.
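The effect can be demonstrated with polynomial curve fitting on noisy data (the data-generating function, noise level, and degrees below are assumptions for illustration): a high-degree polynomial drives the training error toward zero while the test error grows.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)  # noisy samples
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)                             # true function

def errors(degree):
    # Fit a polynomial of the given degree and return (training, test) MSE.
    coeffs = np.polyfit(x_train, y_train, degree)
    tr = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    te = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return tr, te

for d in (1, 3, 9):
    tr, te = errors(d)
    print(d, round(tr, 4), round(te, 4))
```

With 10 points, the degree-9 polynomial interpolates the noise exactly (near-zero training error), yet its test error is far larger than its training error.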
The relationship among training error, test error, and model complexity
1.5 regularization and cross-validation
Regularization
Regularization implements the structural risk minimization strategy: a regularization term (penalty term) is added to the empirical risk.
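A minimal sketch of this idea is ridge regression, where an L2 penalty λ‖w‖² is added to the squared-loss empirical risk (the data below are hypothetical):

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form minimizer of: sum of squared errors + lam * ||w||^2
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

w_unreg = ridge(X, y, 0.0)    # ordinary least squares (no penalty)
w_reg = ridge(X, y, 10.0)     # heavy penalty shrinks the coefficients
print(w_unreg, w_reg)
```

The larger λ is, the more the penalty dominates the empirical risk and the smaller the learned coefficients become, which lowers model complexity.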
Cross-validation
Cross-validation slices the given data, combines the slices into training sets and test sets, and selects a model based on repeated training and testing.
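A minimal sketch of S-fold cross-validation splitting (the data and fold count are hypothetical): the data are cut into S slices, and each slice serves once as the test set while the remaining slices form the training set.

```python
def s_fold_splits(data, s):
    # Cut the data into s roughly equal slices.
    folds = [data[i::s] for i in range(s)]
    for i in range(s):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
for train, test in s_fold_splits(data, 5):
    print(test, train)
```

Each model candidate would be trained and evaluated on every split, and the one with the best average test performance selected.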
1.6 Generalization capability
Concept
The generalization ability of learning method refers to the predictive ability of the model learned by this method to the unknown data.
1.7 Generation model and discriminant model
Supervised learning methods can be divided into:
- The generative approach: learn the joint probability distribution, then derive the conditional probability distribution as the prediction model. Examples include the naive Bayes method and hidden Markov models.
- The discriminative approach: learn the conditional probability distribution directly. Examples include the k-nearest neighbor method, the perceptron, decision trees, logistic regression, the maximum entropy model, support vector machines, boosting methods, and conditional random fields.
Comparison
1 Generative methods can recover the joint probability distribution, while discriminative methods cannot. Generative methods converge faster during learning, and they can still be used when hidden variables are present, whereas discriminative methods cannot.
2 Discriminative methods learn the conditional probability distribution directly and usually achieve higher accuracy. They allow the data to be abstracted to various degrees and features to be defined and used, which simplifies the learning problem.
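The two routes can be contrasted on a toy data set (the data and labels below are hypothetical): the generative route estimates the joint distribution P(X, Y) and derives P(Y|X) from it, while the discriminative route estimates P(Y|X) directly from the data.

```python
from collections import Counter

data = [("sunny", "play"), ("sunny", "play"), ("rain", "stay"),
        ("rain", "play"), ("sunny", "stay"), ("rain", "stay")]

joint = Counter(data)
n = len(data)

def p_joint(x, y):
    # Generative: empirical joint distribution P(X=x, Y=y).
    return joint[(x, y)] / n

def p_cond_generative(y, x):
    # Derive P(Y=y | X=x) = P(x, y) / sum over y' of P(x, y').
    px = sum(p_joint(x, yy) for yy in {"play", "stay"})
    return p_joint(x, y) / px

def p_cond_discriminative(y, x):
    # Discriminative: estimate P(Y=y | X=x) directly from the data.
    matching = [yy for xx, yy in data if xx == x]
    return matching.count(y) / len(matching)

print(p_cond_generative("play", "sunny"))      # approximately 2/3
print(p_cond_discriminative("play", "sunny"))  # approximately 2/3
```

Both routes give the same conditional estimate here, but only the generative route additionally yields the full joint distribution, matching point 1 of the comparison above.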
1.8 Classification Problems
Concept
In supervised learning, a classification model or classification decision function learned from data is called a classifier. The classifier predicts the output class for new inputs; this prediction is called classification.
Classification Evaluation Index
Confusion matrix
TP: the number of positive instances predicted as positive
FN: the number of positive instances predicted as negative
FP: the number of negative instances predicted as positive
TN: the number of negative instances predicted as negative
Common measures
Precision = TP / (TP + FP)
the number of true positives / the total number predicted as positive
Recall = TP / (TP + FN)
the number of true positives / the actual number of positive instances
FPR = FP / (FP + TN)
the number of negatives predicted as positive / the actual number of negative instances
FNR = FN / (TP + FN)
the number of positives predicted as negative / the actual number of positive instances
1.9 Labeling Issues
The input of a tagging problem is an observation sequence, and the output is a sequence of tags or a sequence of states.
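The input-output structure can be shown with a hypothetical part-of-speech tagging pair (the sentence and tag set are assumptions for illustration): the output tag sequence has the same length as the input observation sequence, one tag per observation.

```python
# Observation sequence (input) and tag sequence (output) of equal length.
observation = ["The", "cat", "sleeps"]
tags        = ["DET", "NOUN", "VERB"]   # hypothetical part-of-speech tags

for word, tag in zip(observation, tags):
    print(word, tag)
```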
1.10 Regression Problems
Regression is used to predict the relationship between input variables and output variables, in particular how the output variable changes as the value of the input variable changes.
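A minimal regression sketch on hypothetical data: fit how the output changes with the input, then use the fitted relationship to predict the output at a new input value.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.0, 8.1])       # roughly y = 2x with small noise

slope, intercept = np.polyfit(x, y, 1)   # least-squares line
y_new = slope * 5.0 + intercept          # predicted output at the new input x = 5
print(y_new)
```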