1.4 Model Evaluation and Model Selection
Generalization ability: the ability of a learned model to predict unknown data.
Over-fitting: the selected model contains too many parameters, so that it fits the known (training) data well but predicts unknown data poorly.
Empirical risk minimization (ERM): solve for the model that minimizes the empirical risk, i.e. the average loss over the training data:

$$\min_{f \in \mathcal{F}} \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i))$$
When the model is a conditional probability distribution and the loss function is the log loss, ERM is equivalent to maximum likelihood estimation (MLE).
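As a quick sanity check of this equivalence, here is a minimal sketch (the Bernoulli model and variable names are my own, for illustration): under the log loss, the empirical risk is exactly the negative average log-likelihood, so minimizing one maximizes the other.

```python
import numpy as np

# Bernoulli model p(y=1) = p with log loss L(y, p) = -log p(y): the
# empirical risk (1/N) * sum_i -log p(y_i) is the negative average
# log-likelihood, so ERM and MLE select the same parameter.
y = np.array([1, 0, 1, 1, 0, 1])

def empirical_risk(p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

grid = np.linspace(0.01, 0.99, 99)
best_p = grid[np.argmin([empirical_risk(p) for p in grid])]
print(best_p, y.mean())  # both ~0.67: the ERM solution equals the MLE
```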
Structural risk minimization (SRM): when the sample size is small, ERM is prone to over-fitting; SRM was proposed to prevent over-fitting. SRM is equivalent to regularization: it adds to the empirical risk a regularizer (penalty term) that represents the complexity of the model:

$$\min_{f \in \mathcal{F}} \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f)$$
A good model must therefore make both the empirical risk and the model complexity small. When the model is a conditional probability distribution, the loss function is the log loss, and the model complexity is represented by the prior probability of the model, SRM is equivalent to maximum a posteriori estimation (MAP) in Bayesian estimation.
To make the test error small, a model of appropriate complexity must be selected. Two model selection methods are commonly used: regularization and cross-validation.
1.5 Regularization and Cross-Validation
Structural risk = empirical risk + regularization term
In this formula, the first term is the empirical risk and the second term is the regularization term.
Regularization term: the penalty term in the structural risk; common choices include the L1 norm or the L2 norm of the parameter vector.
The role of regularization is to select a model with both small empirical risk and low model complexity.
Regularization conforms to Occam's razor: among all models that explain the known data equally well, the simplest one is the best.
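A concrete instance of SRM is ridge regression: the empirical risk is the mean squared error and the regularization term is the squared L2 norm of the parameter vector. A minimal sketch (the function name and interface are my own, for illustration):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Minimize (1/N)*||y - X w||^2 + lam*||w||^2 (empirical risk plus
    an L2 penalty); setting the gradient to zero gives the closed form
    w = (X^T X + N*lam*I)^{-1} X^T y."""
    N, d = X.shape
    return np.linalg.solve(X.T @ X + N * lam * np.eye(d), X.T @ y)
```

With lam = 0 this reduces to ordinary least squares (pure ERM); increasing lam trades a larger empirical risk for a simpler, smaller-norm model.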
Datasets are often split into three parts: the training set, the validation set, and the test set, used respectively to train the model, to select the model, and to evaluate the final model. When data is insufficient, however, splitting it three ways wastes scarce data.
Cross-validation is therefore introduced; the common variants are:
Simple cross-validation: randomly split the dataset into two parts, a training set and a test set.
S-fold cross-validation: randomly partition the dataset into S disjoint subsets of the same size; train on S-1 subsets and test on the remaining one; repeat over all S choices of held-out subset and select the model with the smallest average test error (see the sketch after this list).
Leave-one-out cross-validation: used when data is scarce; it is the special case of S-fold cross-validation with S = N, where N is the sample size.
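A minimal sketch of S-fold cross-validation (assuming a hypothetical model object exposing fit and predict methods; the interface is illustrative, not any particular library's):

```python
import numpy as np

def s_fold_cv_error(make_model, X, y, S=5, seed=0):
    """Average 0-1 test error of a model over S cross-validation folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), S)  # S equal-size index sets
    errors = []
    for k in range(S):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(S) if j != k])
        model = make_model()                   # a fresh model for each fold
        model.fit(X[train_idx], y[train_idx])  # train on the other S-1 subsets
        errors.append(np.mean(model.predict(X[test_idx]) != y[test_idx]))
    return float(np.mean(errors))
```

Model selection then amounts to computing this average error for each candidate model and keeping the smallest; with S = len(X), the same code performs leave-one-out cross-validation.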
1.6 Generalization Ability
Generalization ability: the ability of the model learned by a method to predict unknown data.
Generalization error: the expected risk of the learned model.
Generalization error bound: it is a function of the sample size N (as N increases, the bound tends to 0) and a function of the hypothesis-space capacity (the larger the capacity, the harder the model is to learn and the larger the generalization error bound).
For binary classification over a finite hypothesis space containing d functions, with probability at least 1 - δ the following bound holds:

$$R(f) \le \hat{R}(f) + \varepsilon(d, N, \delta), \qquad \varepsilon(d, N, \delta) = \sqrt{\frac{1}{2N}\left(\log d + \log\frac{1}{\delta}\right)}$$

The first term, $\hat{R}(f)$, is the empirical error (training error).
The second term depends on the number of samples N: as N tends to infinity it tends to 0, so the expected error approaches the empirical error.
d is the number of functions in the hypothesis space: the larger d is, the harder learning becomes and the larger the generalization error bound.
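The bound is easy to evaluate numerically; the sketch below plugs arbitrary illustrative numbers into ε(d, N, δ) to show it shrinking with N and growing with d:

```python
import math

def generalization_bound(train_error, d, N, delta=0.05):
    """R(f) <= R_hat(f) + eps(d, N, delta) for a finite hypothesis space
    of d functions, holding with probability at least 1 - delta."""
    eps = math.sqrt((math.log(d) + math.log(1.0 / delta)) / (2.0 * N))
    return train_error + eps

print(generalization_bound(0.10, d=100, N=1_000))     # ~0.162
print(generalization_bound(0.10, d=100, N=100_000))   # ~0.106 (more data)
print(generalization_bound(0.10, d=10_000, N=1_000))  # ~0.178 (bigger space)
```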
1.7 Generative Model and Discriminative Model
The generative approach learns the joint probability distribution P(X, Y) from the data and then derives the conditional probability distribution P(Y|X) = P(X, Y) / P(X) as the prediction model; models learned this way are called generative models. Typical generative models are the naive Bayes model and the hidden Markov model (a toy sketch follows the list of advantages below).
Advantages:
Can recover the joint probability distribution P(X, Y)
Faster convergence: the learned model approaches the true model more quickly as the sample size grows
Can still be used when hidden (latent) variables are present
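A toy generative sketch (the data and names are invented for illustration): it estimates the joint distribution P(X, Y) by counting and then predicts through P(Y|X) = P(X, Y) / P(X), in the spirit of naive Bayes.

```python
from collections import Counter

# Toy (feature, label) pairs; a generative model estimates P(x, y) first.
data = [("sunny", "play"), ("sunny", "play"), ("rainy", "stay"),
        ("rainy", "stay"), ("sunny", "stay"), ("rainy", "play")]

joint = Counter(data)                     # counts estimate P(x, y)
marginal_x = Counter(x for x, _ in data)  # counts estimate P(x)
labels = {y for _, y in data}

def predict(x):
    # P(y | x) = P(x, y) / P(x); the common 1/N factors cancel out.
    posterior = {y: joint[(x, y)] / marginal_x[x] for y in labels}
    return max(posterior, key=posterior.get)

print(predict("sunny"))  # "play", since P(play | sunny) = 2/3
```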
The discriminative approach learns the decision function f(X) or the conditional probability distribution P(Y|X) directly from the data; models learned this way are called discriminative models. Typical discriminative models include the k-nearest-neighbor method, the perceptron, decision trees, logistic regression, the maximum entropy model, support vector machines, boosting methods, and conditional random fields.
Advantages:
Usually higher prediction accuracy
The data can be abstracted and features defined and used, which simplifies the learning problem
1.8 Classification Problems
When the output variable takes a finite number of discrete values, the prediction problem is a classification problem.
The learned classification model or classification decision function is called a classifier.
1.9 Tagging Problems
A generalization of the classification problem: the input is an observation sequence and the output is a sequence of tags.
A typical application is part-of-speech tagging: the input is a sequence of words and the output is a sequence of (word, part of speech) pairs.
1.10 Regression Problems
Regression: both the input and the output are continuous variables. Regression predicts the relationship between input and output variables, i.e. it selects a mapping function from input variables to output variables; it is equivalent to function fitting: choosing a function curve that fits the known data well and predicts unknown data well.
By the number of input variables, regression divides into simple (one-variable) regression and multiple regression; by model type, into linear regression and nonlinear regression.
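A minimal function-fitting sketch (the data points are made up for illustration): least squares selects the polynomial curve that best fits the known points, which is then used to predict an unseen input.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 9.2, 19.1, 32.8])  # roughly y = 2x^2 + 1, with noise

coeffs = np.polyfit(x, y, deg=2)  # least-squares fit of a degree-2 polynomial
model = np.poly1d(coeffs)
print(model(5.0))                 # prediction at an unseen input (about 51)
```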
This first chapter mainly introduces basic concepts, and it is necessary to understand them well.