[ML] Learning Theory


Up to now we have looked at how common machine learning algorithms work and how their training steps are implemented. In practice, however, applications vary widely: to do real engineering you must know how to evaluate the quality of a learned model for a specific problem, how to select and extract features sensibly, and how to tune parameters. These were also the steps I struggled with when I worked on pattern recognition: when the recognition rate was low, I was often unsure how to improve it. Should I change the model or the features, increase the number of training samples, optimize the iterative algorithm, or change the objective function? Learning theory gives us some guiding conclusions for these questions.

First, consider the bias-variance trade-off. Suppose the hypothesis set H contains k candidate models (k reflects the complexity of the model class) and the training set has m samples. Then, with probability at least 1 - delta, the bound test error <= training error + 2 * sqrt( log(2k/delta) / (2m) ) holds. The training-error term is the "bias" part: it measures how well the model fits the training samples; the larger it is, the worse the fit, i.e. the model is underfitting. The term 2 * sqrt( log(2k/delta) / (2m) ) is the "variance" part: the larger k (the more complex the model class) and the smaller m (the fewer the training samples), the larger this term and the worse the model's generalization ability, i.e. the model is overfitting.
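To get a feel for the bound, here is a small sketch in Python that evaluates the variance term 2 * sqrt( log(2k/delta) / (2m) ); the particular values of k, m and delta are made-up illustrations, not from the original note:

import math

def variance_term(k, m, delta=0.05):
    """Width of the uniform-convergence bound:
    test error <= training error + 2 * sqrt(log(2k/delta) / (2m))."""
    return 2.0 * math.sqrt(math.log(2.0 * k / delta) / (2.0 * m))

# Illustrative values: the gap grows with the model-class size k
# and shrinks as the number of training samples m increases.
for k in (10, 1000, 10**6):
    for m in (100, 10000):
        print(f"k={k:>7}, m={m:>5}: gap <= {variance_term(k, m):.3f}")

As expected, the printed gap widens as k grows and shrinks as m grows, which is exactly the trade-off described above.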

This result has a corollary: given delta and gamma, for the bound test error <= training error + 2 * gamma to hold with probability 1 - delta, the number of training samples must satisfy m >= O((1/gamma^2) * log(k/delta)). In other words, to keep the test error under control, m only needs to grow with the logarithm of the number of candidate models k. In practice, model complexity is usually not expressed through k directly. If the model has d real-valued parameters and each parameter is stored as a 64-bit double, then k is roughly 2^(64d), and the condition becomes m >= O((d/gamma^2) * log(1/delta)); that is, the required number of training samples grows roughly linearly with the number of model parameters d. This holds for a finite hypothesis set; for an infinite hypothesis set a similar conclusion is obtained with d replaced by the VC dimension of H. The VC dimension is usually on the order of the number of model parameters, but in special cases it need not depend on the dimension of the samples; SVM with a large margin is one example. Carrying out the bias-variance trade-off in practice amounts to model selection and feature selection. For model selection, the most practical approach is cross-validation: pick the model with the smallest validation error. For feature selection, you can use forward selection or backward selection to keep good features and drop bad ones, or use a filter method: compute the mutual information between each feature x_i and the label y and keep the features with the largest mutual information.
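As a concrete sketch of the filter approach combined with cross-validation-based model selection (it assumes scikit-learn is available; the synthetic dataset, the classifier and the choice of keeping the 10 top-scoring features are illustrative assumptions, not from the original note):

# Sketch only: data and the number of kept features are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=40, n_informative=8,
                           random_state=0)

# Filter method: mutual information between each feature x_i and y.
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][:10]          # keep the 10 most informative features

# Model selection by cross-validation: all features vs. the filtered subset.
clf = LogisticRegression(max_iter=1000)
print("all features :", cross_val_score(clf, X, y, cv=5).mean())
print("top-10 by MI :", cross_val_score(clf, X[:, top], y, cv=5).mean())

Forward or backward selection would wrap a similar cross-validation score inside a loop that adds or removes one feature at a time.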

The bias-variance trade-off seeks a balance between training error and generalization ability. Another way to strike this balance is to add regularization. Looking at machine learning from the statistical-inference perspective: learning without regularization corresponds to the frequentist approach, where the parameter theta is treated as an unknown but fixed quantity and learning means finding the theta that maximizes the likelihood of the observed X and Y; learning with regularization corresponds to the Bayesian approach, where theta is treated as a random variable with a known prior, and learning means finding the theta that maximizes the posterior probability. Concretely, adding regularization means adding a term lambda * ||theta||^2 to the objective function. For a regression problem, adding this regularization term makes the fitted curve smoother and effectively reduces overfitting.
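A minimal sketch of this idea on made-up data: adding lambda * ||theta||^2 to a least-squares objective gives the closed-form ridge solution theta = (X^T X + lambda*I)^(-1) X^T y, and the regularized coefficients come out much smaller, i.e. the fit is smoother. The data, polynomial degree and value of lambda below are illustrative assumptions:

# Sketch with made-up data: ridge regression as "least squares + lambda * ||theta||^2".
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

# Degree-9 polynomial features: flexible enough to overfit 20 noisy points.
X = np.vander(x, N=10, increasing=True)

def ridge_fit(X, y, lam):
    """Closed-form minimizer of ||X theta - y||^2 + lam * ||theta||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

theta_unreg = ridge_fit(X, y, lam=0.0)    # maximum likelihood (no prior)
theta_reg   = ridge_fit(X, y, lam=1e-3)   # MAP with a Gaussian prior on theta

# The regularized coefficients are much smaller, giving a smoother fit.
print("||theta|| without regularization:", np.linalg.norm(theta_unreg))
print("||theta|| with    regularization:", np.linalg.norm(theta_reg))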

After all this learning theory, let's return to the question raised at the beginning of this note: how do we improve a learning algorithm? First determine whether the problem is high bias or high variance. There are two ways to judge: 1. If the training error is low but the test error is much larger, it is a high variance problem; if the training error itself is large, it is a high bias problem. 2. Increase the number of training samples and watch how the two errors change: if the test error keeps decreasing toward the training error, it is a high variance problem; if both errors level off at a similarly high value, it is a high bias problem. Increasing the number of training samples or reducing the number of features helps with high variance; adding more features helps with high bias.
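The second diagnostic can be sketched as a simple learning-curve loop. The dataset, model and subset sizes below are made-up illustrations, assuming scikit-learn is available, purely to show how training and test error are tracked as m grows:

# Sketch: learning curves for the high-bias / high-variance diagnosis.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

for m in (50, 100, 200, 400, 800):
    clf = LogisticRegression(max_iter=1000).fit(X_train[:m], y_train[:m])
    train_err = 1 - clf.score(X_train[:m], y_train[:m])
    test_err = 1 - clf.score(X_test, y_test)
    # High variance: test error keeps falling toward the training error as m grows.
    # High bias: both errors flatten out at a similarly high level.
    print(f"m={m:>4}  train error={train_err:.3f}  test error={test_err:.3f}")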

 

Original source: http://www.cnblogs.com/uchihaitachi/archive/2012/09/11/2680410.html
