Original: http://www.zhihu.com/question/27068705
What are the differences and connections between bias, error, and variance in machine learning? I have recently been studying machine learning and came to cross-validation, where one piece of content particularly confuses me. Error can be understood as the error rate measured on the test data, i.e. (1 - accuracy).
On the training data, we can perform cross-validation.
One method is called K-fold cross-validation: the initial sample is partitioned into K sub-samples; one sub-sample is held out as validation data for the model, and the other K-1 sub-samples are used for training. This is repeated K times, so that each sub-sample is used for validation exactly once; the K results are then averaged (or otherwise combined) to produce a single estimate.
When K is large, we get less bias but more variance.
When K is small, we get more bias but less variance.
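The K-fold procedure just described can be sketched in a few lines (a minimal NumPy sketch; `k_fold_indices` is a name invented here, not a library API):

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Split sample indices into k folds; each index is held out exactly once."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Example: 10 samples, 5 folds -> each split trains on 8 and validates on 2
splits = list(k_fold_indices(10, 5))
```

In practice one would train a model on each `train_idx`, evaluate on the corresponding `val_idx`, and average the K validation scores.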
I do not understand the description above. Could someone explain what bias, error, and variance are?
And in cross-validation, how do these three things influence each other?

Answer (Orangeprince): First, Error = Bias + Variance.
Error reflects the accuracy of the whole model; bias reflects the gap between the model's output on the samples and the true values, that is, the accuracy of the model itself; variance reflects the gap between each individual model output and the expected model output, that is, the stability of the model.
For example, consider a target-shooting experiment. The goal is to hit the 10 ring, but we actually hit only the 7 ring, so the error is 3. Analyzing why we hit the 7 ring, there may be two causes: first, an aiming problem, for example we were actually aiming at the 9 ring rather than the 10 ring; second, the stability of the gun itself, so that although we aimed at the 9 ring we hit only the 7 ring. So in this shooting experiment the bias is 1, reflecting the gap between the model's expectation and the true target, while the error caused by variance is 2: although the aim was the 9 ring, the model's own lack of stability produced a gap between the actual result and the model's expectation.
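The shooting analogy can be checked numerically; below is a hedged sketch in which the spread of the shots is an invented number, not something from the answer:

```python
import numpy as np

rng = np.random.default_rng(42)
target = 10.0          # we want to hit the 10 ring
aim = 9.0              # systematic error: we actually aim at the 9 ring
shots = aim + rng.normal(0.0, 1.5, size=100_000)  # the gun's instability

bias = shots.mean() - target            # about -1: the aiming error
variance = shots.var()                  # about 1.5**2: the stability error
mse = np.mean((shots - target) ** 2)    # total squared error

# The mean squared error decomposes exactly into bias^2 + variance
assert abs(mse - (bias**2 + variance)) < 1e-9
```

The decomposition in the final line is an algebraic identity, which is why it holds to floating-point precision rather than only approximately.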
In a real system, bias and variance often cannot both be minimized. If you reduce the model's bias, you will to some extent increase its variance, and vice versa. The root cause of this phenomenon is that we are always trying to estimate infinite real-world data from a finite training sample. When we trust these data more and neglect our prior knowledge about the model, we try hard to make the model accurate on the training sample, thereby reducing its bias. But a model learned this way is likely to lose some generalization ability, causing overfitting: its performance on real data drops and its uncertainty increases. Conversely, if we trust our prior knowledge about the model more, we can reduce the model's variance and increase its stability, but in the process we increase its bias. The trade-off between bias and variance is one of the basic themes of machine learning; you can find its shadow in all kinds of machine learning models.
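One concrete way to see this trade-off is ridge regression, where the regularization strength lambda plays the role of trust in a prior (shrinking the weights toward zero). This is a simulation sketch; the function names, noise level, and lambda values are all illustrative choices, not from the answer:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: (X'X + lam*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

TRUE_W = 2.0  # the true slope of y = 2x + noise

def slope_estimates(lam, n_datasets=2000, n=20):
    """Refit on many small datasets to see how the estimate is distributed."""
    est = []
    for _ in range(n_datasets):
        x = rng.uniform(-1, 1, n)
        y = TRUE_W * x + rng.normal(0, 0.5, n)
        est.append(fit_ridge(x[:, None], y, lam)[0])
    return np.array(est)

w_weak = slope_estimates(lam=0.01)    # weak prior: low bias, higher variance
w_strong = slope_estimates(lam=20.0)  # strong prior: high bias, lower variance

assert abs(w_weak.mean() - TRUE_W) < abs(w_strong.mean() - TRUE_W)
assert w_weak.var() > w_strong.var()
```

The two assertions state the trade-off directly: increasing lambda pulls the average estimate away from the truth (more bias) while making the estimates cluster more tightly (less variance).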
Specific to the K-fold cross-validation setting, this is actually easy to understand. First look at the change in variance, again using the shooting example. Suppose I aim at the 10 ring; each shot deviates, but the direction of the deviation is random: it may be high or it may be low. So the more trials we run, the closer the numbers of high and low shots become, and the average of all the shots should land closer to the center. Analyzing more microscopically: when the model's prediction deviates greatly from its expectation with the model held fixed, the cause lies in the data, for example the presence of some anomalous points. In the most extreme case, suppose only one point is anomalous. If we train only a single model, this point affects the whole model, giving it large variance. With K-fold cross-validation, however, the anomalous point falls into one fold, so one of the K models never sees it during training, and each of the others sees it as only one of many training points; after averaging the K models, its influence is greatly diluted. Bias, in contrast, can be addressed directly: to keep bias small we need only minimize the error on the training sample, and to achieve that we must train on all of the data in order to reach the model's optimum. The K-fold objective breaks this situation, so the model's bias inevitably increases.

Answer (Li Wenzhe): More accurately, the error divides into three parts:
Error = Bias + Variance + Noise

Answer (Veronica C): This is simply how the mean squared error of an estimator is defined in statistics.
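The standard statistical definition this answer alludes to, reconstructed here for completeness (for an estimator of a parameter theta):

```latex
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \big(\mathbb{E}[\hat{\theta}] - \theta\big)^2
    + \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]
  = \operatorname{Bias}(\hat{\theta})^2 + \operatorname{Var}(\hat{\theta})
```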
Answer (Chen Little): From my own blog: Aaron's Column
Answer (Anonymous): See: What is the difference between bias and variance? - Jason Gu's answer.
Error, as the first answer said, = Bias + Variance.

Answer (Small sense): In classical statistics, one would use R-squared (adjusted R-squared, AIC, BIC, etc.) to make the model explain the data as well as possible, but that runs into overfitting. That is, the model explains the training data brilliantly, but not the test data. Overfitting is just what it sounds like: the model is unbiased, but its variance is too large. So we need to split the data into subsets; this is the bias-variance balance, about which whole books could be written.
So why cross-validation? A big benefit is that it avoids overfitting the test data through repeated use. Taking each fold to be a single sample gives LOOCV; for k-fold, k = 5 or 10 is most common, though you can choose according to your needs (for example, whether the sample size divides evenly). The specific choice can be explored at length, and also depends on available computing power.

Answer (Kuzzy Steve): Error, as the first answer said, = Bias + Variance.
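The overfitting pattern this answer describes (training fit improves while test fit worsens) can be sketched with polynomial regression. This is an illustrative simulation; the target function, noise level, and degrees are invented here:

```python
import numpy as np

rng = np.random.default_rng(1)

def avg_errors(degree, n_rep=200, n=30, sigma=0.5):
    """Average train/test MSE of a degree-`degree` polynomial over many datasets."""
    tr, te = 0.0, 0.0
    for _ in range(n_rep):
        x = rng.uniform(-1, 1, n)
        y = np.sin(3 * x) + rng.normal(0, sigma, n)
        xt = rng.uniform(-1, 1, n)                  # fresh test sample
        yt = np.sin(3 * xt) + rng.normal(0, sigma, n)
        c = np.polyfit(x, y, degree)
        tr += np.mean((np.polyval(c, x) - y) ** 2)
        te += np.mean((np.polyval(c, xt) - yt) ** 2)
    return tr / n_rep, te / n_rep

tr3, te3 = avg_errors(3)      # a reasonably simple model
tr12, te12 = avg_errors(12)   # a much more flexible model

assert tr12 < tr3   # flexibility always helps on the training data...
assert te12 > te3   # ...but here it hurts on fresh test data: overfitting
```

Cross-validation estimates the test-side quantity without touching a held-out test set, which is exactly why it guards against the "giant bull on training data" failure mode.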
Generally speaking, machine learning chooses a function space, and this space may not contain the optimal function. So even the function in that space with the smallest loss will differ from the truly best function; that difference is the bias. In addition, we do not know the joint distribution p(x, y) of the data (of course, if we knew it we would not need to learn at all: we could directly derive p(y|x)). So we can only learn from a training set drawn at random from the existing-but-unknown p(x, y). Because the training set cannot be unlimited, its distribution is also inconsistent with the true p(x, y), which adds variance on top of the original bias.
These are the two sources of error.
For example, suppose we want to fit some data whose true relationship is exponential, but we do not know this and choose to fit the data with a linear function. The function space then contains only linear models, and the gap between them and the true exponential relationship is the bias brought in by choosing the linear space. Moreover, we can only learn the linear model's parameters from a finite number of observation points; these are sampled from the overall distribution and do not fully conform to the overall p(x, y), and this is the source of variance.
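This exponential-versus-linear example can be made concrete. Below is a hedged numerical sketch; the interval, noise level, and sample sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
x_grid = np.linspace(0, 2, 200)

slopes, intercepts = [], []
for _ in range(500):
    # Variance source: each small random sample gives a different fitted line.
    x = rng.uniform(0, 2, 15)
    y = np.exp(x) + rng.normal(0, 0.3, 15)   # the truth is exponential
    a, b = np.polyfit(x, y, 1)               # but we fit a line
    slopes.append(a)
    intercepts.append(b)

# Bias source: even the *average* fitted line cannot reproduce exp(x).
avg_line = np.mean(slopes) * x_grid + np.mean(intercepts)
bias_sq = np.mean((avg_line - np.exp(x_grid)) ** 2)

assert bias_sq > 0.01       # the linear family can never match exp(x)
assert np.var(slopes) > 0   # finite samples make the fit itself fluctuate
```

The first assertion isolates the bias (a property of the chosen function space), the second the variance (a property of learning from finite samples), matching the answer's two sources of error.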
For a more vivid explanation: suppose we want to measure the length of an object and choose a ruler to do it. The ruler may not be perfectly accurate, and choosing a different ruler brings a different bias. Then, each time a person uses the ruler, their reading involves some estimation error; that is the variance.
I wonder whether this makes it understandable.

Answer (Jinliang): I recommend a blog post: Mathematics in Machine Learning (2) - Linear Regression, and the Bias-Variance Tradeoff.
Speaking generally about variance and bias: starting from the same dataset, use a sound sampling method to obtain several different sub-datasets, train models on these sub-datasets, and then you can talk about their variance and bias. Variance and bias generally scale with model complexity, as in the four small figures at the beginning of that post: when we blindly pursue an exact fit to the data, training on different sub-datasets of the same data may yield models that differ greatly from one another. That is variance; their bias, however, is very small, as shown below:
The blue and green dots are different sub-datasets sampled from one dataset. We fit two degree-n curves to the two sets of points and obtain two curves (blue and dark green) that differ greatly from each other, even though they are generated from the same dataset; this is the large variance caused by a complex model. The more complex the model, the smaller the bias; the simpler the model, the larger the bias. Variance and bias change with complexity as follows:
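The two-sub-dataset picture can be reproduced numerically. This is an illustrative sketch; the underlying curve, noise level, and polynomial degrees are assumptions made here, not taken from the post:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 30)
grid = np.linspace(-1, 1, 100)

def f(t):
    """The underlying curve that all sub-datasets share."""
    return np.sin(2.5 * t)

def fit_curve(y, degree):
    """Fit a degree-`degree` polynomial to (x, y) and evaluate it on a grid."""
    return np.polyval(np.polyfit(x, y, degree), grid)

def avg_gap(degree, n_pairs=30):
    """Average squared gap between curves fit to two independent sub-datasets."""
    total = 0.0
    for _ in range(n_pairs):
        y1 = f(x) + rng.normal(0, 0.4, x.size)   # the "blue" sub-dataset
        y2 = f(x) + rng.normal(0, 0.4, x.size)   # the "green" sub-dataset
        total += np.mean((fit_curve(y1, degree) - fit_curve(y2, degree)) ** 2)
    return total / n_pairs

gap_complex = avg_gap(12)  # complex model: the two curves differ a lot
gap_simple = avg_gap(3)    # simple model: the two curves nearly coincide

assert gap_complex > gap_simple
```

The gap between curves trained on different sub-datasets is precisely the variance the post is describing, and it grows with the polynomial degree.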
Finally, we describe bias and variance in mathematical language:
E(L) is the expected loss and h(x) represents the mean of the true values. The first part involves y (the model's estimation function); this part of the error comes from the choice of the estimation function, that is, of the model. The second part is unrelated to y; it can be regarded as the inherent noise.
The first part of the formula above can be decomposed as follows:
This decomposition is derived in section 1.5.5 of PRML. The first half is the (squared) bias and the other half is the variance, so we can conclude: expected loss = bias^2 + variance + inherent noise.
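For reference, here is the decomposition described in words above, written out in the notation of the PRML section the answer cites (y(x; D) is the model learned from dataset D, and h(x) = E[t | x]); this is a reconstruction, since the original equation images did not survive:

```latex
\mathbb{E}[L] = \int \{y(\mathbf{x}) - h(\mathbf{x})\}^2\, p(\mathbf{x})\, d\mathbf{x}
  + \iint \{h(\mathbf{x}) - t\}^2\, p(\mathbf{x}, t)\, d\mathbf{x}\, dt

\mathbb{E}_D\!\left[\{y(\mathbf{x}; D) - h(\mathbf{x})\}^2\right]
  = \underbrace{\{\mathbb{E}_D[y(\mathbf{x}; D)] - h(\mathbf{x})\}^2}_{(\text{bias})^2}
  + \underbrace{\mathbb{E}_D\!\left[\{y(\mathbf{x}; D) - \mathbb{E}_D[y(\mathbf{x}; D)]\}^2\right]}_{\text{variance}}
```

Substituting the second identity into the first term of the expected loss gives exactly the stated conclusion: expected loss = bias^2 + variance + noise.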