Public course address: https://class.coursera.org/ml-003/class/index
Instructor: Andrew Ng
1. Deciding What to Try Next
Several machine learning methods have already been introduced. Knowing how each method works is obviously not enough; the key is learning how to apply them, and the best way to master something is to put it into practice. Consider the house-price prediction problem from the very beginning of the course. Suppose you have fit a model with regularized linear regression and happily test it on new data, only to find the result disappointing: the error between the predicted values and the actual values is far too large.
So you need to improve your model. Several candidate fixes are listed above: get more training examples? Reduce the number of features, or add features? Increase lambda, or decrease it? You know that at least one of these will help, but so many options are dizzying. What should you do, try them one by one?
Obviously, trying them one by one would take far too long. You need a way to quickly tell whether an option is worth pursuing. This is where machine learning diagnostics come in:
As described above, a diagnostic tells you what is and is not working in a learning algorithm and gives guidance on how to improve its performance. A diagnostic takes some time to implement, but that time is insignificant compared with blindly trying the options one by one.
2. Evaluating a Hypothesis
Returning to the house-price prediction problem: the cause of the large prediction error is that our model does not generalize to data outside the training set. Since data is not easy to collect, to make full use of it we split it into two parts, using one part for training (for example 70%) and the rest for testing:
After the split, we learn the parameters theta by minimizing the cost function J_train(theta) on the training set, and then evaluate the cost J_test(theta) on the test set. For linear regression the test error is the average squared error:

J_test(theta) = 1/(2*m_test) * sum over the test set of (h_theta(x_test) - y_test)^2

For logistic regression we can use the logistic cost over the test set (or, alternatively, the 0/1 misclassification error):

J_test(theta) = -1/m_test * sum over the test set of [ y_test * log(h_theta(x_test)) + (1 - y_test) * log(1 - h_theta(x_test)) ]

Of these two values, the one that matters most is J_test: we want it to be as small as possible.
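The following is a minimal sketch (not from the course materials) of the 70/30 split and the test-set cost for linear regression, written in Python with numpy. The variable names X, y, and theta are illustrative assumptions, and X is assumed to already contain a column of ones for the intercept.

```python
import numpy as np

def train_test_split(X, y, train_ratio=0.7, seed=0):
    """Randomly shuffle the data, then split it into a training and a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(train_ratio * len(y))
    train, test = idx[:cut], idx[cut:]
    return X[train], y[train], X[test], y[test]

def j_linear(theta, X, y):
    """Squared-error cost: J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    residual = X @ theta - y
    return residual @ residual / (2 * m)

# Usage: fit theta on the training split only, then report j_linear(theta, X_test, y_test).
```

The point of the sketch is simply that theta is learned from the training split alone, while the reported number comes from the held-out test split.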
3. Model Selection and Training/Validation/Test Sets
Consider the housing-price problem again. Now we only know that we will use linear regression; we do not yet have a specific model. Suppose the hypothesis can be a polynomial of any degree from 1 to 10:
How do we choose among so many models? Following the earlier idea, we might pick the model with the smallest test-set error. But there is a problem: the test set is only a sample of the real situation, and once the degree is chosen to fit the test set, a good test score no longer guarantees good performance on new data; the test error becomes an overly optimistic estimate. Also keep in mind that the model is fixed during testing; it is not true that a higher degree always tests better, since overfitting can occur. To avoid selecting the model with the test set, a third set is introduced, the (cross-)validation set, and the original data is split roughly 60/20/20. The distinction to emphasize here is that the validation set is used during learning to select the model and guard against overfitting, while the test set is used after learning to measure the final result; that is the difference between the validation set and the test set:
After the split we need to compute three cost functions: the training cost J_train(theta), the validation cost J_cv(theta), and the test cost J_test(theta), each an average squared error over its own set:
During model selection we train each candidate model on the training set and pick the one with the smallest J_cv; only then do we use the test set to estimate how well the chosen model generalizes:
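Below is a hedged sketch of this selection loop over polynomial degrees d = 1..10. np.polyfit/np.polyval stand in for "minimize J_train for degree d"; they are an assumption for illustration, not the course's own code, and x_train, y_train, x_cv, y_cv are assumed 1-D numpy arrays.

```python
import numpy as np

def squared_error(y_pred, y):
    """Average squared error, 1/(2m) * sum of squared residuals."""
    return np.mean((y_pred - y) ** 2) / 2

def select_degree(x_train, y_train, x_cv, y_cv, max_degree=10):
    best_d, best_j_cv, best_coef = None, np.inf, None
    for d in range(1, max_degree + 1):
        coef = np.polyfit(x_train, y_train, d)               # theta(d) fit on the training set
        j_cv = squared_error(np.polyval(coef, x_cv), y_cv)   # J_cv(theta(d)) on the validation set
        if j_cv < best_j_cv:
            best_d, best_j_cv, best_coef = d, j_cv, coef
    return best_d, best_coef
```

After the degree is chosen with the validation set, the test set is touched exactly once: squared_error(np.polyval(best_coef, x_test), y_test) is the generalization estimate.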
4. Diagnosing Bias vs. Variance
A picture helps distinguish bias from variance:
As the figure shows, underfitting corresponds to high bias and overfitting to high variance. Moving from underfitting to overfitting (increasing the polynomial degree), bias decreases while variance increases: the training-set cost keeps falling, but the validation-set cost (the test-set cost behaves the same way) first falls and then rises again. The later rise is caused by overfitting, as shown in the figure:
Therefore, we can summarize the rule for telling bias from variance:
High bias: J_train is large, J_cv is large, and J_train ≈ J_cv. Bias comes from the low-degree, underfitting regime.
High variance: J_train is small, J_cv is large, and J_train << J_cv. Variance comes from the high-degree, overfitting regime.
The figure below gives a more visual representation of this rule:
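To make the rule concrete, here is a toy helper (my own illustration, not from the course) that labels the regime from the two costs; the thresholds `high` and `gap` are arbitrary assumptions chosen only so the comparison can be written down.

```python
def diagnose(j_train, j_cv, high=1.0, gap=0.2):
    """Classify the regime from the training and validation costs (thresholds are arbitrary)."""
    if j_train >= high and j_cv - j_train < gap:
        return "high bias (underfitting): J_train large and J_cv close to it"
    if j_train < high and j_cv - j_train >= gap:
        return "high variance (overfitting): J_train small, J_cv much larger"
    return "neither clearly high bias nor clearly high variance"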
If you have not studied probability and are unfamiliar with bias and variance, see Wikipedia:
Bias: http://en.wikipedia.org/wiki/Bias_of_an_estimator
Variance: http://en.wikipedia.org/wiki/Variance
5. Regularization and Bias/Variance
Regularization has already been introduced: it simply adds a term to the cost function that penalizes large parameters in order to prevent overfitting. For regularized linear regression, the parameter lambda controls how strongly we penalize, and therefore how the model fits:

J(theta) = 1/(2m) * sum over the training set of (h_theta(x) - y)^2 + lambda/(2m) * sum over j=1..n of theta_j^2
Lambda now becomes one more thing we have to choose. The procedure mirrors model selection: pick a candidate lambda, minimize the regularized cost to obtain theta, compute the validation cost J_cv(theta) for that theta, repeat for the other candidate lambdas, and finally select the lambda with the smallest J_cv (a code sketch of this procedure appears after the note below):
With lambda in the picture, we find another rule:
Lambda too small: overfitting and high variance, with J_train << J_cv.
Lambda too large: underfitting and high bias, with both costs large and J_train ≈ J_cv.
Note: the formulas for J_train and J_cv used here do not contain the regularization term. My guess is that since we are only comparing J_train with J_cv, and the regularization term would be the same in both, it is simply dropped.
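Here is a minimal sketch of the lambda-selection procedure described above, assuming a ridge-style regularized linear regression with a closed-form (normal equation) solution. The function names, the doubling grid of candidate lambdas, and the assumption that X already contains a column of ones for the intercept are all illustrative choices of mine, not the course's code. Consistent with the note above, the validation cost is computed without the regularization term.

```python
import numpy as np

def fit_regularized(X, y, lam):
    """theta = argmin 1/(2m)*||X theta - y||^2 + lam/(2m)*||theta[1:]||^2 (intercept not penalized)."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0                                   # do not regularize the intercept term
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def j_unregularized(theta, X, y):
    """Plain squared-error cost, without the regularization term."""
    r = X @ theta - y
    return r @ r / (2 * len(y))

def select_lambda(X_train, y_train, X_cv, y_cv):
    lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
    costs = []
    for lam in lambdas:
        theta = fit_regularized(X_train, y_train, lam)     # minimize the regularized training cost
        costs.append(j_unregularized(theta, X_cv, y_cv))   # compare using the unregularized J_cv
    best = int(np.argmin(costs))
    return lambdas[best], costs[best]
```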
6. Learning Curves
Everything so far assumed a fixed number of training examples. Now consider how the number of examples affects the model. As the figure shows:
It can be seen that as the number of training examples grows, the training-set cost rises: with a single example, a second-order curve can be found that passes exactly through it, but once there are more examples the curve can no longer pass through every point, so the training error naturally increases. At the same time the validation-set cost falls, because with more examples the trained model becomes more accurate, so it naturally does better on the validation set.
When there is high bias, let us see whether increasing the number of training examples can reduce the error:
Apparently not. The form of the model has not changed, and both J_cv and J_train flatten out at a high error, so adding training examples is useless. What about high variance?
Here, adding examples lets the model fit the underlying data better: the gap between J_train and J_cv narrows and J_cv (and likewise J_test) comes down. So we can conclude: increasing the amount of training data helps when the model is overfitting (high variance) and is futile when it is underfitting (high bias).
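A hedged sketch of how such a learning curve is computed: for each training-set size m, fit on the first m examples and record the cost on those m examples and on the full validation set. Here `fit` and `cost` are placeholders for whatever model and error measure are in use (for instance, the helpers sketched in the earlier sections); they are assumptions, not a fixed API.

```python
import numpy as np

def learning_curve(X_train, y_train, X_cv, y_cv, fit, cost):
    sizes, j_train, j_cv = [], [], []
    for m in range(1, len(y_train) + 1):
        theta = fit(X_train[:m], y_train[:m])                   # train on the first m examples only
        sizes.append(m)
        j_train.append(cost(theta, X_train[:m], y_train[:m]))   # error on those same m examples
        j_cv.append(cost(theta, X_cv, y_cv))                    # error on the whole validation set
    return np.array(sizes), np.array(j_train), np.array(j_cv)

# Reading the curves: high bias shows both curves flattening out at a high value, close together;
# high variance shows a large gap between them that keeps narrowing as m grows.
```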
7. Deciding What to Try Next - Revisited
Return to the candidate fixes proposed at the start of this lecture for a model with large prediction error. With the diagnostics above we can quickly tell which fix addresses which problem:
Get more training examples: fixes high variance.
Try a smaller set of features: fixes high variance.
Try getting additional features: fixes high bias.
Try adding polynomial features: fixes high bias.
Try decreasing lambda: fixes high bias.
Try increasing lambda: fixes high variance.
Finally, a note on underfitting and overfitting for neural networks: a small network (few hidden units) has few parameters and is prone to underfitting, though it is computationally cheap; a large network has many parameters and is prone to overfitting (which regularization can address) and is more expensive, but in practice a larger network with regularization usually performs better.
----------------------------------------------------------------------------------------------------
This lecture explained how to analyze the problems we run into in practice: once we know the diagnostic methods of machine learning, we no longer have to try the options one by one. Professor Andrew Ng mentioned that some engineers in Silicon Valley skip such diagnostics; in a situation like the one above they simply pick one option (say, collecting more examples) and try it, spend months on it, and end up with nothing to show for it, wasting time and effort. Clearly, a few hours of careful analysis would have pointed them in the right direction. As with much other knowledge, going from knowing machine learning to mastering it is a long process, and I am afraid no one can skip it.