Links (Chapters 1~12):
https://gallery.mailchimp.com/dc3a7ef4d750c0abfc19202a3/files/Machine_Learning_Yearning_V0.5_01.pdf
Links (Chapter 13):
https://gallery.mailchimp.com/dc3a7ef4d750c0abfc19202a3/files/Machine_Learning_Yearning_V0.5_02.pdf
Links (Chapter 14):
https://gallery.mailchimp.com/dc3a7ef4d750c0abfc19202a3/files/Machine_Learning_Yearning_V0.5_03.pdf
This article is reprinted from: http://blog.csdn.net/u014380165/article/details/73611858
MLY--1. Why Machine Learning Strategy
This chapter makes the main point: when your deep learning model runs into problems, you need to choose the right way to address them. So what are the common approaches? The author lists the following:
1. Obtain more data.
2. Increase the diversity of the training samples.
3. Increase the number of iterations.
4. Try a deeper or wider network, e.g., more layers, more convolution kernels, more parameters.
5. Try a smaller network.
6. Add regularization, such as an L2 penalty.
7. Change the network architecture, such as the activation function or the number of convolution kernels.
personal sentiment: "Increase the diversity of training samples" can be understood like this: for cat vs. dog classification, cats of different breeds, in different postures, sleeping, eating, and so on, a wide variety of examples helps your algorithm. Whether to increase the number of iterations can be judged from the loss: if the loss has not yet stabilized and the model has not converged, train longer; in practice you generally need to run through all of your data dozens of times (epochs) to get a good result. Deeper networks are built by adding layers, as in ResNet-152; of course, naively stacking layers runs into severe gradient decay and becomes hard to train, which is exactly the problem the ResNet architecture was designed to solve. A wider network mainly means more convolution kernels in some layers: each kernel extracts one kind of feature, so more kernels give richer features, at the cost of more computation. Regularization is generally used when overfitting; put simply, it limits the size of the weights. L2 adds the sum of squared weights to the objective, which makes the weights tend to be small and smooth; L1 is another regularizer that makes the weights tend to be sparse, i.e., many become exactly 0. Changing the network structure is usually a relatively fine adjustment, such as switching the activation from ReLU to PReLU or changing a layer's kernel size; small changes like these generally do not have a dramatic effect.
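The L2 and L1 penalties described above can be sketched in a few lines. This is a minimal illustration, not code from the book; the weights and the regularization strength `lam` are made-up example values.

```python
# Minimal sketch of L2 / L1 regularization: a penalty on the weights is
# added to the data loss, discouraging large weights (L2) or driving many
# weights to exactly zero (L1).

def l2_penalty(weights, lam):
    """Sum of squared weights, scaled by the regularization strength lam."""
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    """Sum of absolute weights; tends to make weights sparse (many zeros)."""
    return lam * sum(abs(w) for w in weights)

def regularized_loss(data_loss, weights, lam):
    """Total objective = data loss + L2 penalty."""
    return data_loss + l2_penalty(weights, lam)

weights = [0.5, -1.2, 3.0]
print(regularized_loss(2.0, weights, lam=0.01))  # 2.0 + 0.01 * 10.69 = 2.1069
```

In a deep learning framework this usually appears as a "weight decay" option on the optimizer rather than a hand-written penalty, but the objective being minimized is the same.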
MLY--2. How to Use the Book to Help Your Team
MLY--3. Prerequisites and Notation
This chapter is also brief: if you are not familiar with machine learning, you can start with this course: http://ml-class.org
MLY--4. Scale Drives Machine Learning Progress
Deep learning (neural networks) has actually been around for decades, so why is it so hot now? The author points to two main factors:
1. Sufficient data
2. Sufficient computing power
Although there is now a great deal of data, traditional machine learning algorithms such as logistic regression hit an obvious performance plateau as the data volume grows. If instead you train neural networks of increasing depth, performance keeps improving roughly in step with network depth. (The book illustrates this with a figure of performance versus the amount of data for networks of different sizes.)
Why is that? Because on small datasets features can be constructed by hand, and hand-crafted features on small datasets are still quite effective; but as datasets grow, constructing features by hand becomes more and more difficult.
MLY--5. Your Development and Test Sets
In general, when conducting deep learning projects, data is divided into three parts: the training set, the validation set, and the test set. Your model is trained on the training set, its performance is then checked on the validation set, and you optimize the model based on the validation results. The author emphasizes that your test set must have the same distribution as the actual data.
personal sentiment: Some people may wonder why a validation set is needed at all, instead of just a training set and a test set. The way to understand it is that the validation set is itself a kind of test set, used to measure the generalization ability of your model, while the "test set" here generally refers to the online environment. In practice, we take the data we have collected and randomly divide it into a training set and a validation set; random splitting gives the two sets similar distributions. We then train and tune our model on these two sets. Finally the model is deployed to the actual scenario, which is effectively our test set. Because the data in the real scenario is usually broader than what we collected beforehand, the model performs worse there than on the offline validation set; the data you start with can never be perfect. The new data from the real scenario then needs to be added to our training and validation sets, and the model retrained and re-tuned for the next version, repeating this cycle continually. In short, remember: the training set, validation set, and test set should have distributions that are as similar as possible.
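The random split described above can be sketched as follows. This is an illustrative sketch: the 9:1 ratio, the seed, and the function name are my own choices, not prescriptions from the book.

```python
import random

# Minimal sketch of a random train/validation split. Shuffling before
# splitting keeps the two sets close to the same distribution.

def split_dataset(samples, train_frac=0.9, seed=42):
    """Shuffle samples and return (train, validation) lists."""
    rng = random.Random(seed)        # fixed seed so the split is reproducible
    shuffled = samples[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

data = list(range(1000))
train, val = split_dataset(data)
print(len(train), len(val))  # 900 100
```

For classification tasks with imbalanced classes, a stratified split (equal class proportions in both sets) keeps the distributions even closer.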
MLY--6. Your Dev and Test Sets Should Come from the Same Distribution
Similar to Chapter 5: it emphasizes that the validation set and the test set should have the same distribution; otherwise you may train a model that does very well on the validation set but performs poorly on the test set.
MLY--7. How Large Do the Dev/Test Sets Need to Be?
This chapter is mainly about how large the validation set and test set should be. If your validation set has only 100 samples, the smallest accuracy change you can measure is 1%, not 0.1%; to detect 0.1% changes, your validation set needs to grow to 1,000 samples. The test set also needs to be large enough to reflect your model's performance reliably, but it generally does not need to be excessively large.
personal sentiment: I think there are no hard rules for partitioning validation and test sets; I generally divide the data into training : validation = 9 : 1.
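The granularity point above is just arithmetic: with n samples, one flipped prediction moves accuracy by 1/n. A quick sketch:

```python
# The smallest accuracy change a dev set can register corresponds to one
# sample changing from wrong to right: 1 / n_samples.

def accuracy_resolution(n_samples):
    """Smallest detectable accuracy change with n_samples examples."""
    return 1.0 / n_samples

print(accuracy_resolution(100))   # 0.01  -> changes smaller than 1% are invisible
print(accuracy_resolution(1000))  # 0.001 -> can now see 0.1% improvements
```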
MLY--8. Establish a Single-Number Evaluation Metric for Your Team to Optimize
This chapter is about the evaluation metric for a model. It is better to judge models with a single metric (such as accuracy) than with multiple metrics (such as precision and recall), mainly because multiple metrics make algorithms hard to compare directly. (The book illustrates this with a table comparing two classifiers' precision and recall.)
In fact, if you want to balance precision and recall, you can use the F1 score.
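The F1 score is the harmonic mean of precision and recall, collapsing the two numbers into one metric for comparing models. A minimal sketch with made-up precision/recall values:

```python
# F1 = harmonic mean of precision and recall: a single number that is high
# only when BOTH precision and recall are high.

def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.95, 0.90))  # ~0.9243
print(f1_score(0.99, 0.10))  # ~0.1817: high precision alone is not enough
```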
MLY--9. Optimizing and Satisficing Metrics
When trading off algorithm accuracy against speed, besides combining them into a single formula such as accuracy - 0.5 * running_time, there are other approaches. For example: if the acceptable running time is 100 ms, find the most accurate algorithm under that limit; running time is then a "satisficing" metric (it only needs to be good enough) and accuracy is the "optimizing" metric. If you have multiple constraints, such as model size, running time, and accuracy, you can likewise find the best model among those that satisfy the constraints. False positive and false negative rates can be treated the same way, for example minimizing the false negative rate subject to keeping the false positive rate below some acceptable level.
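The satisficing-plus-optimizing selection above can be sketched directly. The model list, names, and numbers here are made up for illustration; only the selection rule (filter by the constraint, then maximize accuracy) comes from the chapter.

```python
# Pick the most accurate model among those satisfying a runtime constraint:
# runtime is the satisficing metric, accuracy is the optimizing metric.

def pick_model(models, max_runtime_ms=100):
    """Return the most accurate model meeting the runtime constraint, or None."""
    feasible = [m for m in models if m["runtime_ms"] <= max_runtime_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m["accuracy"])

models = [
    {"name": "A", "accuracy": 0.90, "runtime_ms": 80},
    {"name": "B", "accuracy": 0.92, "runtime_ms": 95},
    {"name": "C", "accuracy": 0.95, "runtime_ms": 1500},  # most accurate, but too slow
]
print(pick_model(models)["name"])  # B
```

More constraints (model size, memory, etc.) just add more conditions to the `feasible` filter.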
MLY--10. Having a Dev Set and Metric Speeds Up Iterations
When you face a problem, first think of an idea for solving it, then implement the idea in code, and finally, based on the experimental results, think about how to improve on the previous solution, iterating continually. The figure the author gives of this idea/code/experiment loop illustrates it vividly.
MLY--11. When to Change Dev/Test Sets and Metrics
The author suggests that when starting a project, it is best to define the validation set, test set, and evaluation metric within about a week, rather than spending too long deliberating. Later, when you find a flaw in the validation set, test set, or metric you defined earlier, you should fix it as soon as possible. There are three signs to watch for; if any of them applies, you need to change your validation set:
1. The actual data distribution differs from the distribution of your validation set.
2. You have overfit the validation set.
3. Your evaluation metric disagrees with the direction in which you want to optimize the model. For example, in an image classification task, model A has higher accuracy than model B, but model A tends to misclassify (false-negative) certain special images in a way that cannot be tolerated, while model B does not; model B is then actually better than model A despite the accuracy numbers. How to improve the metric? Add a penalty for misclassifying those special images, rather than characterizing the model by accuracy alone.
MLY--12. Takeaways: Setting Up Development and Test Sets
1. Your validation set and test set should, as far as possible, be drawn from the data of your actual application scenario. They do not have to have the same distribution as your training data. (Personally, I think it is best for the training set and validation set to have similar distributions; if the training data and validation data differ too much, you may need many rounds of training to get good results.)
MLY--14. Evaluating Multiple Ideas in Parallel During Error Analysis
List the misclassified images in a table, annotate each one with its likely cause, and then analyze those causes, e.g., improve the algorithm, fix the preprocessing, or correct labeling problems. By tallying the causes and prioritizing the categories that account for the most errors, tackling them one by one can improve the model.
The table format can be, for example: one row per misclassified image, one column per candidate cause (checked where it applies), and a final column for free-form comments.
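Such a table can be tallied programmatically to decide what to fix first. The image names, cause tags, and comments below are made up for illustration; only the tally-and-prioritize idea comes from the chapter.

```python
from collections import Counter

# Error-analysis sketch: each misclassified image carries one or more cause
# tags plus a comment; counting the tags shows which category of error is
# worth fixing first.

errors = [
    {"image": "img_01.jpg", "causes": ["blurry"],          "comment": "low light"},
    {"image": "img_02.jpg", "causes": ["mislabeled"],      "comment": "wrong label"},
    {"image": "img_03.jpg", "causes": ["blurry", "small"], "comment": "far away"},
    {"image": "img_04.jpg", "causes": ["blurry"],          "comment": "motion blur"},
]

counts = Counter(cause for e in errors for cause in e["causes"])
for cause, n in counts.most_common():
    print(f"{cause}: {n}/{len(errors)} errors")
```

Here "blurry" accounts for 3 of 4 errors, so handling blurry images is the highest-leverage fix; in practice the same tally is usually kept in a spreadsheet.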
Machine Learning Yearning - Andrew Ng