If you didn't come from a math department, feel free to skip this.
The material below is used to demonstrate that machine learning methods are correct, i.e., that machine learning can actually learn the result you want. If you only program with these methods or apply them, though, you can simply use them with confidence and boldness, just as you use 1+1=2 without needing to know why it holds.
The following images come from the courseware of teacher Yang Yang of Shanghai Jiao Tong University; the website is: http://bcmi.sjtu.edu.cn/~yangyang/ml/
A note up front: I only understood part of this lesson, and later realized it doesn't actually need to be understood in full, so I didn't study it carefully. The content may stop abruptly toward the end; perfectionists, read with caution.
First, let's look at a few concepts:
M: the training data (below, m will denote the number of training samples)
H: the hypothesis space. For example, if our mapping function is assumed to be linear, then this space contains all the linear functions that satisfy the hypothesis (any of which might be the answer)
ε (I'll just write E for it): the accuracy parameter; when you program, the accuracy of the output is this quantity
δ (this symbol really is a pain to type): different training datasets produce different accuracies; 1 − δ is the probability that we learn a correct result from the selected training set.
PAC framework: assumes all training data are classified accurately and without noise. This is basically impossible to achieve in reality.
Agnostic framework: the training data may be noisy, which matches the actual situation.
In the diagram, C is the space that is classified completely and accurately (the target concept), H is our hypothesis space, and the parts where they do not overlap are where our predictions go wrong.
D represents all the data in the ideal state (the underlying distribution), and the symbol at the bottom is the probability of a classification error, i.e., the whole region where C and H disagree (I had thought it was only the crescent on the right). The formula below it, like the one on the previous slide, just spells this out.
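Since the slide itself is not reproduced here, here is a reconstruction of that error probability in standard notation (c is the target concept, h our hypothesis, D the distribution over the data; this is my transcription, not the slide verbatim):

$$\operatorname{err}_D(h) = \Pr_{x \sim D}\bigl[c(x) \neq h(x)\bigr]$$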
S is the dataset used for training (i.e., a part of the ideal complete dataset D). So we compute the error rate by taking the mapping function learned from training, counting all the training points it classifies incorrectly, and dividing the number of errors by the total number of training points ~
The symbol inside the sum above is an indicator (essentially a pulse function): it takes the value 1 when c and h disagree and 0 when they agree.
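Written out (again my reconstruction in standard notation, with S = {x_1, ..., x_m} the training set):

$$\widehat{\operatorname{err}}_S(h) = \frac{1}{m}\sum_{i=1}^{m} \mathbb{1}\bigl[c(x_i) \neq h(x_i)\bigr]$$

For the programmers among us, a minimal Python sketch of this computation (the names empirical_error, h, c, and the toy data are mine, not from the courseware):

```python
def empirical_error(h, c, samples):
    """Empirical error rate: the fraction of training samples on which
    the hypothesis h disagrees with the target concept c (the indicator
    summed over the sample, divided by m)."""
    return sum(h(x) != c(x) for x in samples) / len(samples)

# Toy usage: target concept "x > 0", hypothesis "x >= 1"
samples = [-2, -1, 0, 0.5, 1, 2]
print(empirical_error(lambda x: x >= 1, lambda x: x > 0, samples))  # 1/6
```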
This is a theorem (the union bound): the probability that any of several events occurs is less than or equal to the sum of their individual probabilities.
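In symbols (the standard statement, for events A_1, ..., A_k):

$$\Pr\bigl[A_1 \cup A_2 \cup \cdots \cup A_k\bigr] \le \sum_{i=1}^{k}\Pr[A_i]$$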
This is also a theorem; just remember it. It is called the Hoeffding inequality.
R is a constant that you specify (the tolerance on the estimate; I write it as γ in the formula below).
The probabilities that Z_i equals 1 and equals 0 are already known: they are φ and 1 − φ. The value we estimate from training is φ̂, which the formula requires to be the average of all the Z_i, and the probability on the left-hand side is less than or equal to the expression on the right. m represents the number of samples.
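Here is the inequality written out, as best I can reconstruct it from the slide's description (Z_1, ..., Z_m are independent Bernoulli(φ) variables, φ̂ is their average, and γ is the constant the slide calls R):

$$\hat{\varphi} = \frac{1}{m}\sum_{i=1}^{m} Z_i, \qquad \Pr\bigl[\,|\varphi - \hat{\varphi}| > \gamma\,\bigr] \le 2\exp\bigl(-2\gamma^2 m\bigr)$$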
Version space: within the big hypothesis space mentioned earlier, the set of mapping relationships that match the training data exactly.
Now that we know all these concepts, what exactly do we need to prove? The two formulas on the slide. As long as we prove that these two formulas hold, we can say that our learning method is correct and feasible.
1. The mapping relationship we trained has an error rate on the ideal complete data that is approximately 0 (i.e., training on only some of the data is enough).
2. The probability that this error rate is approximately 0 is itself approximately 1 (i.e., no matter which training data we happen to take, it makes no difference); the two conditions combine into the formula below.
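Put together, the two statements are the standard PAC guarantee; in the notation above (my reconstruction, with h_S the hypothesis learned from training set S):

$$\Pr_{S \sim D^m}\bigl[\operatorname{err}_D(h_S) \le \epsilon\bigr] \ge 1 - \delta$$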
Prove those two things, and we can say that this way of learning is correct and feasible.
How do we prove "approximately 0"? In fact, it is enough to prove the quantity has suitable upper and lower bounds; a sketch of the usual argument follows.
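For the curious, the standard route (not necessarily the exact steps on the slides) is: apply Hoeffding to each hypothesis, take the union bound over a finite hypothesis space H, and solve for the number of samples m that drives the failure probability below δ. A minimal Python sketch of the resulting bound (the function name and example numbers are mine):

```python
import math

def sample_complexity(h_size, epsilon, delta):
    """Smallest m such that, with probability at least 1 - delta, every
    hypothesis in a finite class of size h_size has empirical error
    within epsilon of its true error.  Solves the Hoeffding-plus-union
    bound: 2 * h_size * exp(-2 * epsilon**2 * m) <= delta."""
    return math.ceil(math.log(2 * h_size / delta) / (2 * epsilon ** 2))

# Example: 10,000 hypotheses, tolerance 0.05, confidence 99%
print(sample_complexity(10_000, 0.05, 0.01))  # 2902 samples
```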
As for the concrete proof steps... heh, I didn't understand them.
After this article was published, my blog was upgraded to Level 4, at exactly 1000 points. Worth commemorating, haha ~
If any readers find errors, or places I clearly didn't understand, I hope you'll contact me so I can correct them. Your kindness makes the world a lovelier place ~
Machine Learning --- Computational Learning Theory