[NTU Machine Learning notes] On the generalization of the mapping relationship


Lecture 5


In the last lecture, we derived the Hoeffding inequality for a finite hypothesis set of size M:

P[ |E_in(g) - E_out(g)| > ε ] <= 2M · exp(-2ε²N)

In the process of machine learning, we must ensure that E_in and E_out are close; on the other hand, to obtain a good result, we also want E_in itself to be small. So there is a basic trade-off:


When the hypothesis set size M is small: it is easy to ensure that E_in and E_out are close, but because there are few alternatives, it is not so easy to find a hypothesis with a small E_in.
vs.
When the hypothesis set size M is large: with more alternative hypotheses, it is easier to find one with a small E_in, but it is harder to ensure that E_in and E_out stay close; a larger sample size N is needed.
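
To make the trade-off concrete, here is a minimal Python sketch that evaluates the right-hand side 2M·exp(-2ε²N) for a few values of M and N (the choices M, N and ε = 0.1 are illustrative assumptions, not from the course):

    import math

    def hoeffding_bound(M, N, eps=0.1):
        # Union-bound Hoeffding bound for a finite hypothesis set:
        # P[|E_in - E_out| > eps] <= 2 * M * exp(-2 * eps^2 * N)
        return 2 * M * math.exp(-2 * eps ** 2 * N)

    for M in (1, 100, 10000):
        for N in (100, 1000, 10000):
            print(f"M={M:>6}, N={N:>6}: bound = {hoeffding_bound(M, N):.3e}")

Larger M loosens the bound while larger N tightens it, which is exactly the trade-off described above.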


So here is the problem: in many models there are in fact infinitely many hypotheses to choose from. Taking the PLA as an example, we can choose among infinitely many lines. With M on the right-hand side infinite, is it then impossible to ensure that E_in is close to E_out?


The answer is no. Recall where the factor M came from: letting B_m denote the event that hypothesis h_m encounters bad data, we used the union bound

P[B_1 or B_2 or ... or B_M] <= P[B_1] + P[B_2] + ... + P[B_M]

This union bound treats the bad-data events of different hypotheses as if they never overlapped, which makes it much larger than the actual probability. In practice, the bad-data events of different hypotheses overlap heavily. Again taking the PLA as an example: two lines whose slopes differ by 0.01 are two different hypotheses, but their E_out(h) and E_in(h) are almost identical, so a dataset that is bad for one line is very likely bad for the other as well. The probability that both encounter bad data is then well below what the union bound assumes. Considering that there are many such similar hypotheses, the M in the original inequality may be replaceable by another expression even as M tends to infinity.
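
The overlap can be seen in a small Monte Carlo sketch (entirely illustrative, not from the course): two nearly identical 1D threshold hypotheses, under an assumed label-noise rate p, encounter bad data on almost the same datasets, so P[B_1 or B_2] stays close to P[B_1] instead of doubling:

    import random

    random.seed(0)
    p, eps, N, trials = 0.2, 0.1, 50, 20000   # noise rate, tolerance, sizes (assumed)
    h1, h2 = 0.50, 0.51                       # two very similar threshold hypotheses

    def eout(h):
        # Analytic out-of-sample error of the classifier "x > h" against the
        # target "x > 0.5" with label-noise rate p: p + (1 - 2p) * |h - 0.5|
        return p + (1 - 2 * p) * abs(h - 0.5)

    b1 = b2 = either = 0
    for _ in range(trials):
        xs = [random.random() for _ in range(N)]
        ys = [(x > 0.5) != (random.random() < p) for x in xs]
        ein = lambda h: sum(((x > h) != y) for x, y in zip(xs, ys)) / N
        bad1 = abs(ein(h1) - eout(h1)) > eps   # h1 sees bad data
        bad2 = abs(ein(h2) - eout(h2)) > eps   # h2 sees bad data
        b1 += bad1; b2 += bad2; either += (bad1 or bad2)
    print(f"P[B1]={b1/trials:.3f}  P[B2]={b2/trials:.3f}  "
          f"P[B1 or B2]={either/trials:.3f}  union bound={(b1+b2)/trials:.3f}")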


Next we derive this possible replacement for M, which we will call m_H.


To replace the infinite M with a finite quantity, we group hypotheses by the results they produce on the dataset. Again taking the PLA as an example: when the dataset size is N = 1, the hypotheses fall into only two classes (the single point is classified either ○ or ×), as follows:




When the dataset size is N = 2, the hypotheses can be divided into at most 4 classes.


Can we continue like this, so that N points always divide the hypotheses into 2^N classes? When the hypothesis set consists of straight lines, the answer is no: for example, with N = 4 points, at most 14 of the 2^4 = 16 conceivable patterns can be produced by a line.

Thus this number of classes, which is at most 2^N and is related to the dataset size, can serve as the finite quantity that replaces the infinite M.


Each class of results is called a dichotomy: a pattern of ○/× labels that hypotheses in H can produce on the N points. The set of all dichotomies is the smallest hypothesis set we can use as an equivalent of the infinite hypothesis set.


Although the number of dichotomies is bounded by 2^N, it may well be smaller, and it depends on which N points are chosen; so the growth function of a hypothesis set is defined as the maximum over all choices of points:

m_H(N) = max over x_1, ..., x_N of |H(x_1, ..., x_N)|


The growth function is related only to the size N of the dataset, not to the particular points chosen.


Here are some simple growth function examples:

- positive rays, h(x) = sign(x - a): m_H(N) = N + 1
- positive intervals: m_H(N) = C(N+1, 2) + 1 = N²/2 + N/2 + 1
- convex sets in the plane: m_H(N) = 2^N
- 2D perceptrons: m_H(N) < 2^N for N >= 4
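
As a quick check of the first example, here is a short Python sketch (illustrative) that counts the dichotomies positive rays produce on N random points; the count should come out to N + 1:

    import random

    def ray_dichotomies(xs):
        # Count the distinct labelings produced by positive rays h(x) = sign(x - a):
        # points to the right of the threshold a get +1, the rest -1.
        xs = sorted(xs)
        thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
        patterns = {tuple(1 if x > t else -1 for x in xs) for t in thresholds}
        return len(patterns)

    for N in (1, 2, 5, 10):
        pts = [random.uniform(0, 1) for _ in range(N)]
        print(N, ray_dichotomies(pts))   # expect N + 1 dichotomies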



Since a dataset of size N has at most 2^N possible label patterns, the growth function clearly always satisfies m_H(N) <= 2^N.


However, consider substituting m_H(N) for M in the inequality, giving roughly

P[bad data] <= 2 · m_H(N) · exp(-2ε²N)

If m_H(N) is bounded by a polynomial in N, the right-hand side clearly converges to 0 as N grows, so the probability of bad data has a convergent upper bound. Using only 2^N as the upper bound is not enough, since 2^N can outgrow the decaying exponential.
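
A small sketch makes the difference visible (computed in log space to avoid overflow; the choice m_H(N) = N³ is purely illustrative):

    import math

    def log10_bound(log10_mH, N, eps=0.1):
        # log10 of the bound 2 * mH(N) * exp(-2 * eps^2 * N)
        return math.log10(2) + log10_mH - 2 * eps ** 2 * N * math.log10(math.e)

    for N in (100, 1000, 10000):
        poly = 3 * math.log10(N)     # mH(N) = N^3, a polynomial growth function
        expo = N * math.log10(2)     # mH(N) = 2^N, the trivial upper bound
        print(f"N={N:>6}: log10(poly bound)={log10_bound(poly, N):9.1f}  "
              f"log10(2^N bound)={log10_bound(expo, N):9.1f}")

The polynomial case eventually drives the bound toward 0, while the 2^N case diverges.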


To compress the upper bound further, we define a concept called the break point: for a hypothesis set H, if at N = k every possible dataset of size k satisfies

m_H(k) < 2^k

then k is called a break point of H. Clearly k+1, k+2, ... are then also break points, so the main task is to find the smallest break point.


For the 2D perceptron, the earlier discussion shows that the smallest break point is k = 4.


Lecture 6


The following takes as an example a hypothesis set H whose smallest break point is 2. Clearly, when N = 2 we must have
m_H(2) < 2² = 4, i.e. m_H(2) <= 3


When N increases to 3, the maximum value of m_H(3) is 4 (easily found by enumeration; see the sketch below), because every 2-point subset of the dataset must still satisfy m_H(2) < 2², i.e. show at most 3 distinct patterns on each pair of points.
Thus it seems likely that the break point concept can give us a polynomial expression bounding m_H(N) in general.
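
The enumeration can be automated with a brute-force Python sketch (illustrative): among all collections of ○/× patterns on 3 points, find the largest one in which no pair of points shows all 4 patterns:

    from itertools import combinations, product

    def shatters_some_pair(dichotomies, n=3):
        # True if some pair of the n points sees all 4 label patterns
        for i, j in combinations(range(n), 2):
            if len({(d[i], d[j]) for d in dichotomies}) == 4:
                return True
        return False

    all_patterns = list(product('xo', repeat=3))
    best = max(
        (s for r in range(len(all_patterns) + 1)
           for s in combinations(all_patterns, r)
           if not shatters_some_pair(s)),
        key=len,
    )
    print(len(best))   # expect 4: the maximum on 3 points with break point k = 2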


Here the concept of the bounding function is introduced: B(N,k) is defined as the maximum possible value of m_H(N) for any hypothesis set whose break point is k, which is exactly the upper bound we are now looking for. What remains is to prove that B(N,k) is bounded by a polynomial.




By the definition of the break point, it is easy to fill in the diagonal and the upper-right triangle of the B(N,k) table: for N < k no constraint applies yet, so B(N,k) = 2^N, and on the diagonal N = k we have B(k,k) = 2^k - 1.


Next we derive the values of the lower-triangle part through an example, taking B(4,3) (N = 4, k = 3).


Obviously B(3,3) = 7. Under the constraint that no 3-point subset may show more than 7 distinct results, enumeration gives B(4,3) = 11 dichotomies on x1, x2, x3, x4 (a brute-force check below confirms the count):
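
Here is the brute-force Python sketch (illustrative): search downward from 16 for the largest collection of patterns on 4 points in which no 3 points are shattered:

    from itertools import combinations, product

    def shatters_some_triple(dichotomies, n=4, k=3):
        # True if some k of the n points see all 2^k label patterns
        for idx in combinations(range(n), k):
            if len({tuple(d[i] for i in idx) for d in dichotomies}) == 2 ** k:
                return True
        return False

    all_patterns = list(product('xo', repeat=4))
    for r in range(len(all_patterns), 0, -1):
        if any(not shatters_some_triple(s) for s in combinations(all_patterns, r)):
            print(r)   # expect 11 = B(4, 3)
            break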




Group these results into two types: the α type, in which a pattern on (x1, x2, x3) appears in a pair, once with x4 = × and once with x4 = ○; and the β type, in which a pattern on (x1, x2, x3) appears with only one value of x4.


Clearly the following relationship holds:

B(4,3) = 11 = 2α + β


Moreover, since each α pattern pairs with both values of x4, no two of x1, x2, x3 may be shattered within the α type: if two such points showed all 4 patterns, then together with x4 we would shatter 3 points, contradicting the break point k = 3. That is, the α patterns restricted to (x1, x2, x3) have k = 2 as a break point, so we have

α <= B(3,2)


Also, the α and β patterns restricted to (x1, x2, x3) together are all distinct dichotomies on 3 points, so their total size satisfies

α + β <= B(3,3) = 7


Finally, substituting the two inequalities above into the expression for B(4,3), we have

B(4,3) = 2α + β = (α + β) + α <= B(3,3) + B(3,2)


It is easy to extend this to all N and k, giving the recurrence

B(N,k) <= B(N-1,k) + B(N-1,k-1)

and thereby

B(N,k) <= Σ_{i=0}^{k-1} C(N,i)

This is a polynomial in N whose highest degree is k - 1.
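
The recurrence and the closed form can be checked against each other with a short Python sketch (the table-filling scheme follows the base cases above; the range of N is an arbitrary choice):

    from math import comb

    def bounding_table(N_max, k_max):
        # Fill upper bounds for B(N, k) using
        #   B(N, k) <= B(N-1, k) + B(N-1, k-1)
        # with B(N, 1) = 1, B(N, k) = 2^N for N < k, and B(k, k) = 2^k - 1.
        B = {}
        for N in range(1, N_max + 1):
            for k in range(1, k_max + 1):
                if k == 1:
                    B[N, k] = 1
                elif N < k:
                    B[N, k] = 2 ** N
                elif N == k:
                    B[N, k] = 2 ** N - 1
                else:
                    B[N, k] = B[N - 1, k] + B[N - 1, k - 1]
        return B

    B = bounding_table(8, 4)
    for N in range(1, 9):
        closed_form = sum(comb(N, i) for i in range(3))   # k = 3
        print(N, B[N, 3], '<=', closed_form)

For k = 3 the recurrence values match the binomial sum exactly, e.g. B(4,3) = 11 = C(4,0) + C(4,1) + C(4,2).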


In the process above, our premise was that the hypothesis set can be grouped into classes, and grouping requires a finite dataset. But the E_out we want to generalize to is computed over an effectively infinite dataset, so the following steps are needed to complete the derivation. Here, as in the course, only the rough steps of the deduction are shown:



First, the distance between E_in and E'_in, where E'_in is measured on a second dataset of the same size N, is used to replace the original distance between E_in and E_out; this keeps everything finite so that the bounding function can be used:

P[∃h: |E_in(h) - E_out(h)| > ε] <= 2 · P[∃h: |E_in(h) - E'_in(h)| > ε/2]




Then, similarly to the earlier derivation, the infinite H is replaced by its smallest equivalent set of classes on the combined 2N points, thereby introducing the growth function m_H(2N).


Finally, the Hoeffding inequality bounds the remaining deviation between the two samples, producing the final upper-bound expression.


The final result, the Vapnik-Chervonenkis (VC) bound, is as follows:

P[∃h ∈ H: |E_in(h) - E_out(h)| > ε] <= 4 · m_H(2N) · exp(-(1/8)ε²N)


This expresses, in terms of the dataset size N, the relationship between the training-set result E_in and the generalization result E_out.
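
As an illustration (the values of ε and N are assumptions), here is a sketch evaluating the VC bound for the 2D perceptron, using the polynomial B(N,4) = Σ_{i<4} C(N,i) as a pessimistic stand-in for m_H:

    import math
    from math import comb

    def vc_bound(N, eps, growth):
        # VC bound: 4 * mH(2N) * exp(-eps^2 * N / 8)
        return 4 * growth(2 * N) * math.exp(-(eps ** 2) * N / 8)

    # 2D perceptron: break point k = 4, so mH(N) <= sum_{i<4} C(N, i)
    growth = lambda n: sum(comb(n, i) for i in range(4))

    for N in (1000, 10000, 100000):
        print(f"N={N:>7}: bound = {vc_bound(N, 0.1, growth):.3e}")

The bound converges, though slowly: it only becomes meaningful at fairly large N, which is characteristic of the VC bound.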


Summary:


When the hypothesis set size M → ∞, if the training set size N is finite, we can replace the previously infinite hypothesis set with a smallest equivalent set of hypothesis classes.


The size of this smallest set of classes is called the growth function m_H(N). Clearly m_H(N) <= 2^N for a training set of size N.


As we saw, for most hypothesis sets there is some training set size N = k starting at which all training sets of that size satisfy m_H(N) < 2^N. A k with m_H(k) < 2^k is called a break point, and from here on "break point" refers only to the smallest one. Since the size of m_H(N) is clearly governed by the break point, the bounding function B(N,k) is introduced as the upper bound of the growth function.


Through a series of observations and discussions, we finally obtain the upper bound of the growth function, the bounding function B(N,k):

m_H(N) <= B(N,k) <= Σ_{i=0}^{k-1} C(N,i)


This is the upper bound of the growth function. Through it, the originally infinite hypothesis set becomes equivalent to a finite quantity; and since this quantity is a polynomial, it makes the earlier inequality converge.


However, this bound requires a finite sample. Therefore, in the actual derivation of the VC bound, the difference between E_in and E_out is replaced by the difference between E_in and E'_in, where E'_in comes from a second dataset of the same size. With these substitutions, we finally obtain the inequality bounding the gap between E_in and E_out for an arbitrary hypothesis set.
