Machine Learning Foundations: Lecture 5 Notes

The first four lectures showed that machines can learn when the size of the hypothesis set, M, is finite. The purpose of this fifth lecture is to address whether machines can still learn when M is infinite.

Why can machines learn when the hypothesis set size M is finite?

1. It is based on the Hoeffding inequality:

P[|E_in(g) − E_out(g)| > ε] ≤ 2M·exp(−2ε²N)

This inequality says that the probability of the training-set error E_in(g) and the test-set error E_out(g) differing by too much (> ε) has an upper bound, and that bound is determined by M, ε, and N (the sample size).

The inequality originates in the fourth lecture. If there is at least one hypothesis h in the hypothesis set H for which a dataset makes the gap between E_in(h) and E_out(h) greater than ε, that dataset is considered bad. For a single hypothesis h, P[|E_in(h) − E_out(h)| > ε] ≤ 2·exp(−2ε²N). Because the hypothesis set is finite, with size M, a union bound over all hypotheses gives P[|E_in(g) − E_out(g)| > ε] ≤ 2M·exp(−2ε²N).
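
To see how this bound behaves, here is a minimal sketch in Python (the function name hoeffding_union_bound and the demo values of M, ε, and N are my own choices, not from the lecture):

```python
import math

def hoeffding_union_bound(M, epsilon, N):
    """Upper bound 2*M*exp(-2*eps^2*N) on P[|E_in(g) - E_out(g)| > eps]
    for a finite hypothesis set of size M and sample size N."""
    return 2 * M * math.exp(-2 * epsilon ** 2 * N)

# The bound shrinks exponentially in N but grows linearly in M.
for N in (100, 1000, 10000):
    print(N, hoeffding_union_bound(M=100, epsilon=0.1, N=N))
```

Note that for small N the bound exceeds 1 and says nothing; as M → ∞ it becomes useless at every N, which is exactly the problem this lecture addresses.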

The bound above only holds when M is finite. When M is infinite, the right-hand side diverges; yet although the hypotheses differ, their error rates can be very similar, i.e., the bad events overlap. For example, in PLA, two lines that are very close together divide the data into nearly the same two categories. The union bound therefore over-estimates the true probability, because it ignores the overlap between the error events of different hypotheses. So we hope to find a finite effective number m_H (≪ M) and re-derive this upper bound using it.

How do we find m_H to replace M?

2. Consider the example of PLA. Although the hypothesis set of PLA is the set of all straight lines in R², those infinitely many lines fall into only finitely many classes on a given dataset. The class of a line is determined by how it labels the data: all lines that assign the same labeling to the data are grouped into one class. Thus, if there are N data points, there are at most 2^N classes of lines (because PLA only assigns each point to one of two categories).
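
As a rough check, the sketch below (my own construction, not from the lecture) samples many random lines and counts how many distinct labelings they actually produce on N random points; for 4 points in general position the count stays at or below 14, well under 2^4 = 16:

```python
import random

def count_line_dichotomies(points, n_lines=100000, seed=0):
    """Sample random lines w0 + w1*x + w2*y = 0 and count how many
    distinct labelings (+1/-1 per point) they produce on the points."""
    rng = random.Random(seed)
    patterns = set()
    for _ in range(n_lines):
        w0, w1, w2 = (rng.uniform(-1, 1) for _ in range(3))
        patterns.add(tuple(1 if w0 + w1 * x + w2 * y > 0 else -1
                           for x, y in points))
    return len(patterns)

rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(4)]
print(count_line_dichotomies(pts))  # at most 14, below 2**4 = 16
```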

3. Generalize from PLA by partitioning the hypothesis set H according to the outputs it produces. Suppose H is the set of all hypotheses, h is an element of H, and each h assigns every point of the input sample (x1, x2, …, xN) to one of two categories, i.e.

h: (x1, x2, …, xN) → (h(x1), h(x2), …, h(xN)) ∈ {×, ○}^N

4. All hypotheses that classify the data points in exactly the same way are said to form one dichotomy. For example, if one h labels all 4 data points as ○, and another h also labels the same 4 points as ○, then those two h's belong to the same dichotomy. For H, the number of dichotomies it can produce on N points is at most 2^N (each point can only belong to one of two classes). Since the number of dichotomies depends on which input points are chosen (for example, with 4 data points there are at most 2^4 = 16 dichotomies), we remove the dependence on the inputs by defining the growth function:

m_H(N) = max_{x1, …, xN ∈ X} |H(x1, x2, …, xN)|

That is, choose the N points in the input space X that maximize the number of dichotomies. For example, for PLA: with 1 point the number of dichotomies is 2, with 2 points it is 4, and with 3 points it is 8 (if the 3 points are collinear there are only 6, but the growth function takes the maximum over placements). The meaning is: given N input points, m_H(N) is the largest number of distinct labelings that H can assign to those N points.
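
The lecture only gives these small cases, but for points in general position in the plane a known closed form (Cover's function-counting theorem, not part of the lecture) matches them: m_H(N) = 2·(C(N−1,0) + C(N−1,1) + C(N−1,2)). A short sketch tabulating it against 2^N:

```python
from math import comb

def perceptron_growth_2d(N):
    """Cover's count of linearly separable dichotomies of N points
    in general position in the plane: 2 * (C(N-1,0) + C(N-1,1) + C(N-1,2))."""
    return 2 * sum(comb(N - 1, k) for k in range(3))

for N in range(1, 7):
    print(N, perceptron_growth_2d(N), 2 ** N)  # 2/2, 4/4, 8/8, 14/16, ...
```

The table shows the growth function agreeing with 2^N up to N = 3 and falling behind from N = 4 onward.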

5. If the dichotomies that H produces on some N points include every possible dichotomy of those points (all 2^N of them), then H is said to shatter the N points. For example, when there is only one point, there are only two possible dichotomies, × and ○; if H contains both a hypothesis that labels the point × and a hypothesis that labels it ○, then H shatters that point.
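
As an illustration (again my own construction), the sketch below checks empirically that lines shatter 3 non-collinear points, by sampling random lines until all 2³ = 8 dichotomies appear:

```python
import itertools
import random

def sampled_dichotomies(points, n_lines=100000, seed=1):
    """Collect the labelings that random lines produce on the points."""
    rng = random.Random(seed)
    found = set()
    for _ in range(n_lines):
        w0, w1, w2 = (rng.uniform(-1, 1) for _ in range(3))
        found.add(tuple(1 if w0 + w1 * x + w2 * y > 0 else -1
                        for x, y in points))
    return found

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # 3 non-collinear points
all_patterns = set(itertools.product((-1, 1), repeat=len(pts)))
print(sampled_dichotomies(pts) == all_patterns)  # True: the points are shattered
```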

6. Break point: the smallest number of points k at which not all dichotomies can be produced, i.e., the smallest k with m_H(k) < 2^k. For the 2D perceptron the break point is k = 4: no placement of 4 points yields all 16 dichotomies (the maximum is 14). When a break point k exists, the growth function grows only polynomially: m_H(N) = O(N^(k−1)).
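
Assuming the Cover closed form from the earlier sketch, the break point of the 2D perceptron can be found mechanically as the smallest k where the growth function first falls below 2^k:

```python
from math import comb

def perceptron_growth_2d(N):
    return 2 * sum(comb(N - 1, k) for k in range(3))

# Break point: the smallest k where the growth function falls below 2^k.
k = 1
while perceptron_growth_2d(k) == 2 ** k:
    k += 1
print(k, perceptron_growth_2d(k), 2 ** k)  # 4 14 16
```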
