Machine Learning Cornerstone IV Lecture Notes

Last Update:2014-12-18 Source: Internet

Author: User

The fourth lecture addresses the question of whether machine learning is feasible.

1. From the given data D, it is feasible to find a hypothesis g that is close to the target f on D, as PLA does. However, it is hard to say whether the g found this way can be used on inputs outside D.

2. Hoeffding's inequality answers whether g can be used on inputs outside D:

(1) In probability theory, Hoeffding's inequality provides an upper bound on the probability that the sum of random variables deviates from its expected value.

(2) Think of the set of all possible inputs as a jar, where each ball in the jar represents one input point x. For a fixed hypothesis h and the target f, a ball x is painted orange if h(x) ≠ f(x) and green if h(x) = f(x). Because the jar holds far too many balls to measure the orange fraction directly, draw n balls from the jar as a sample and use it to estimate the orange fraction of the whole jar. Hoeffding's inequality says that when n is large enough, the probability that the orange fraction in the sample differs from the orange fraction in the jar by more than a tolerance ε has a small upper bound.

(3) For a given h, let Ein(h) be the error rate of h on the sample and Eout(h) its error rate over the entire input space. By Hoeffding's inequality, P[|Ein(h) - Eout(h)| > ε] ≤ 2exp(-2ε²n). Therefore Eout(h) does not need to be known. When Ein(h) ≈ Eout(h) and Ein(h) is very small, we can say that Eout(h) is also very small and that h is probably very close to f.
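The jar experiment can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names and the values mu = 0.4, n = 100, ε = 0.1 are illustrative assumptions. Here mu plays the role of Eout(h) (the orange fraction in the jar) and the sample orange fraction plays the role of Ein(h).

```python
import math
import random


def hoeffding_bound(eps, n):
    """Upper bound 2*exp(-2*eps^2*n) on P[|Ein - Eout| > eps]."""
    return 2.0 * math.exp(-2.0 * eps * eps * n)


def deviation_frequency(mu, n, eps, trials, seed=0):
    """Fraction of trials in which the sample orange ratio misses mu by more than eps.

    mu is the (normally unknown) orange fraction in the jar; each trial
    draws n balls independently and measures the orange fraction of the sample.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    bad = 0
    for _ in range(trials):
        orange = sum(1 for _ in range(n) if rng.random() < mu)
        if abs(orange / n - mu) > eps:
            bad += 1
    return bad / trials


if __name__ == "__main__":
    mu, n, eps = 0.4, 100, 0.1  # hypothetical values chosen for illustration
    print("empirical frequency:", deviation_frequency(mu, n, eps, trials=2000))
    print("Hoeffding bound:    ", hoeffding_bound(eps, n))
```

The observed frequency should stay below the bound, and the bound itself shrinks exponentially as n grows. The bound does not depend on mu, which is why Eout(h) does not need to be known.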

3. The above gives a way to verify that a particular h is close to f, but it is still not learning. Real learning means choosing from a set of hypotheses, not returning the same h every time. PLA, for example, learns different lines from different data rather than always producing the same line. If an algorithm always returns the same h, it is probably useless, and nothing has been learned.

4. When there are many hypotheses, you can imagine each different h painting the balls in the jar with its own coloring:

It is tempting to choose the hypothesis h whose Ein is small, but a very small Ein may be accidental. Example: toss a coin 5 times, and the probability that all 5 tosses come up heads is very small (1/32). But toss 50 coins 5 times each, and the probability that at least one of the coins comes up heads all 5 times is very large (about 80%).

Hoeffding's inequality shows that, with high probability, Ein and Eout differ very little when there is only one h. The case where Ein and Eout differ greatly is called a bad event. If a particular data set makes Ein and Eout of some h differ greatly, that data set is called bad for h. Hoeffding's inequality shows that, for a fixed h, the probability of bad data is at most 2exp(-2ε²n). If a given data set is bad for at least one hypothesis, it is considered bad for the entire hypothesis set, so by the union bound the probability of bad data for a set of M hypotheses is at most 2M exp(-2ε²n).

From the above, when the hypothesis set is finite, the probability that the data is bad still has an upper bound, so as long as n is large enough, Ein ≈ Eout holds with high probability for every hypothesis. If algorithm A can then find a hypothesis with small Ein, we can say that the machine has learned something.
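The coin example and the finite-hypothesis bound can be worked out with a few lines of arithmetic. This is a sketch under stated assumptions: M (the number of hypotheses) and δ (the tolerated probability of bad data) are symbols introduced here for illustration, and the bound 2M exp(-2ε²n) is the standard union bound over M single-hypothesis Hoeffding bounds rather than a formula quoted from the notes.

```python
import math


def prob_some_coin_all_heads(coins, tosses):
    """Probability that at least one of `coins` fair coins shows heads on every toss."""
    p_one = 0.5 ** tosses  # a single coin showing all heads
    return 1.0 - (1.0 - p_one) ** coins


def union_bound(m, eps, n):
    """Union bound 2*m*exp(-2*eps^2*n) on the probability that the data is bad
    for at least one of m hypotheses."""
    return 2.0 * m * math.exp(-2.0 * eps * eps * n)


def samples_needed(m, eps, delta):
    """Smallest n for which the union bound is at most delta.

    Solving 2*m*exp(-2*eps^2*n) <= delta gives n >= ln(2*m/delta) / (2*eps^2).
    """
    return math.ceil(math.log(2.0 * m / delta) / (2.0 * eps * eps))


if __name__ == "__main__":
    print("one coin, 5 heads in a row: ", prob_some_coin_all_heads(1, 5))
    print("any of 50 coins, 5 in a row:", prob_some_coin_all_heads(50, 5))
    m, eps, delta = 50, 0.1, 0.05  # hypothetical values chosen for illustration
    n = samples_needed(m, eps, delta)
    print("samples needed:", n, "bound at that n:", union_bound(m, eps, n))
```

The required n grows only logarithmically with M, which is why a finite hypothesis set still leaves learning feasible when enough data is available.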

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
