Reading Notes for Statistical Learning Theory

Let's continue the discussion of reading Vapnik's book Statistical Learning Theory. At the very beginning of the book, Vapnik describes the fundamental approaches in pattern recognition: the parametric estimation approach and the non-parametric estimation approach. Before introducing the non-parametric approach, to which the support vector machine belongs, Vapnik first addresses the following three beliefs on which the philosophy of the parametric approach stands (pages 4 to 5):

    1. There exists a function, defined by a limited number of parameters, that provides a good approximation to the desired function;
    2. Most real-life problems are governed by the normal law;
    3. The maximum likelihood method is a good tool for estimating the parameters.

In my opinion, the first condition is required by all machine learning approaches, whether parametric or non-parametric. The second point is based on the central limit theorem, which states that the distribution of the sum (or mean) of a large set of independent random variables approximates a Gaussian distribution. If we first pre-process the dataset to normalise the data points (centring the mean at the origin and scaling to unit variance), the Gaussian distribution becomes the standard normal distribution, hence the term 'normal law'. In my opinion, it is better to highlight the assumption of independence, since independence is the more fundamental assumption; the Gaussian distribution is only a special consequence of it. Regarding the third point, the statement seems a little too absolute, as maximum likelihood estimation does not suit every problem.
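
To make the reference precise, the classical statement of the central limit theorem is (my own rendering, not a quotation from the book): for independent, identically distributed variables X_1, ..., X_n with mean \mu and finite variance \sigma^2,

    \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma} \;\xrightarrow{d}\; \mathcal{N}(0, 1) \quad \text{as } n \to \infty

so it is the standardised sum (equivalently, the standardised mean), not an arbitrary collection of independent variables, that approaches the standard normal distribution.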

Although we see some limitations in these statements, as we keep reading the book it becomes clear that the author simply uses these assumptions, which many methods follow, to highlight the limitations of parametric approaches. Experts in parametric learning would no doubt be able to argue against these points, as such debates are common in the academic world.

Vapnik then introduces the perceptron algorithm of 1958 and the empirical risk minimisation (ERM) criterion used in machine learning. It is worth noting that ERM measures the error on the training samples, while the real problem of machine learning is to estimate the unobserved behaviour on the test dataset. This leads to the problem of overfitting, which occurs when the training set is too small: the model fits the training samples but lacks generalisation, achieving poor performance on the test dataset.
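
For concreteness, the two quantities contrasted here are the expected risk over the unknown distribution F(z) and the empirical risk over the \ell training samples (written in my notation, which may differ slightly from the book's):

    R(\alpha) = \int Q(z, \alpha)\, dF(z), \qquad R_{\mathrm{emp}}(\alpha) = \frac{1}{\ell} \sum_{i=1}^{\ell} Q(z_i, \alpha)

ERM selects the parameters \alpha that minimise R_{\mathrm{emp}}, while the quantity we actually care about is R; overfitting is precisely a large gap between the two when \ell is small.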

Exactly as I expected, the next problem the author addresses is the generalisation ability of the algorithm, and here the very important VC dimension theory is introduced. The basic motivation of the VC dimension relates to density estimation. We know that, by the law of large numbers, the relative frequency of an event approximates its true probability as the sample size approaches infinity. However, our training datasets are always finite in reality. This drives the author to construct a more general theory of what can be estimated from a finite training dataset, whose capacity measure is the so-called VC dimension. The motivation behind the support vector machine is that, among machines with comparable empirical risk, the one with the lowest VC dimension is the best.
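
A commonly quoted form of the resulting bound, stated here from memory for the indicator loss (so treat the constants as indicative rather than exact): with probability at least 1 - \eta, simultaneously for all functions in a class of VC dimension h,

    R(\alpha) \le R_{\mathrm{emp}}(\alpha) + \sqrt{\frac{h\left(\ln\frac{2\ell}{h} + 1\right) - \ln\frac{\eta}{4}}{\ell}}

which makes explicit why, for a fixed sample size \ell and comparable empirical risk, a lower VC dimension h yields a tighter guarantee on the true risk.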

Then the author presents the main principle of designing a learning machine based on a dataset of limited size. The main principle is this: for density estimation, we should directly estimate the specific density we need, rather than deriving it by first estimating the more general densities on which that specific density depends.
For example, if we can estimate the conditional probability directly, we do not need to estimate the probability of the condition and the probability of the event under all conditions. More importantly, with limited information such as a small training dataset, we can only afford to estimate the more specific density. On the other hand, the problem we are going to solve is to predict the class of unobserved samples, which requires that the machine produce a solution more general than its training dataset or any specific test point: the machine should have the capability to estimate all samples in the feature space.
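
One way to make this example concrete (my formulation, not the book's wording) is Bayes' rule for the conditional probability of a class y given an input x:

    P(y \mid x) = \frac{p(x \mid y)\, P(y)}{p(x)}

The indirect route estimates the class-conditional density p(x \mid y), the prior P(y) and the marginal p(x), all of which are more general objects than the quantity on the left; the direct route models P(y \mid x) itself, which is the only thing the classification decision needs.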
