Supervised learning is a central category of machine learning: the starting point of ML is to use the data we have to make up for knowledge we lack, so learning the patterns and regularities in a training set is the most natural approach. Starting today, I plan to spend about two weeks recording my own machine learning notes, with Ethem Alpaydin's "Introduction to Machine Learning" as the main reference; if there are errors or omissions, I welcome readers' corrections. Today's topic is supervised learning, organized as follows:
1. An example of supervised learning;
2. The dimensions of a supervised learning algorithm;
3. The capacity of a learning algorithm: the VC dimension;
4. Determining the sample size of a learning algorithm: probably approximately correct (PAC) learning.
Well, without further ado, let's get into supervised learning in machine learning.
I. An example of supervised learning
An example is the easiest way to understand. Suppose we have the task of judging whether a car is a "family car", based on two features: its price and its engine power. In reality more factors may matter, but for simplicity we consider only these two. The algorithm's task is to learn from the training set and decide whether a new sample is a family car. We label family cars as positive examples and all other cars as negative examples; class learning is then the problem of finding a description that contains all of the positive examples and none of the negative examples.
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad r = \begin{cases} 1 & \text{if } x \text{ is a positive example} \\ 0 & \text{if } x \text{ is a negative example,} \end{cases} \qquad \mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$$
The formula above describes our example: the two components of the vector x are the car's price and its engine power, and r is the output, 1 for a positive example and 0 for a negative example. The set X is the training set of N samples, each consisting of a feature vector x^t and its label r^t. Our goal is an algorithm that finds a classifier consistent with the whole training set (containing all the positive examples and none of the negative examples) and then uses that classifier to predict new samples.
In practice, one usually fixes a hypothesis class in advance. For example, take the class of axis-aligned rectangles: the hypothesis "a car is a family car if its price lies in a certain interval and its engine power lies in a certain interval" is one such discriminant, and we look for a rectangle that contains all the positive examples and none of the negative examples. Many rectangles may satisfy these conditions. Among them is a smallest one, the most specific hypothesis S: the tightest rectangle that still contains every positive example. There is also a largest one, the most general hypothesis G: the largest rectangle that still contains no negative example; any rectangle larger than G would include one or more negative examples. The hypothesis we pick should therefore lie between S and G. It is generally thought best to choose a hypothesis halfway between S and G, because this allows a larger margin, the distance between the decision boundary and the instances nearest to it.
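To make the rectangle hypothesis class concrete, here is a minimal sketch in Python (the data, numbers, and function names are my own illustration, not from Alpaydin's book): it computes the most specific hypothesis S as the tightest axis-aligned rectangle around the positive examples and uses it to classify new cars.

```python
# A minimal sketch of the most specific hypothesis S for the
# axis-aligned rectangle class. Data and names are illustrative.

def fit_tightest_rectangle(samples):
    """samples: list of ((price, power), r) pairs with r in {0, 1}.
    Returns the bounding box of the positive examples:
    ((price_min, price_max), (power_min, power_max))."""
    positives = [x for x, r in samples if r == 1]
    prices = [p for p, _ in positives]
    powers = [e for _, e in positives]
    return (min(prices), max(prices)), (min(powers), max(powers))

def predict(rect, x):
    """Output 1 if x falls inside the rectangle, else 0."""
    (p_lo, p_hi), (e_lo, e_hi) = rect
    price, power = x
    return int(p_lo <= price <= p_hi and e_lo <= power <= e_hi)

# Toy training set: (price in $1000s, engine power), label.
train = [((20, 150), 1), ((25, 170), 1), ((22, 160), 1),
         ((60, 400), 0), ((8, 70), 0)]

S = fit_tightest_rectangle(train)
print(S)                      # ((20, 25), (150, 170))
print(predict(S, (23, 155)))  # 1: inside S, predicted family car
print(predict(S, (60, 400)))  # 0: outside S
```

Any rectangle between this S and the most general G would also be consistent with the training set; S is simply the easiest one to compute.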
Since many hypotheses are available between S and G, different choices may classify a new sample differently. This raises the question of generalization: how accurately will our hypothesis classify future instances that are not in the training set?
II. The dimensions of a supervised learning algorithm
Supervised learning simply means that the computer learns the regularities and patterns in the data from a training set, and then uses them to classify or to predict by regression. The training set takes the form of the set X above, and its samples should be independent and identically distributed. For classification, two-class learning outputs 0 or 1, while K-class learning outputs a K-dimensional vector in which exactly one component is 1 and the rest are 0; this requirement means that any sample can belong to at most one class. For regression, the output is a real value. It is easy to tell classification from regression: a classifier outputs discrete values, a regressor outputs continuous values. A tiny sketch of the K-class encoding follows; after that, let us look at the dimensions of supervised learning, that is, its basic steps.
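The sketch below (my own illustration, not from the book; the function name is hypothetical) encodes a class index as the K-dimensional 0/1 vector described above.

```python
# One-hot encoding for K-class learning: exactly one component is 1,
# so a sample belongs to exactly one of the K classes.
def one_hot(label, k):
    """Encode class index `label` (0..k-1) as a k-dimensional 0/1 vector."""
    return [1 if i == label else 0 for i in range(k)]

print(one_hot(2, 4))  # [0, 0, 1, 0]: the sample belongs to class 2 only
```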
1. Fix the hypothesis class, for example a parametric model g(x, a), where a is a parameter vector and x is the sample input. We learn from the training set to determine the best a, so that the resulting hypothesis can judge new samples;
2. Many hypotheses may fit the training set, so we want to choose the most appropriate one. The criterion is a loss function L, for example the squared or absolute difference between the label r and g(x, a), which expresses the gap between our hypothesis and the training set; we seek the hypothesis that makes it smallest. The loss function can be defined in other ways too, but the basic idea is always to quantify how far the hypothesis is from the training data;
3. With the loss function L in hand, we enter the optimization step: minimize it. This step can be implemented in many ways, for example by setting the partial derivatives of L with respect to all the parameters to zero to find the minimum, or by using gradient descent, simulated annealing, or genetic algorithms.
Different machine learning methods differ in their hypothesis class (the assumed model, or inductive bias), in the loss function they use, or in their optimization procedure. It can be said that the hypothesis model, the loss measure, and the optimization procedure are the three basic dimensions of machine learning; the sketch below puts all three together on a toy problem.
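In this minimal sketch (the data, learning rate, and iteration count are my own illustrative assumptions, not from the book), the hypothesis class is a line g(x, a) = a1*x + a0, the loss is the mean squared error over the training set, and the optimizer is plain gradient descent.

```python
# Three dimensions in one toy example (all numbers illustrative):
#   hypothesis class: g(x, a) = a1 * x + a0   (a line)
#   loss:             mean squared error over the training set
#   optimization:     gradient descent on (a0, a1)

train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]  # (x, r) pairs

a0, a1 = 0.0, 0.0   # initial parameters
lr = 0.01           # learning rate (step size)

for _ in range(5000):
    # Gradient of the mean squared error with respect to a0 and a1.
    grad0 = sum(2 * ((a1 * x + a0) - r) for x, r in train) / len(train)
    grad1 = sum(2 * ((a1 * x + a0) - r) * x for x, r in train) / len(train)
    a0 -= lr * grad0
    a1 -= lr * grad1

print(round(a0, 2), round(a1, 2))  # roughly a0 = 0.0, a1 = 2.0
```

Swapping the line for another model, the squared error for another loss, or gradient descent for another optimizer each yields a different learning method, which is exactly the point of the three dimensions.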
III. The capacity of a learning algorithm: the VC dimension
The capacity of a learning algorithm is measured by the VC dimension, the maximum number of data points that a hypothesis class can shatter. Suppose a dataset contains N points. These can be labeled as positive and negative in 2^N different ways, each defining a distinct learning problem. If for every one of these labelings we can find a hypothesis h in the hypothesis class H that separates the positive examples from the negative ones, we say that H shatters these N points. The VC dimension of H is the maximum number of points H can shatter, and it measures the learning capacity of the hypothesis class. For example, the axis-aligned rectangle class used above can shatter at most 4 points, so its VC dimension is 4.
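As a brute-force illustration of shattering (a sketch of my own, not from the book): for the rectangle class, a labeling of points is realizable exactly when the bounding box of the points labeled positive contains none of the points labeled negative, so we can simply test all 2^N labelings.

```python
from itertools import product

def rectangles_shatter(points):
    """True if axis-aligned rectangles can realize all 2^N labelings
    of `points`. A labeling is realizable iff the bounding box of the
    points labeled positive contains no point labeled negative."""
    for labels in product([0, 1], repeat=len(points)):
        pos = [p for p, r in zip(points, labels) if r == 1]
        neg = [p for p, r in zip(points, labels) if r == 0]
        if not pos:
            continue  # the empty rectangle handles the all-negative case
        xs, ys = [p[0] for p in pos], [p[1] for p in pos]
        box = (min(xs), max(xs), min(ys), max(ys))
        if any(box[0] <= x <= box[1] and box[2] <= y <= box[3]
               for x, y in neg):
            return False  # this labeling cannot be realized
    return True

# Four points in a diamond can be shattered, so the VC dimension >= 4.
print(rectangles_shatter([(0, 1), (0, -1), (1, 0), (-1, 0)]))  # True
# A fifth point inside breaks shattering; in fact no set of 5 points
# can be shattered, so the VC dimension of rectangles is exactly 4.
print(rectangles_shatter([(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]))  # False
```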
IV. Determining the sample size of a learning algorithm: probably approximately correct (PAC) learning
Probably approximately correct (PAC) learning applies to a specific hypothesis class and determines the minimum number of samples needed to guarantee, with a given confidence, that the learned hypothesis reaches a given accuracy. In other words: if we want the learned hypothesis to be approximately correct with high probability, how large must the training set at least be? Given the desired error bound and confidence, and the hypothesis class in question, we can compute the minimum PAC sample size.
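For the tightest-rectangle hypothesis S from the example in section I, Alpaydin derives such a bound: to have error at most $\epsilon$ with probability at least $1 - \delta$, it suffices to take

$$N \ge \frac{4}{\epsilon} \ln\frac{4}{\delta}.$$

For instance, asking for at most 5% error with 95% confidence, i.e. $\epsilon = \delta = 0.05$ (my own plug-in numbers), gives $N \ge 80 \ln 80 \approx 351$ samples.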
OK, that is all for today's basic concepts. To be continued tomorrow!
Reference:
Alpaydin, Ethem. Introduction to Machine Learning. China Machine Press.