Machine learning Cornerstone (Lin Huntian) Notes of 12 _

Last Update: 2018-08-21 Source: Internet

Author: User

Nonlinear Transformation

Review
In Lecture 11, we introduced how to handle binary classification with logistic regression, and how to solve multiclass classification with OVA/OVO decomposition.

Quadratic Hypotheses
Limitations of the linear hypothesis set:
So far, the machine learning models we have introduced are all linear models, i.e., the hypothesis set is linear and the scoring function is a linear score $s = w^T x$. The advantage of a linear model is that its VC dimension is under control ($d_{VC} = d + 1$), so in theory $E_{out} \approx E_{in}$ can be guaranteed. However, when the dataset is not linearly separable, as in the figure on the right, it is hard to find a line that completely separates the two classes, i.e., $E_{in}$ is large. Here we discuss how to break this limitation, starting with the circularly separable case:
In the figure, the dataset on the left is not linearly separable, but the view on the right shows that a circle separates the two classes well. We call this case circular separable.
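The dataset above is separated well by a circle of radius $\sqrt{0.6}$ centered at the origin, with the hypothesis $h_{SEP}(x) = \mathrm{sign}(-x_1^2 - x_2^2 + 0.6)$. The squared distance from a sample point to the origin is compared with 0.6: if it is less than 0.6 the point is labeled +1, and if it is greater the point is labeled -1. To build a new learning model from this, we first rewrite the circle-separable hypothesis in the form of the familiar linear model: $h(x) = \mathrm{sign}(0.6 \cdot 1 + (-1) \cdot x_1^2 + (-1) \cdot x_2^2) = \mathrm{sign}(\tilde w^T z)$, with $\tilde w = (0.6, -1, -1)$ and $z = (1, x_1^2, x_2^2)$.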
In the diagram above, the circle-separable dataset in X space becomes a linearly separable dataset in Z space. We call this process of transforming the input space X into Z a (nonlinear) feature transform $\Phi$.
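A minimal Python sketch of the rewrite above, assuming the same circle of radius $\sqrt{0.6}$. It checks that the circle hypothesis in X space and the linear hypothesis $\mathrm{sign}(\tilde w^T z)$ in Z space give the same label. The function names are hypothetical, and mapping a score of exactly 0 to -1 is an arbitrary convention chosen here.

```python
def sign(s):
    """Return +1 for a positive score, -1 otherwise (ties go to -1 by assumption)."""
    return 1 if s > 0 else -1

def h_circle(x1, x2):
    """Circle hypothesis in X space: +1 inside the circle of radius sqrt(0.6)."""
    return sign(0.6 - x1 ** 2 - x2 ** 2)

def phi(x1, x2):
    """Feature transform Phi: X -> Z, z = (1, x1^2, x2^2)."""
    return (1.0, x1 ** 2, x2 ** 2)

W_TILDE = (0.6, -1.0, -1.0)  # weights of the equivalent linear model in Z space

def h_linear_in_z(z):
    """Linear (perceptron-style) hypothesis in Z space: sign(w~ . z)."""
    return sign(sum(w * zi for w, zi in zip(W_TILDE, z)))

for x in [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (-0.9, 0.3)]:
    # Both views of the same hypothesis must agree on every point.
    assert h_circle(*x) == h_linear_in_z(phi(*x))
    print(x, h_circle(*x))
```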
Circular separability in X therefore implies linear separability in Z. Does the converse hold, i.e., does linear separability of the new dataset in Z mean the original data are circle-separable in X? To answer this, we need to see what a linear hypothesis on the transformed dataset corresponds to in X.
In the figure above, formula 1 gives the correspondence between the new dataset Z and the original dataset X, $z = \Phi(x) = (1, x_1^2, x_2^2)$. Formula 2 gives the correspondence between the new hypothesis and the original one, $h(x) = \tilde h(\Phi(x)) = \mathrm{sign}(\tilde w_0 + \tilde w_1 x_1^2 + \tilde w_2 x_2^2)$. Here we look at how different weights $\tilde w$ lead to different hypotheses on the original dataset:
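To make the weight-to-curve correspondence concrete, here is a hypothetical helper, not taken from the lecture. It classifies the boundary $\tilde w_0 + \tilde w_1 x_1^2 + \tilde w_2 x_2^2 = 0$ in X space using elementary conic reasoning, and it only covers the transform $z = (1, x_1^2, x_2^2)$. The example weights are illustrative.

```python
def boundary_type(w0, w1, w2):
    """Classify the X-space boundary w0 + w1*x1^2 + w2*x2^2 = 0 (hypothetical helper)."""
    if w1 == 0 and w2 == 0:
        return "constant"                      # the score does not depend on x
    if w1 == 0 or w2 == 0:
        w = w1 if w1 != 0 else w2
        # w0 + w*t^2 = 0 has two real roots t = +/-sqrt(-w0/w) only if -w0/w > 0
        return "parallel lines" if -w0 / w > 0 else "degenerate/constant"
    if w1 * w2 < 0:
        # opposite signs on the squared terms give a hyperbola (two lines if w0 == 0)
        return "hyperbola" if w0 != 0 else "pair of crossing lines"
    # same sign on both squared terms: a closed curve exists only if w0 has the opposite sign
    if w0 * w1 < 0:
        return "circle" if w1 == w2 else "ellipse"
    return "degenerate/constant"               # empty boundary or a single point

for w in [(0.6, -1, -1), (-0.6, 1, 1), (0.6, -1, -2), (0.6, -1, 2), (0.6, 1, 2)]:
    print(w, boundary_type(*w))
# prints: circle, circle, ellipse, hyperbola, degenerate/constant
```

Note that $(0.6, -1, -1)$ and $(-0.6, 1, 1)$ give the same circle with the +1 and -1 sides swapped. With $(0.6, 1, 2)$ the score is always positive, so every point gets the same label.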
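It follows that, under this transform, a linear model on the new dataset corresponds to a quadratic curve on the original dataset (a circle, an ellipse, a hyperbola, or a constant). However, it only covers curves centered at the origin and aligned with the axes. To represent all quadratic curves on the original dataset, the data must be mapped into a larger Z space, $\Phi_2(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2)$, as shown in the following illustration: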
After this feature transform, every quadratic curve on the original dataset corresponds to a hyperplane (a linear hypothesis) on the transformed dataset:
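A minimal sketch of the full quadratic transform for 2-D input, $\Phi_2(x) = (1, x_1, x_2, x_1^2, x_1 x_2, x_2^2)$. The weight vector `W_SHIFTED_CIRCLE` is a hypothetical example: a circle of radius 0.5 centered at (1, 0), which the smaller transform $(1, x_1^2, x_2^2)$ cannot express. Expanding $0.25 - (x_1 - 1)^2 - x_2^2 = -0.75 + 2x_1 - x_1^2 - x_2^2$ gives its weights.

```python
def phi2(x):
    """General quadratic transform for 2-D input: (1, x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = x
    return (1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2)

def h_quadratic(w_tilde, x):
    """A linear hypothesis in Z space is a quadratic curve in X space: sign(w~ . phi2(x))."""
    s = sum(w * z for w, z in zip(w_tilde, phi2(x)))
    return 1 if s > 0 else -1

# Hypothetical example: +1 inside the circle (x1 - 1)^2 + x2^2 < 0.25.
W_SHIFTED_CIRCLE = (-0.75, 2.0, 0.0, -1.0, 0.0, -1.0)

print(h_quadratic(W_SHIFTED_CIRCLE, (1.0, 0.0)))  # center of the circle: +1
print(h_quadratic(W_SHIFTED_CIRCLE, (0.0, 0.0)))  # origin lies outside: -1
```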

Nonlinear Transform
Good quadratic hypothesis via nonlinear transform: a good linear model on the Z dataset (obtained by transforming X) yields a good quadratic curve on the X dataset, i.e., a good hypothesis in the quadratic hypothesis set. See the following figure:
We apply the linear models learned earlier to the transformed dataset to find the best hypothesis there. Mapping it back gives the best hypothesis in X space:
Nonlinear transform steps:
1. Transform the original dataset X into the new dataset Z with the feature transform $\Phi$: $z_n = \Phi(x_n)$.
2. Train a linear model on the new dataset Z and obtain the optimal weight vector $\tilde w$.
3. Return to the original dataset X and convert the linear model into its corresponding quadratic curve: $g(x) = \mathrm{sign}(\tilde w^T \Phi(x))$.
The flowchart is as follows:
Understanding: the two images below show how a new sample point is classified. From left to right, the point from the original dataset X is transformed into Z space. The linear model learned on Z then decides which class the transformed point belongs to, and that decision is the point's class in the original dataset X. Summary:
Nonlinear model = feature transform + linear model. The same approach extends to higher-order transforms (cubic and beyond) and to regression problems.
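The three steps above can be sketched end to end in plain Python. This sketch makes several assumptions. The toy data are labeled by the circle of radius $\sqrt{0.6}$, and points near the boundary are dropped so the transformed data are separable with a margin. The linear model in Z is the perceptron learning algorithm (PLA). Any other linear model could take its place. The names `pla`, `phi` and `g` are hypothetical.

```python
import random

def phi(x):
    """Step 1 transform assumed for this sketch: z = (1, x1^2, x2^2)."""
    x1, x2 = x
    return (1.0, x1 * x1, x2 * x2)

def sign(s):
    return 1 if s > 0 else -1

def pla(zs, ys, max_updates=10000):
    """Step 2: perceptron learning on the transformed data (needs linear separability in Z)."""
    w = [0.0] * len(zs[0])
    for _ in range(max_updates):
        mistakes = [(z, y) for z, y in zip(zs, ys)
                    if sign(sum(wi * zi for wi, zi in zip(w, z))) != y]
        if not mistakes:
            break                              # every training point is classified correctly
        z, y = mistakes[0]
        w = [wi + y * zi for wi, zi in zip(w, z)]  # PLA update w <- w + y * z
    return w

# Toy data (assumption): label +1 inside the circle x1^2 + x2^2 < 0.6, keeping a margin.
random.seed(0)
xs, ys = [], []
while len(xs) < 100:
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    r2 = x[0] ** 2 + x[1] ** 2
    if abs(r2 - 0.6) > 0.1:
        xs.append(x)
        ys.append(1 if r2 < 0.6 else -1)

w_tilde = pla([phi(x) for x in xs], ys)   # steps 1 and 2

def g(x):
    """Step 3: the final hypothesis in X space, g(x) = sign(w~ . phi(x)), a quadratic curve."""
    return sign(sum(wi * zi for wi, zi in zip(w_tilde, phi(x))))

print(w_tilde, sum(g(x) == y for x, y in zip(xs, ys)))
```

The margin filter matters. With it, the standard PLA mistake bound keeps the number of updates well below `max_updates`, so the loop ends with zero training mistakes.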
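Price of Nonlinear Transform
The cost of the Q-th order polynomial transform $\Phi_Q(x) = (1, x_1, \ldots, x_d, x_1^2, x_1 x_2, \ldots, x_d^2, \ldots, x_1^Q, x_1^{Q-1} x_2, \ldots, x_d^Q)$: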
The dimension of the weight vector is analyzed as follows. After the transform, the weight vector has dimension $1 + \tilde d$ (the constant term plus the $\tilde d$ transformed features). These are the weights of the linear model on the new dataset Z. Counting the features is a combinations-with-repetition problem: choosing k elements from n distinct elements, unordered and with repetition allowed, can be done in $\binom{n+k-1}{k}$ ways. Here we add one extra element to the d given variables $x_1, \ldots, x_d$: the constant 1. With it, a product of exactly Q factors can represent every monomial of degree at most Q. Counting the dimension is therefore equivalent to choosing Q elements with repetition from d + 1 elements, so the dimension is $1 + \tilde d = \binom{Q+d}{Q} = \binom{Q+d}{d}$, which is $O(Q^d)$. Consequently, both computing the transform and storing and learning the $1 + \tilde d$ weights take more time and space, and when Q is large, computing the model becomes expensive. In addition, from earlier lectures, the number of free parameters of a model approximates its VC dimension, as in the following figure: $d_{VC}(\mathcal{H}_{\Phi_Q}) \le 1 + \tilde d$.
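A quick numerical check of the counting argument, as a sketch. `num_poly_features` uses the closed form $\binom{Q+d}{Q}$. `count_by_enumeration` brute-forces the "choose Q items with repetition from $\{1, x_1, \ldots, x_d\}$" view. Both names are hypothetical, and the (d, Q) pairs are arbitrary.

```python
from itertools import combinations_with_replacement
from math import comb

def num_poly_features(d, Q):
    """1 + d_tilde for the full Q-th order polynomial transform of d inputs."""
    return comb(Q + d, Q)

def count_by_enumeration(d, Q):
    """Brute force: multisets of size Q drawn from the d + 1 items {1, x_1, ..., x_d}."""
    return sum(1 for _ in combinations_with_replacement(range(d + 1), Q))

for d, Q in [(2, 2), (2, 3), (10, 2), (10, 10)]:
    assert num_poly_features(d, Q) == count_by_enumeration(d, Q)
    print(d, Q, num_poly_features(d, Q))
# (2, 2) -> 6, (2, 3) -> 10, (10, 2) -> 66, (10, 10) -> 184756
```

The jump from 66 to 184756 weights for d = 10 illustrates the $O(Q^d)$ growth. Since $d_{VC}$ is bounded by this count, the same growth applies to the model complexity.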

So when Q is large, $d_{VC}$ is large. Recall the two central questions: (1) can we ensure that $E_{out}(g)$ is close to $E_{in}(g)$? (2) can we make $E_{in}(g)$ small enough? When $\tilde d$ (i.e., Q) is small, (1) is satisfied but (2) may not be. The hypothesis set is small, so the algorithm has few choices and may not find a hypothesis with small $E_{in}$. When $\tilde d$ is large, (2) can be satisfied but (1) may not be. The hypothesis set is large, so the algorithm has many choices and can make $E_{in}$ small, but the probability of a bad case (where $E_{out}$ is far from $E_{in}$) increases.
This is further illustrated by the following figure. The left figure uses the original linear model, and the right figure uses a fourth-order polynomial boundary. Visually the right figure looks better (smaller $E_{in}$), but it may be overfitting, so $E_{out} \approx E_{in}$ is not satisfied.
Structured Hypothesis Sets
Hypothesis sets built from polynomial transforms of different orders form a structured hypothesis set:
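These hypothesis sets are built from nested transforms:
$\Phi_0(x) = (1)$, $\Phi_1(x) = (\Phi_0(x), x_1, x_2, \ldots, x_d)$, $\Phi_2(x) = (\Phi_1(x), x_1^2, x_1 x_2, \ldots, x_d^2)$, ..., $\Phi_Q(x) = (\Phi_{Q-1}(x), x_1^Q, x_1^{Q-1} x_2, \ldots, x_d^Q)$
Then:
$\mathcal{H}_{\Phi_0} \subset \mathcal{H}_{\Phi_1} \subset \mathcal{H}_{\Phi_2} \subset \cdots \subset \mathcal{H}_{\Phi_Q}$. Hence $d_{VC}(\mathcal{H}_{\Phi_0}) \le d_{VC}(\mathcal{H}_{\Phi_1}) \le \cdots \le d_{VC}(\mathcal{H}_{\Phi_Q})$ and $E_{in}(g_0) \ge E_{in}(g_1) \ge \cdots \ge E_{in}(g_Q)$, where $g_i$ is the best hypothesis in $\mathcal{H}_{\Phi_i}$.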
The higher the order, the smaller $E_{in}$ becomes; the relationship between the VC dimension and the error rates is shown in the figure above. Conclusion: choosing a high-order polynomial may make $E_{in}$ very small but $d_{VC}$ very large, so $E_{out}$ may be poor. It is therefore advisable to try feature transforms from low order to high order, preferably starting with the linear (first-order) model.
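The nesting $\mathcal{H}_{\Phi_{Q-1}} \subset \mathcal{H}_{\Phi_Q}$ can be checked mechanically, as a sketch. If the features are listed lowest degree first, $\Phi_{Q-1}(x)$ is a prefix of $\Phi_Q(x)$. Setting the extra weights to 0 therefore recovers any lower-order hypothesis. `phi_poly` and the sample input `x` are hypothetical choices for illustration.

```python
from itertools import combinations_with_replacement
from math import prod

def phi_poly(x, Q):
    """Polynomial transform Phi_Q: all monomials of degree 0..Q, lower degrees first."""
    return [prod(x[i] for i in idx)          # prod of an empty tuple is 1 (the constant term)
            for degree in range(Q + 1)
            for idx in combinations_with_replacement(range(len(x)), degree)]

x = (0.5, -2.0, 3.0)   # arbitrary 3-dimensional input (assumption)
for Q in range(1, 6):
    lower, higher = phi_poly(x, Q - 1), phi_poly(x, Q)
    # Phi_Q(x) = (Phi_{Q-1}(x), degree-Q monomials), so H_{Phi_{Q-1}} is contained in H_{Phi_Q}
    assert higher[:len(lower)] == lower
    print(Q, len(higher))   # 1 + d_tilde, an upper bound on d_VC(H_{Phi_Q}): 4, 10, 20, 35, 56
```

The growing feature count is the price side of the trade-off. This is why the text recommends trying low orders before moving up.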

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.