Caltech Open Course: Machine Learning and Data Mining - Linear Model


This lesson is mainly about linear models and how to work with them.

Including:

1. Input representation
2. Linear classification
3. Linear regression
4. Nonlinear transformation


The author believes that the way to test whether a model is usable is to apply it to real data and see how well it performs.

To show how linear models are applied, the author uses them to solve a digit-recognition problem on post office (ZIP code) data:

Because different people have different writing habits, the same digit may be written in varying forms, but its basic skeleton does not change much. The following figure shows digits from the post office data and idealized digits from the real world:


Some digits in the data are easy to identify, but others are hard: for example, is the last digit in the last row an '8' or a '0'?

To recognize these digits, we need machine learning. Here the author uses the perceptron (a linear model) to solve the problem.

The first thing to decide is how to represent the input (input representation).

Assume each digit is a 16x16-pixel image.

Then each image gives us 16 x 16 = 256 inputs, and with the bias coordinate x0 we have 256 + 1 = 257 parameters, w0 through w256, whose values we hope machine learning will determine. With so many parameters, however, learning may take a long time, and the result is not necessarily good. Since some pixels clearly carry little information, we can instead extract a few features from the pixels to represent each image; this reduces the dimension and thus the difficulty of learning. Observe that different digits differ in intensity (the number of black pixels) and in symmetry, so we try to describe each image by its intensity and symmetry. That is, we extract features and reduce the dimension from 257 to 3 (do not forget w0). Of course, while extracting the useful information we may also lose some of it. After feature extraction, each image is represented as follows:
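As a concrete sketch of this feature extraction: the lecture does not spell out the exact formulas, so the intensity and symmetry definitions below are plausible assumptions, not the course's exact code.

```python
import numpy as np

def extract_features(image):
    """Reduce a 16x16 grayscale digit image to two features.

    intensity: average pixel value (roughly, how much "ink" the digit uses).
    symmetry:  negated average difference between the image and its
               left-right mirror (a symmetric digit scores close to 0).
    """
    image = np.asarray(image, dtype=float)
    intensity = image.mean()
    symmetry = -np.abs(image - np.fliplr(image)).mean()
    return intensity, symmetry

# A hypothetical 16x16 image: a perfectly symmetric vertical bar.
bar = np.zeros((16, 16))
bar[:, 7:9] = 1.0                       # 32 of 256 pixels are "ink"
intensity, symmetry = extract_features(bar)
print(intensity)  # 0.125
print(symmetry == 0.0)  # True: the mirror image is identical
```

Each 257-dimensional raw image thus collapses to the 3-dimensional vector (1, intensity, symmetry), with the leading 1 playing the role of x0.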


Now we can use these features for machine learning (linear classification). (To simplify the problem, only the digits '1' and '5' are used.) The linear model takes the form h(x) = sign(wᵀx), where w and x are vectors. The following describes how the learning proceeds.

The left side of each figure shows the process and result of learning with a linear model: the red lines represent the actual results, and the green lines the learned results. Because the data is not linearly separable, the algorithm never stops on its own, so it is forced to stop after 1000 iterations, and the weights from the last iteration are taken as the final output. The result, however, is not satisfactory: the separating line sits too high. This is because the result of the last iteration is not necessarily the best one; the iterations may happen to stop right after a bad update. The next figure therefore shows the result of the pocket algorithm. It is much better; a few points are still misclassified, but complete correctness is not required as long as the error stays within an acceptable range.

Having said all that, what exactly is the pocket algorithm?

The pocket algorithm adds a tracker that records the best result seen so far. After each iteration, the iteration's result is compared with what the tracker holds, and the better of the two is kept. When the iterations end, the tracker holds the best result produced across all iterations. But why is it called the pocket algorithm rather than anything else? The author explains: imagine that we keep the best result in our pocket; whenever we find a better one, we put it in the pocket and discard the old one. When the algorithm ends, we simply take the result out of the pocket. Hence the name. (Ha, vivid.)
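The description above can be sketched in a few lines; this is a minimal version assuming labels in {-1, +1} and a bias column of 1s in X, not the course's reference implementation.

```python
import numpy as np

def pocket(X, y, max_iters=1000):
    """Pocket algorithm: run PLA but keep ('pocket') the best weights seen.

    X: (N, d+1) inputs with a leading 1 in each row for the bias term x0.
    y: (N,) labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])                       # current PLA weights
    best_w = w.copy()
    best_err = np.mean(np.sign(X @ w) != y)        # error of the pocketed w
    for _ in range(max_iters):
        wrong = np.where(np.sign(X @ w) != y)[0]
        if len(wrong) == 0:                        # linearly separable: done
            break
        i = wrong[0]                               # pick a misclassified point
        w = w + y[i] * X[i]                        # standard perceptron update
        err = np.mean(np.sign(X @ w) != y)
        if err < best_err:                         # better than the pocket?
            best_w, best_err = w.copy(), err       # swap it in
    return best_w
```

The lecture also suggests a speedup used later on: initialize w with the linear-regression solution instead of zeros before running these iterations.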

Next I will talk about linear regression.

Regression = real-valued output

When we ask whether some variables are related to other variables, we think of regression.

Using linear regression, we can find a line (or a plane, depending on the dimension of the input) that minimizes the error.


2D: a line; 3D: a plane.

The linear regression hypothesis is h(x) = wᵀx.

But how should we measure the error? After all, regression seeks a function that minimizes the error between itself and the true function. The author uses the standard error measure for regression: the squared error.


To find the w that minimizes the error function, we take the derivative and set it to zero, using standard matrix calculus:
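The slide with the derivation is missing here; a reconstruction consistent with the course's standard notation (N data points, squared error) is:

```latex
E_{\text{in}}(w) = \frac{1}{N}\,\lVert Xw - y \rVert^{2},
\qquad
\nabla E_{\text{in}}(w) = \frac{2}{N}\,X^{\mathsf{T}}(Xw - y) = 0
\;\Rightarrow\;
X^{\mathsf{T}} X\, w = X^{\mathsf{T}} y .
```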


If Xw = y, the solution is obtained as follows: multiply the equation on the left by Xᵀ, then by the inverse of XᵀX, giving w = (XᵀX)⁻¹Xᵀy = X⁺y. The superscript plus sign on X denotes the pseudo-inverse.
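This one-step solution is easy to try numerically; a small sketch with NumPy (the toy data below is made up for illustration):

```python
import numpy as np

def linear_regression(X, y):
    """One-step linear regression: w = pseudo-inverse of X times y.

    np.linalg.pinv computes the Moore-Penrose pseudo-inverse directly,
    which equals (X^T X)^{-1} X^T when X^T X is invertible and stays
    numerically stable when it is not.
    """
    return np.linalg.pinv(X) @ y

# Toy data lying exactly on y = 1 + 2*x (first column is the bias x0 = 1).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
w = linear_regression(X, y)
print(w)  # approximately [1. 2.]
```

For classification, one then simply thresholds the real-valued output with sign(wᵀx), as the text describes next.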


Now we have found w. But how can linear regression be used for classification? Since the binary labels are also real numbers, linear regression applies: when the output is positive, we treat the point as class +1; when it is negative, as class -1.


The question is, does linear regression really work for classification? First look at an example:


The blue points belong to the +1 class and the red points to the -1 class. The regression equation tries to map every negative example exactly to -1 and every positive example exactly to +1, but it cannot: for a point whose output would be -2 or -3, it can only count the difference as error. So linear regression merely minimizes the squared error; it is not a real classifier! What? Not a real classifier? Are you kidding me?

Don't worry. Although it is not a real classifier, it gets us halfway there: if we initialize w to the w computed by linear regression and then continue with the pocket algorithm or another iterative method, the whole process is accelerated. After all, relying on the pocket algorithm from scratch may require many iterations to reach the value we need, so why not take a head start? This is how linear regression is combined with classification.

(That said, I don't quite understand why a negative output cannot simply be mapped to -1; why treat -2 as an error???...)


Finally, the author explains how to convert a nonlinear problem into a linear one.

In this section the author does not give a theoretical method to guide the conversion; that comes in the next lesson. Instead, he presents an example in which a transformation successfully converts a nonlinearly separable problem into a linearly separable one, as shown below:

On the left is the original dataset, which is obviously not linearly separable. But if we process the input by squaring the values of x1 and x2, this is equivalent to mapping the space on the left to the space on the right, and the nonlinearly separable problem becomes a linearly separable one. Note that when a new x is input, it must first be transformed in the same way.
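The transform described above is tiny in code; this sketch assumes the picture shows a circular decision boundary of radius 1 (the radius is my assumption, not stated in the text):

```python
import numpy as np

def transform(x1, x2):
    """The lecture's transform: (x1, x2) -> (z1, z2) = (x1^2, x2^2).

    A circular boundary x1^2 + x2^2 = r^2 in the original space becomes
    the straight line z1 + z2 = r^2 in z-space, so a linear model
    can separate the transformed data.
    """
    return np.array([x1 ** 2, x2 ** 2])

# A point inside the unit circle and a point outside it:
inside = transform(0.5, 0.5)     # z1 + z2 = 0.5
outside = transform(1.0, 1.0)    # z1 + z2 = 2.0
print(inside.sum() < 1.0)   # True:  stays on the "inside" side of the line
print(outside.sum() > 1.0)  # True:  lands on the "outside" side
```

A new point must be pushed through `transform` before being handed to the linear model, exactly as the text warns.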



Conclusion:

The linear model is one of the most important models in machine learning; many other models are built on top of it.

When the input is high-dimensional, we should process the input representation: extract features and reduce the dimension, thereby reducing the difficulty, time, and cost of learning.

To speed up learning and improve its quality, we can use linear regression to find a w as the initial value for the linear model, and then iterate from that initial value.

In reality, the data of many problems is not strictly linearly separable. If an acceptable boundary exists (only a small fraction of points misclassified), the algorithm will not converge on such problems, but we can still use a linear model: simply limit the number of iterations and use the pocket algorithm to keep the best result found during the iterations as the final output.

In addition, we can transform the input data so that the problem becomes linearly separable, and then solve it with a linear model.


