Today we start learning Pattern Recognition and Machine Learning (PRML). Chapter 1.1 describes how to fit a polynomial curve (polynomial curve fitting).


When reprinting, please indicate the source address: http://www.cnblogs.com/xbinworld/archive/2013/04/21/3034300.html

 


My PhD is almost finished: I will graduate next year, and this year I start preparing for it. I have done a fair amount of machine learning research and published papers, but my knowledge is not systematic, so I am determined to shore up my fundamentals before graduation — I believe it will be much harder once I leave school. I originally got started in machine learning by reading The Elements of Statistical Learning (it took me half a year, and it was exhausting). I have heard that many foreign universities use PRML as a textbook; so far I have only flipped through it as a reference and never read it carefully, so I plan to work through PRML properly.

I will try to write up what I read as blog posts, covering the denser chapters as a review of fundamentals; later topics will be selected as appropriate. Some of the text below translates sentences from the original book, but much of it is in my own words — after all, this is not meant to be a translation. The formulas and figures are basically taken from the original book; redrawing them myself would take far too much time.

Chapter 1.1: Polynomial Curve Fitting

Many machine learning introductions start with the regression problem, and PRML is no exception. Regression is the basic approach to many practical problems, such as various prediction tasks. However, the first chapter of this book does not dive straight into a specific algorithm such as linear regression; instead it uses a simple example to introduce many basic concepts. I think this is a major strength of the book.

We are given a training set of N input samples x = (x1, ..., xN), together with the corresponding target (observed) values t = (t1, ..., tN).

Our goal is to use the training set to predict the target value of a new sample. In practice we do not know whether the data was generated by some underlying functional form, such as the sine function sin(); even if it was, recovering that implicit function from the observed data is very difficult, because the data itself carries uncertainty. (Note: this understanding of uncertainty runs through the entire book.)

First, we try to fit the data with a polynomial function of the form:

y(x, w) = w0 + w1 x + w2 x^2 + ... + wM x^M

Here y is the predicted value, x is the input, and w is the parameter vector; M is the order of the polynomial (its highest power). Note that although the polynomial is nonlinear in x, it is linear in the parameters w; M = 1 gives a linear function of x. Functions that are linear in w like this are called linear models, and they will be covered in Chapters 3 and 4.
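The polynomial model above can be sketched in a few lines of Python (a minimal illustration; the function name and the example coefficients are my own choices, not from the book):

```python
import numpy as np

def poly(x, w):
    """Evaluate y(x, w) = w0 + w1*x + ... + wM*x**M."""
    return sum(w_j * x**j for j, w_j in enumerate(w))

# w = (1, -2, 3) encodes y = 1 - 2x + 3x^2, so the order is M = 2
w = np.array([1.0, -2.0, 3.0])
print(poly(2.0, w))  # 1 - 4 + 12 = 9.0
```

Note that `poly` is linear in the entries of `w` even though it is nonlinear in `x`, which is exactly why it counts as a linear model.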

An intuitive way to fit is to minimize an error function. One of the most common choices is the sum-of-squares error:

E(w) = (1/2) * sum over n of ( y(xn, w) - tn )^2

This quantity is non-negative, and it is zero only when every point is predicted exactly. The factor 1/2 is there for convenience in the later derivation. (Yes, you guessed it: the 2 cancels after differentiating.)
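The sum-of-squares error is equally short in code (a sketch; the helper names and the toy data are mine):

```python
def poly(x, w):
    """y(x, w) = sum_j w_j * x**j."""
    return sum(w_j * x**j for j, w_j in enumerate(w))

def sum_of_squares_error(w, xs, ts):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2."""
    return 0.5 * sum((poly(x, w) - t) ** 2 for x, t in zip(xs, ts))

xs = [0.0, 1.0]
ts = [0.0, 1.0]
w = [0.0, 1.0]  # y = x passes through both points exactly
print(sum_of_squares_error(w, xs, ts))  # 0.0
```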

Figure 1.3 illustrates the error function: the blue points are the training data, y is the prediction model, and half the sum of the squared green line lengths is the value of the error function. To obtain a prediction function y(x, w), we estimate the optimal w by minimizing E(w). Because E(w) is quadratic in w, it has a unique global minimum, which we denote w*. The remaining question is how to choose M — with what order of polynomial should we fit the data?
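Because E(w) is quadratic in w, the minimizer w* can be found by ordinary least squares on the design matrix of powers of x. A sketch (the noisy-sine data, noise scale, and seed are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

M = 3
# Design matrix with columns x^0, x^1, ..., x^M; least squares then
# gives the unique minimizer w* of the quadratic error E(w).
A = np.vander(x, M + 1, increasing=True)
w_star, *_ = np.linalg.lstsq(A, t, rcond=None)
print(w_star.shape)  # (4,)
```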

Now look at Figure 1.4. Suppose the data was generated by adding a little Gaussian noise to a sine function (the blue points in the figure) — of course, in reality we would not know how the data was generated. We fit the data with polynomials. As M increases from 0, the fitting capacity of the function grows stronger and stronger, i.e., the error on the training set gets smaller and smaller, and at M = 9 every point is fitted exactly. So have we found the best result? No — fitting the training set is only the means. Our real goal is a function with good generalization ability that predicts new data points well. The M = 9 panel of Figure 1.4 shows that the fitted curve is far from the true curve (the green line). This is the over-fitting problem, and it is fair to call it one of the most important issues in machine learning.

Root-mean-square error

To compare errors across data sets of different sizes, it is convenient to use the root-mean-square (RMS) error, E_RMS = sqrt(2 * E(w*) / N). The example in Figure 1.5 shows that once M passes a certain point, the error on the test data increases sharply — we understand this as over-fitting!
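The train/test behaviour in Figure 1.5 can be reproduced with a short experiment (a sketch assuming the book's noisy-sine setup; the noise scale, seed, and sample sizes are my own choices):

```python
import numpy as np

def rms_error(w, x, t):
    """E_RMS = sqrt(2 E(w) / N) = root-mean-square residual."""
    y = np.polyval(w[::-1], x)  # polyval expects highest power first
    return float(np.sqrt(np.mean((y - t) ** 2)))

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0, 1, 100)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.25, size=10)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.25, size=100)

for M in (0, 1, 3, 9):
    A = np.vander(x_train, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(A, t_train, rcond=None)
    print(M, rms_error(w, x_train, t_train), rms_error(w, x_test, t_test))
```

With 10 training points, the M = 9 polynomial interpolates the training data (training error near zero) while its test error blows up — exactly the over-fitting pattern the figure shows.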

Let us briefly discuss over-fitting. Many factors can cause it; in my personal understanding, two are most important: (1) the model is too complex; (2) there is too little data.

In the example above, the M = 9 model is too complex for the underlying data, but the over-fitting also arises because there are too few data points. As Figure 1.6 shows, when the number of data points increases substantially, the over-fitting problem is greatly reduced! So it is important to have both enough training samples and an appropriate model. (Conversely, a model that is too simple for the data brings the opposite problem of under-fitting, but we will not discuss that here.)

One of the most important techniques in machine learning for controlling over-fitting is regularization, which will be treated in detail in later chapters. Here we only give an intuitive picture. The most common regularization term constrains the magnitude (norm) of the parameters. The following formula adds such a penalty on w:

E~(w) = (1/2) * sum over n of ( y(xn, w) - tn )^2  +  (lambda/2) * ||w||^2

When y is a linear function, formula (1.4) is known as ridge regression. Figure 1.7 shows that the value of lambda can have a huge impact on the model: even with M = 9, adding the regularization term yields a much better fit. Of course, a new problem arises — how do we choose the newly introduced parameter lambda? That requires some additional machinery; a common practice is to hold out a validation set and use it to select the parameter. Model selection will be discussed later.

Well, that is all for the first section. The style of Chapter 1 is to introduce problems intuitively and sketch their solutions, which is very good for machine learning beginners.
