Statistical Learning Methods notes <Chapter 2: Perceptron>


Chapter 2: Perceptron

The perceptron still feels pretty simple, so let me just write it up.

The perceptron is a binary linear classifier.

The input x is the feature vector of an instance, and the output y is the instance's class, given by the following function:

    f(x) = sign(w · x + b)

Here w is the weight (or weight vector), b is the bias, and sign is the sign function: it outputs +1 when its argument is greater than 0, and -1 otherwise.
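As a minimal sketch of the model above (the function names `sign` and `predict` are my own, not from the book):

```python
def sign(z):
    # sign function: +1 for positive input, -1 otherwise
    return 1 if z > 0 else -1

def predict(w, b, x):
    # f(x) = sign(w . x + b), with w and x as plain Python lists
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

# w . x + b = 1*2 + (-1)*1 + 0.5 = 1.5 > 0, so the prediction is +1
print(predict([1.0, -1.0], 0.5, [2.0, 1.0]))
```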

The perceptron is a discriminative model (it directly learns the mapping function from input to output, and does not care about the joint probability distribution or anything like that).

Geometric interpretation of the perceptron: w · x + b = 0 corresponds to a hyperplane S in the feature space (a hyperplane is a line in two dimensions and a plane in three; I suppose it is called "hyper" because it generalizes the plane to higher dimensions). w is the normal vector of the hyperplane and b is the intercept (easy to see by analogy with a straight line in Euclidean space).

Linear separability: if, for a dataset, a hyperplane w · x + b = 0 can be found that puts all positive instance points on one side and all negative instance points on the other, the dataset is called linearly separable; otherwise it is linearly inseparable.

The learning goal of the perceptron is to find a hyperplane that classifies all positive and negative instance points completely correctly (which unfortunately means the perceptron can only be used on linearly separable datasets (t_t)), that is... to find suitable w and b. The learning strategy is to minimize a loss function.

What loss function? The first thing that comes to mind is the number of misclassified points! Sounds reasonable, but that count is of little help for adjusting w and b: how would you tweak w and b to reduce it? Hard to say, because, seriously, that loss function is not a continuous, differentiable function of w and b. So the wise predecessors thought of the total distance from the misclassified points to the hyperplane instead, which can be written as:

    -(1/||w||) Σ_{x_i ∈ M} y_i (w · x_i + b)

where M is the set of misclassified points.

Why the negative sign? Because for misclassified data (x_i, y_i) we have -y_i (w · x_i + b) > 0 (the Buddha says: too lazy to explain). Dropping the constant factor 1/||w|| then gives the loss function (proofs can go die):

    L(w, b) = -Σ_{x_i ∈ M} y_i (w · x_i + b)
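A minimal sketch of this loss, summing -y_i (w · x_i + b) over the misclassified points only (the helper name `perceptron_loss` is mine; a point with margin exactly 0 is counted as misclassified here):

```python
def perceptron_loss(w, b, X, y):
    # L(w, b) = - sum over misclassified points of y_i * (w . x_i + b)
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        if margin <= 0:  # misclassified (or exactly on the hyperplane)
            total -= margin
    return total

# One misclassified point: y = -1 but w . x + b = 1, contributing a loss of 1
print(perceptron_loss([1.0, 0.0], 0.0, [[1.0, 0.0]], [-1]))
```

Correctly classified points contribute nothing, so the loss is 0 exactly when every point lands on the right side of the hyperplane.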

Then minimize the loss function (-_-zzz):

    min_{w,b} L(w, b) = -Σ_{x_i ∈ M} y_i (w · x_i + b)

Perceptron learning is error-driven (the word "driven" sounds very powerful) and uses stochastic gradient descent (to be written up later). Take the partial derivatives with respect to w and b:

    ∇_w L(w, b) = -Σ_{x_i ∈ M} y_i x_i
    ∇_b L(w, b) = -Σ_{x_i ∈ M} y_i

With the gradients in hand, update using a randomly chosen misclassified point (x_i, y_i) (nothing much to say here; it is clear at a glance):

    w ← w + η y_i x_i
    b ← b + η y_i

What really deserves a word is η, the learning rate: a larger η updates the parameters faster but easily overshoots; a smaller η is steadier but easily becomes too slow.

Then there is the complete algorithm (too lazy to type it out; the book has it as an image):
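Since the image didn't survive, here is a sketch of the primal-form algorithm in Python (the function name `train_perceptron` and the toy dataset are mine; the data mimics the book's small worked example):

```python
def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Primal-form perceptron: sweep through the data and, whenever a
    point is misclassified (y_i * (w . x_i + b) <= 0), update
    w <- w + eta * y_i * x_i and b <- b + eta * y_i."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
                mistakes += 1
        if mistakes == 0:  # converged: every point classified correctly
            break
    return w, b

# Toy linearly separable data: two positive points and one negative point
X = [[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]]
y = [1, 1, -1]
w, b = train_perceptron(X, y)
print(w, b)  # a separating hyperplane w . x + b = 0
```

The outer loop terminates only because the data is linearly separable; on inseparable data the updates would cycle forever, which is why `max_epochs` caps the sweeps.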

The book then follows with various examples, the proof of convergence of the algorithm, the dual form, and so on. Too lazy to write those out; understanding them is enough for now.

Well, that's about it. This chapter is relatively simple; just testing the water.

