Machine Learning: Perceptron Implementation (1)

Source: Internet
Author: User

Preface

This series of articles is not meant for studying the derivation of mathematical formulae, but for quickly turning machine learning ideas into code. The main goal is to organize the line of thought.

Perceptron

A perceptron receives the data passed in by each input element (neuron) and triggers a corresponding behavior when that data reaches a certain threshold.
Concretely, each input element has its own weight, and the perceptron acts when the weighted sum u reaches the threshold.

u = w0 + w1x1 + w2x2 + ... + wnxn

Applied to spam filtering: when u > 0 the message is normal mail; otherwise it is junk mail.
A model like this is called a simple perceptron, and it is the basic unit of a neural network.
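
As a minimal sketch of the weighted sum and the u > 0 rule above: the function names and the example weights are assumptions made for illustration, not values from the article.

```python
def weighted_sum(w, x):
    """Return u = w0 + w1*x1 + ... + wn*xn; w has one more entry than x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def classify(w, x):
    """Return 'normal' when u > 0, otherwise 'spam'."""
    return "normal" if weighted_sum(w, x) > 0 else "spam"


# Hypothetical weights: bias w0 = 1.0, then one weight per monitored phrase.
w = [1.0, -2.0, 0.5]
print(classify(w, [0, 2]))  # u = 1.0 + 0.0 + 1.0 = 2.0
print(classify(w, [3, 0]))  # u = 1.0 - 6.0 + 0.0 = -5.0
```

Here x holds the phrase counts for one email, and w[0] plays the role of w0.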

Updating the weight vector
The w1, w2, ... mentioned above measure how strongly each item indicates spam, and x1, x2, ... are the counts of the monitored phrases in the email.
For example, when x1 and x2 are equal, whichever of w1 and w2 has the larger absolute value has the greater effect on the result u. So we call w1, w2, ... the weights of x1, x2, ...
The vector w is the weight vector.

Based on the expected results and the predicted results on the training data, the weights can be corrected continuously. So in practice, how should w be modified?

1. Set random values for w1, w2, ..., wn
2. Repeat the following steps
* Feed in a training example and correct w if the result is wrong
* Stop when the results for all training data are correct
The idea is very simple. But what about the step [correct w if the result is wrong]? In a simple perceptron the correction is straightforward, but how is it computed in a deep neural network?

Gradient descent
First, we introduce a definition, the [error function (i.e. loss function)] E, which is the difference between the output and the expected result.
To make the later calculation easier, the goal can be restated as: change the vector w step by step in the direction that makes the error function smallest.
The relationship between w and the error function is as shown

[Figure: the error function E plotted as a curve against w]

The value of w at the very bottom of the curve is the value we want. Looking at it carefully, finding it is a matter of differentiation,
and the trend (slope) of the curve at a point is the value of the derivative there.

So the update of wi is

wi <- wi - ∂E/∂wi

Put simply, when the slope is negative, wi moves to the right (it increases), and vice versa. But when the slope is large, wi changes a lot,
and when the slope is small, wi changes very little. Steps like these can make the whole process hard to converge, so we add a small positive parameter
to the calculation.

wi <- wi - ρ ∂E/∂wi

In the expression above, ρ is the learning rate. It is usually set to a positive number smaller than 1. But if it is too small, many more iterations are needed.
This method of repeatedly correcting the weights using the derivative is gradient descent. For more details on gradient descent, refer to my previous [article]
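
To see the effect of the learning rate, here is a small sketch of gradient descent on a one-dimensional example. The error function E(w) = (w - 3)^2, the function name, and the parameter values are all assumptions chosen for illustration.

```python
def gradient_descent(grad, w, rate=0.1, steps=100):
    """Repeatedly apply w <- w - rate * dE/dw and return the final w."""
    for _ in range(steps):
        w = w - rate * grad(w)
    return w


# Hypothetical error function E(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
def grad(w):
    return 2 * (w - 3)


w_min = gradient_descent(grad, w=0.0)
print(round(w_min, 4))  # ends up very close to the minimum at w = 3

# With too large a rate, each step overshoots and w moves away from 3.
w_bad = gradient_descent(grad, w=0.0, rate=1.1, steps=20)
print(abs(w_bad - 3) > 3)
```

With rate = 0.1 the distance to the minimum shrinks by a constant factor on every step, while rate = 1.1 makes it grow, which is the convergence problem described above.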

Next comes the concrete form of the error function.

The error function of the simple perceptron
For a perceptron, we use the following formula as its error function.

E = max(0, -twx)

max(a, b) is a function that returns the larger of a and b. t is the label that marks the correct class:
t = 1 (normal mail), t = -1 (junk mail)
One detail to note here is that spam is labeled -1 instead of 0.
So why should the error function take the form max(0, -twx)?

Consider the following two-dimensional plane with axes x1 and x2.
Look at the value of wx (= w0 + w1x1 + w2x2). All points on the line wx = 0 give exactly 0, points above the line give positive values,
and points below the line give negative values. Training ends when all the t = 1 points are in the positive region and all the t = -1 points are in the negative region.

[Figure: points in the x1, x2 plane separated by the line wx = 0]

In this case learning is not yet finished: points B and D are still in the wrong regions.


Focus on the situation of point A:
A is a normal message (t = 1) and is currently classified correctly, so wx > 0.

So we can see that -twx = -wx < 0. Therefore E = max(0, -twx) = 0, and the result of the error function is 0.

In other words, when the point x is correctly classified the error is 0, and otherwise it is |wx|.

So what does |wx| really mean? Put simply, it is proportional to the distance between the point and the straight line wx = 0 (the exact distance is |wx| divided by the length of the vector (w1, w2)). If I remember correctly, this distance calculation is high school material.
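
The behavior of E = max(0, -twx) on a correct and an incorrect classification can be checked with a short sketch. The function names, the weights, and the sample point are assumptions made for illustration.

```python
def dot(w, x):
    """Return wx = w0 + w1*x1 + ... + wn*xn."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def error(w, x, t):
    """Perceptron error E = max(0, -t * wx) for one point with label t."""
    return max(0, -t * dot(w, x))


w = [0.0, 1.0, 1.0]  # hypothetical weights: the boundary is the line x1 + x2 = 0
print(error(w, [2, 1], 1))   # wx = 3 and t = 1: correct, so E = 0
print(error(w, [2, 1], -1))  # wx = 3 and t = -1: wrong, so E = |wx| = 3.0
```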

Implementing the perceptron algorithm
We now apply the weight update rule described above.

For the error function E = max(0, -twx), E equals -twx whenever the error is not 0. Differentiating with respect to wi (taking x0 = 1 for w0):

∂E/∂wi = ∂(-twx)/∂wi = -t xi

So we get the update formula:

wi <- wi + ρ t xi

Written for the whole weight vector, this is:

w <- w + ρ t x

Based on the derivation above, we get the following perceptron pseudo-code:

* Set random values for w1, w2, ..., wn
* Feed in each training example
  * Does the result for this example match the expected value?
    * Yes: move on to the next example
    * No: correct w using the update formula
* Did the value of w change during the previous pass over the data?
  * Yes: repeat the loop above
  * No (all results are as expected): training ends

I have implemented this pseudo-code in Python. If needed, refer to [here]
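
Separately from the author's linked implementation, the pseudo-code can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the correction step uses the standard perceptron rule w <- w + rate * t * x, which follows from E = max(0, -twx), and the names `predict` and `train`, the `rate`, `max_epochs`, and `seed` parameters, and the toy data are all hypothetical.

```python
import random


def predict(w, x):
    """Return 1 (normal mail) when wx > 0, otherwise -1 (junk mail)."""
    u = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if u > 0 else -1


def train(data, rate=0.1, max_epochs=1000, seed=0):
    """Train on (x, t) pairs; stop after a full pass with no weight change."""
    rng = random.Random(seed)
    n = len(data[0][0])
    # Random starting weights; w[0] is the bias term w0.
    w = [rng.uniform(-1, 1) for _ in range(n + 1)]
    # The epoch cap is an assumption so the loop ends on non-separable data.
    for _ in range(max_epochs):
        changed = False
        for x, t in data:
            if predict(w, x) != t:
                # Misclassified: w <- w + rate * t * x, with x0 = 1 for the bias.
                w[0] += rate * t
                for i, xi in enumerate(x, start=1):
                    w[i] += rate * t * xi
                changed = True
        if not changed:
            break  # every training example is classified as expected
    return w


# Hypothetical, linearly separable toy data: (phrase counts, label).
data = [([0, 1], 1), ([1, 2], 1), ([3, 0], -1), ([4, 1], -1)]
w = train(data)
print([predict(w, x) for x, _ in data])  # matches the labels once training converges
```

The fixed seed only makes the random starting weights reproducible; any starting point works when the data can be separated by a straight line.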

Threshold
_Note: this section is not much help for this article; it mainly lays the groundwork for the next article, on the multilayer perceptron._
Each perceptron has an activation threshold, and it performs an action when its input reaches that threshold. So how is this threshold determined?

The function that uses the threshold to determine the output value is known as the activation function. In the simplest case, the activation function is f(u) = u.
For something like spam classification we can use the following expression instead:

f(u) = 1 (u > 0), f(u) = -1 (u <= 0)

[Figure: graph of the activation function f(u)]

This is the form with 1 and -1 that we used above.

That is all for this article. If anything is unclear, discussion is welcome.
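
The two activation functions mentioned in this section can be written side by side as a small sketch. The function names are assumptions, and the step function follows the 1 and -1 labels used earlier in the article.

```python
def identity(u):
    """Identity activation f(u) = u: pass the weighted sum through unchanged."""
    return u


def step(u):
    """Step activation: 1 when u > 0, otherwise -1."""
    return 1 if u > 0 else -1


for u in (-2.5, 0.0, 0.7):
    print(u, identity(u), step(u))
```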
