Deep Learning, Chapter 6: Deep Feedforward Networks, Starting from the XOR Function


6.1 Starting with the XOR function

To make the concept of feedforward networks more concrete, let's start with a simple example in which we use a feedforward network to solve a simple task: learning the XOR function.

As is well known, XOR ("exclusive or") is a binary operator on two binary values: it returns 1 when exactly one of its operands is 1, and returns 0 otherwise. The XOR function gives us the target function y = f*(x) that we want to learn. Our model provides the function y = f(x; θ), and our learning algorithm adjusts the parameters θ so that the model function f is as close as possible to the target function f*.

In this case, we care only about whether the network produces the right results on the four points X = {[0, 0]^T, [0, 1]^T, [1, 0]^T, [1, 1]^T}, rather than about how well it generalizes. We will train the network on these four points; the main challenge is fitting the network to the training set.

We can treat this as a regression problem and use the mean squared error (MSE) as the loss function. We choose MSE here because it keeps the math simple; in later chapters we will introduce other error measures that are better suited to modeling binary data.

For our training set, the MSE loss function is:

J(\theta) = \frac{1}{4} \sum_{x \in X} \left( f^*(x) - f(x; \theta) \right)^2
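To make the loss concrete, here is a minimal sketch in Python with NumPy (my addition, not part of the chapter); the helper name `mse_loss` and the stand-in `predict` are hypothetical:

```python
import numpy as np

# The four XOR inputs and their targets f*(x).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_true = np.array([0, 1, 1, 0])

def mse_loss(predict):
    """Mean squared error of a candidate model over the XOR training set."""
    y_pred = np.array([predict(x) for x in X])
    return np.mean((y_true - y_pred) ** 2)

# Example: a constant model that always outputs 0.5 incurs a loss of 0.25.
print(mse_loss(lambda x: 0.5))  # 0.25
```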
Now we have to choose the type of model, which determines the concrete form of f(x; θ). If we choose a linear model for the moment, it can be written as:

f(x; w, b) = x^\top w + b
Solving the normal equations on the training data yields w = 0 and b = 1/2.

(If, like the blogger, you handed your linear algebra back to your teacher long ago, please look up the derivation of the normal equations before continuing.)
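As a sketch of that calculation (again Python/NumPy, my addition, not the blogger's code): appending a column of ones to the inputs lets the least-squares solver estimate w and b together.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the bias b is fitted as one more weight.
X_aug = np.hstack([X, np.ones((4, 1))])

# Least squares solves the normal equations (X^T X) theta = X^T y stably.
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(theta[:2], theta[2])  # w = [0, 0], b = 0.5
print(X_aug @ theta)        # every prediction is 0.5
```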

But now a problem arises: the model we obtain outputs 0.5 for every input. Why is this so? Figure 6.1 illustrates why a linear model cannot represent the XOR function. One way to solve the problem is to use a model that learns a different representation space, such that in that space a linear model can solve the problem.

[Figure 6.1]
Specifically, we introduce a very simple feedforward network with only one hidden layer containing only two hidden units; see Figure 6.2. In this network, the input is passed through a function h = f^{(1)}(x; W, c) to obtain the values of the hidden units, which form the vector h; h then serves as the input to the second layer (the output layer). As you can see, the output layer is still a linear regression model, but it now operates on h instead of the earlier x. The network is thus a chain of two functions, h = f^{(1)}(x; W, c) and y = f^{(2)}(h; w, b), and the complete model is f(x; W, c, w, b) = f^{(2)}(f^{(1)}(x)).

[Figure 6.2]
So what should the function f^{(1)} look like? Linear models have served us well so far, so we might take it for granted that f^{(1)} should be linear too. Unfortunately, if f^{(1)} is linear, then the entire feedforward network remains linear in its input: temporarily omitting the intercept terms, f^{(1)}(x) = W^\top x and f^{(2)}(h) = h^\top w, so f(x) = w^\top W^\top x, which is obviously still linear.
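A tiny numerical check of this claim (my sketch, with arbitrary made-up weights): composing two linear layers collapses into a single linear map with weights W1 w2.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))   # first "layer": f1(x) = W1^T x
w2 = rng.normal(size=3)        # second "layer": f2(h) = h^T w2

x = np.array([1.0, -2.0])
two_layer = (W1.T @ x) @ w2    # f2(f1(x))
one_layer = x @ (W1 @ w2)      # a single linear map with weights W1 w2
print(np.isclose(two_layer, one_layer))  # True
```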

Clearly, then, the function that describes the features must be nonlinear. Most neural networks achieve this with an activation function: a fixed nonlinear function applied to the output of an affine transformation whose parameters are learned. We use the same strategy here and define h = g(W^\top x + c), where W holds the weights of the linear transformation and c is called the bias. W and c describe the affine transformation from x to h, and the activation function g is applied element-wise: h_i = g(x^\top W_{:, i} + c_i). Today the default choice for g is the ReLU (rectified linear unit), defined as g(z) = max{0, z} (isn't that super simple?); see Figure 6.3.

[Figure 6.3]
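In code, the ReLU really is a one-liner (a NumPy sketch, my addition):

```python
import numpy as np

def relu(z):
    """Rectified linear unit: element-wise max(0, z)."""
    return np.maximum(0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))  # [0.  0.  0.  0.5 2. ]
```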
We can now write the network in its complete form:

f(x; W, c, w, b) = w^\top \max\{0, W^\top x + c\} + b
Here, we point out one solution to the XOR problem:

W = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad c = \begin{bmatrix} 0 \\ -1 \end{bmatrix}, \quad w = \begin{bmatrix} 1 \\ -2 \end{bmatrix},

and b = 0.
Now let X be the design matrix containing all four binary input vectors, one per row:

X = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}
Walking this batch through the network, the first layer computes XW, adding c (broadcast across the rows) and applying the ReLU gives the hidden representation, and multiplying by w gives the output:

XW = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix}, \quad
XW + c^\top = \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}, \quad
\max\{0, XW + c^\top\} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix}, \quad
y = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}

so our network produces the right answer for every example in the batch.
The reader can verify each step by hand.
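Or, rather than verifying by hand, here is a minimal NumPy sketch (my addition) of the complete network f(x; W, c, w, b) = w^T max{0, W^T x + c} + b evaluated with the solution above:

```python
import numpy as np

# Parameters from the solution above.
W = np.array([[1, 1], [1, 1]], dtype=float)
c = np.array([0, -1], dtype=float)
w = np.array([1, -2], dtype=float)
b = 0.0

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

H = np.maximum(0, X @ W + c)   # hidden layer: ReLU(XW + c), row by row
y = H @ w + b                  # output layer: linear in h
print(y)                       # [0. 1. 1. 0.] -- exactly XOR
```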


Here we have simply stated one solution, and this solution achieves exactly zero error. In the real world, however, there may be millions of training examples to deal with and a comparable number of parameters to adjust, and the solution can never be guessed as easily as it was for XOR. In that setting, an optimization algorithm based on gradient descent can find parameter values that come very close to the true solution.
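As a rough illustration of that idea (my sketch, not the book's code): full-batch gradient descent on the same two-hidden-unit ReLU network, with hand-derived gradients. Most random initializations reach near-zero loss on XOR, though some can stall because of dead ReLU units.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Small random initialization for the two-hidden-unit network.
W = rng.normal(scale=0.5, size=(2, 2))
c = np.zeros(2)
w = rng.normal(scale=0.5, size=2)
b = 0.0
lr = 0.1

for step in range(5000):
    Z = X @ W + c                       # pre-activations, shape (4, 2)
    H = np.maximum(0, Z)                # hidden activations
    y_hat = H @ w + b                   # network outputs, shape (4,)

    d_yhat = 2 * (y_hat - y) / len(y)   # dJ/dy_hat for the MSE loss
    d_w = H.T @ d_yhat                  # gradient w.r.t. output weights
    d_b = d_yhat.sum()                  # gradient w.r.t. output bias
    d_H = np.outer(d_yhat, w)           # backprop into the hidden layer
    d_Z = d_H * (Z > 0)                 # ReLU gradient (0 where Z <= 0)
    d_W = X.T @ d_Z                     # gradient w.r.t. hidden weights
    d_c = d_Z.sum(axis=0)               # gradient w.r.t. hidden biases

    W -= lr * d_W; c -= lr * d_c
    w -= lr * d_w; b -= lr * d_b

print(np.maximum(0, X @ W + c) @ w + b)  # typically close to [0, 1, 1, 0]
```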


