"Neural Network and deep learning" article Three: sigmoid neurons


Source: Michael Nielsen's "Neural Networks and Deep Learning". Click "Read the original" at the end of this article to view the English original.

Translator for this section: Xu Wei, master's student at HIT SCIR (https://github.com/memeda)

Statement: We will serialize the Chinese translation of this book regularly every Monday, Thursday, and Sunday. If you wish to reprint it, please contact [email protected]; reproduction without authorization is not permitted.

"This article is reproduced from" hit SCIR "public number, reprint has obtained consent. "

    1. Using neural networks to recognize handwritten digits

      • Perceptrons

      • Sigmoid neurons

      • The architecture of neural networks

      • A simple network to classify handwritten digits

      • Learning with gradient descent

      • Implementing our network to classify digits

      • Toward deep learning

    2. How the backpropagation algorithm works

    3. Improving the way neural networks learn

    4. A visual proof that neural networks can compute any function

    5. Why deep neural networks are hard to train

    6. Deep learning

Sigmoid neurons

The learning algorithm sounds great, but the question is: how do we devise such a learning algorithm for a neural network? Suppose we have a network of perceptrons, and we want the network to learn to solve some problem. For example, the inputs to the network might be the raw pixel data from a scanned image of a handwritten digit, and we want the network to learn weights and biases so that the digit is eventually classified correctly. To see how learning might work, suppose we make a small change in some weight (or bias) in the network. What we would like is for this small change in weight to cause only a correspondingly small change in the output of the network. As we will see shortly, this property is what makes learning possible. The figure below shows what we want (of course this network is far too simple to do handwritten digit recognition).

[Figure: a small network illustrating how a small change in a weight or bias produces only a small change in the output.]

If it were true that a small change in a weight (or bias) causes only a small change in the output, then we could use this property to modify the weights and biases so that the network behaves more and more in the way we want. For example, suppose the network was mistakenly classifying an image of a handwritten "9" as an "8". We could figure out how to make small changes in the weights and biases so that the network gets a little closer to the correct answer, classifying the image as a "9". Repeating this process, modifying the weights and biases over and over, produces better and better results. The network would be learning.

The problem, however, is that this is not what happens when our network contains perceptrons. In fact, a small change in the weights or bias of any single perceptron in the network can sometimes cause the output of that perceptron to completely flip, say from 0 to 1. That flip may then change the behavior of the rest of the network in some very complicated way. So while the "9" might now be classified correctly, the behavior of the network on all the other images is likely to have changed in some hard-to-control way. That makes it difficult to see how to gradually modify the weights and biases so that the network gets closer to the desired behavior. Perhaps there is some clever way around this problem, but a learning algorithm for a network of perceptrons is not at all obvious.

We can overcome this problem by introducing a new type of artificial neuron called the S-type (sigmoid; we are more accustomed to the English name, so the rest of this article will use it) neuron. Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and biases cause only a small change in their output. That is the crucial fact that allows a network of sigmoid neurons to learn.

OK, let us meet the protagonist of this section. We will describe the sigmoid neuron in the same way we described the perceptron.

Like the perceptron, the sigmoid neuron has inputs x1, x2, ..., but unlike the perceptron these inputs are not restricted to 0 or 1; they can take any value between 0 and 1. So, for example, 0.638... is a valid input for a sigmoid neuron. Also like the perceptron, the sigmoid neuron has a weight for each input, w1, w2, ..., and an overall bias, b. But the output of a sigmoid neuron is no longer 0 or 1; instead it is σ(w⋅x + b), where σ is called the sigmoid function¹, defined as follows:

σ(z) ≡ 1 / (1 + e^(−z))

To put this all together more explicitly, the output of a sigmoid neuron with inputs x1, x2, ..., weights w1, w2, ..., and bias b is:

output = 1 / (1 + exp(−∑j wj xj − b))
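To make the definition concrete, here is a minimal Python sketch of a single sigmoid neuron. The particular inputs, weights, and bias below are made-up values chosen only for illustration; they are not taken from the book.

```python
import numpy as np

def sigmoid(z):
    """The sigmoid function: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron_output(x, w, b):
    """Output of a single sigmoid neuron: sigma(w . x + b)."""
    return sigmoid(np.dot(w, x) + b)

# Illustrative values only: three floating-point inputs, arbitrary weights and bias.
x = np.array([0.638, 0.2, 0.9])
w = np.array([0.5, -1.2, 0.3])
b = -0.1
print(sigmoid_neuron_output(x, w, b))  # a value strictly between 0 and 1
```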

At first glance, sigmoid neurons appear very different from perceptrons. If you are not already familiar with it, the algebraic form of the sigmoid function may seem obscure. In fact, sigmoid neurons have a great deal in common with perceptrons, and the algebraic form of the sigmoid function turns out to be more of a technical detail than a real barrier to understanding.

To understand the similarity between sigmoid neurons and the perceptron model, suppose z ≡ w⋅x + b is a large positive number. Then e^(−z) ≈ 0 and so σ(z) ≈ 1. In other words, when z = w⋅x + b is large and positive, the output of the sigmoid neuron is approximately 1, just as it would be for a perceptron. On the other hand, when z = w⋅x + b is a large negative number, e^(−z) → ∞ and σ(z) ≈ 0, so the behavior of the sigmoid neuron again closely approximates a perceptron. It is only when w⋅x + b is of modest size that the result deviates much from the perceptron model.
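A quick numerical check makes this concrete. The z values below are arbitrary illustrative choices, not values from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Far from zero the sigmoid neuron behaves like a perceptron (output near 0 or 1);
# near zero the two models differ noticeably.
for z in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"z = {z:6.1f}   sigma(z) = {sigmoid(z):.5f}")
```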

We cannot help but ask: how should we understand the algebraic form of σ? In fact, the exact form of σ is not so important; what really matters for understanding is the shape of the function when plotted. Here is its shape:

[Figure: the sigmoid function σ(z), an S-shaped curve rising smoothly from 0 to 1.]

This shape can be thought of as a smoothed-out version of a step function:

[Figure: the step function, jumping from 0 to 1 at z = 0.]

If σ had in fact been a step function, then the sigmoid neuron would be a perceptron, since its output would be 1 or 0 depending only on whether w⋅x + b is positive or negative². So, as suggested earlier, by using the actual σ function we get a smoothed-out perceptron. Indeed, it is the smoothness of σ that is the crucial fact, not its detailed algebraic form. The smoothness of σ means that small changes Δwj in the weights and Δb in the bias produce only a small change Δoutput in the output of the neuron. In fact, calculus tells us that Δoutput is well approximated by:

Δoutput ≈ ∑j (∂output/∂wj) Δwj + (∂output/∂b) Δb

where the sum is over all the weights wj, and ∂output/∂wj and ∂output/∂b denote the partial derivatives of the output with respect to wj and b, respectively. Do not panic if you are uncomfortable with partial derivatives! Although the expression above looks a bit complicated, what it says is actually very simple (and that is good news): Δoutput is a linear function of the changes Δwj and Δb in the weights and bias. This linearity makes it easy to choose small changes in the weights and biases that achieve whatever small change in the output we desire. So sigmoid neurons not only share many qualitative properties with perceptrons, they also make it much easier to describe how the output will change as the weights and biases change.
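As a sanity check on this linear approximation, here is a small Python sketch comparing the exact change in a sigmoid neuron's output with the linear estimate built from the partial derivatives. All the numbers (inputs, weights, bias, and the small changes Δwj and Δb) are arbitrary values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output(x, w, b):
    return sigmoid(np.dot(w, x) + b)

# Illustrative values only.
x = np.array([0.4, 0.7])
w = np.array([1.5, -2.0])
b = 0.3

# Partial derivatives of the output: d(output)/dw_j = sigma'(z) * x_j and
# d(output)/db = sigma'(z), where sigma'(z) = sigma(z) * (1 - sigma(z)).
z = np.dot(w, x) + b
sp = sigmoid(z) * (1.0 - sigmoid(z))
grad_w, grad_b = sp * x, sp

# Small changes in the weights and bias.
dw = np.array([0.01, -0.02])
db = 0.005

exact = output(x, w + dw, b + db) - output(x, w, b)
linear = np.dot(grad_w, dw) + grad_b * db
print(exact, linear)  # the two numbers agree closely
```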

If it is really only the shape of σ that matters, and not its exact algebraic form, why use the particular form given for σ above? In fact, later in the book we will occasionally consider neurons whose output is f(w⋅x + b) for some other activation function f. The main thing that changes when we use a different activation function is the particular values of the partial derivatives in the expression for Δoutput above. When we come to compute those partial derivatives, using σ will simplify the algebra, because exponentials have lovely properties when differentiated. In any case, σ is the activation function most commonly used in work on neural networks, and it is the one we will use most often in this book.

How should we interpret the output of a sigmoid neuron? Obviously, one big difference between perceptrons and sigmoid neurons is that sigmoid neurons do not just output 0 or 1; they can output any real number between 0 and 1, so values such as 0.173... and 0.689... are legitimate outputs. This can be useful, for example, if we want the output value of the network to represent the average intensity of the pixels in an input image. But sometimes it can be a nuisance. Suppose we want the network's output to indicate either "the input image is a 9" or "the input image is not a 9". Obviously this would be easiest if the output were 0 or 1, as for a perceptron. In practice, however, we can set up a convention to deal with this, for example, agreeing that any output of at least 0.5 indicates "the input image is a 9" and any output below 0.5 indicates "the input image is not a 9". Whenever we use a convention like this I will state it explicitly, so it should not cause any confusion.
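Such a convention is trivial to apply in code. The function name and the 0.5 threshold below are just illustrative choices, not part of the book's own code:

```python
def classify_as_nine(network_output, threshold=0.5):
    """Apply the convention described above: an output of at least 0.5
    is interpreted as 'the input image is a 9'."""
    return network_output >= threshold

print(classify_as_nine(0.689))  # True
print(classify_as_nine(0.173))  # False
```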

¹ Incidentally, σ is sometimes called the logistic function, and the corresponding new class of neurons is called logistic neurons. It is useful to remember these terms, since they are used by many people working with neural networks. In this book, however, we will stick with the name sigmoid.

² In fact, when w⋅x + b = 0 the perceptron outputs 0, while the step function outputs 1. So, strictly speaking, we would need to modify the step function at that single point. But you get the idea.

Exercises
    • Sigmoid neurons simulating perceptrons, part I. Suppose we take all the weights and biases in a network of perceptrons and multiply them by a positive constant c > 0. Show that the behavior of the network does not change.

    • Sigmoid neurons simulating perceptrons, part II. Suppose we have the same setup as in the previous problem: a network of perceptrons. Suppose also that the overall input to the network has been chosen. We will not need the actual input value; we just need the input to have been fixed. Suppose the input x to any particular perceptron in the network satisfies w⋅x + b ≠ 0. Now replace every perceptron in the network by a sigmoid neuron, and multiply all the weights and biases by a positive constant c > 0. Show that in the limit as c → ∞ the behavior of this network of sigmoid neurons is exactly the same as the network of perceptrons. And think about why this fails when w⋅x + b = 0 (a small numerical sketch follows below).
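This is not a proof, but a quick numerical experiment that may help build intuition for the second exercise. Scaling all the weights and biases by c turns a sigmoid neuron's pre-activation z = w⋅x + b into c·z, so we can simply watch σ(c·z) as c grows; the particular z values below are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# For z != 0 the output tends to 0 or 1 (the perceptron's behavior) as c grows;
# for z = 0 it stays at 0.5, which is why the exercise excludes that case.
for z in [-0.5, 0.0, 0.5]:
    for c in [1, 10, 100, 1000]:
        print(f"z = {z:5.1f}   c = {c:5d}   sigma(c*z) = {sigmoid(c * z):.6f}")
    print()
```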

In the next section we will introduce "The architecture of neural networks", so stay tuned!

    • "Hit Scir" public number

    • Editorial office: Guo Jiang, Li Jiaqi, Xu June, Li Zhongyang, Hulin Lin

    • Editor of the issue: Hulin Lin

