First knowledge of Neural Networks

Source: Internet
Author: User

This series is based on the book Neural Networks and Deep Learning, with my own insights added. It is my first time writing a series like this, so please point out anything that is wrong! Next, we will introduce neural networks so that you can understand what they are; to make the learning concrete, we will be guided by handwritten digit recognition later on. Let's study it step by step!

First, a question: have you ever thought about what a masterpiece the human visual system is? See the figure below:

When we see these digits, we immediately know what they are. Why can we recognize them so quickly? Is the ability innate? I think the answer is no. The neural network in our brain is an extremely complex system (one that has evolved over hundreds of millions of years!), and with the help of such a great and magical system we can recognize the digits above at a glance. The problem we now face is how to let a computer recognize them. Haven't we always advocated digitalization and informatization? I think everyone can see the value of recognizing these digits automatically. So, how do we do it? Maybe you will say that the digits above are relatively simple, just a few pictures. In that case, students, please look at the figure below:

Sorry, so many! Sometimes, when digits are written carelessly, even people may not recognize them immediately; for now, though, let's set aside the digits that nobody can recognize.

So where do we begin? Enter our protagonist: the neural network! You have probably heard about neural networks at some point, but let's give a brief introduction anyway. First, a neural network is a machine learning technique. Anyone who knows a little about machine learning knows the pattern: you feed in data, train on it to obtain a network, and the trained network then maps inputs to outputs. When we were young, the teacher showed us the letters a, b, c..., and gradually we learned to recognize them; later, given new letters, we could still understand their meaning. Neural networks learn in a loosely similar way.

Perceptrons:

What is a neural network? Let's first introduce the perceptron, which is a kind of artificial neuron. (Well, is "perceptron" a translation failure? Sorry; if you have a better translation, please let me know.) The perceptron was developed by the scientist Frank Rosenblatt in the 1950s and 1960s, inspired by earlier work of Warren McCulloch and Walter Pitts. Today, other neuron models have been proposed and are more commonly used, such as sigmoid neurons (which we will introduce later); these neurons appear in many models. In short, neurons are everywhere, and they will keep showing up, so you need to understand them! The perceptron is the most basic, so let's start with it.

OK, enough chatter. You have probably learned some biology and have a rough idea of how nerve cells work: an upstream neuron releases signals to a downstream neuron, which processes them and passes the result on to the next one. What I just said is oversimplified, of course. Let's look instead at how our artificial neuron works: how does a perceptron work?

A perceptron takes several binary inputs and produces a single binary output.

It's better to explain with a picture, so let's look at one. The perceptron takes inputs x1, x2, and x3, and produces an output. How is the output computed? Rosenblatt proposed a simple rule: give each input a weight w1, w2, ..., which indicates the importance of that input. (It really is that straightforward: this input matters a lot, set its weight to 0.8; this one is also good, set it to 0.7; the next one barely matters, set it to 0.1.) These weights affect the output: the greater the weight, the greater the impact on the result. (Girl: "If your mom and I both fell into the water, which one would you save?" Boy: "That depends on your weights.") So how is the output obtained? As shown, set a threshold, compute Σj wj·xj, and if the result is greater than the threshold, output 1; otherwise, output 0.

The formula should make the rule clear. Next, let's use a simple model to see how this neuron can be applied.

Suppose you want to go to an open-air concert, but there are some factors to weigh. For example: 1. How is the weather that day? 2. Will your girlfriend or boyfriend accompany you? 3. How expensive is the ticket?

Let's represent these with x1, x2, x3: x1 = 1 means good weather, x1 = 0 means bad weather; x2 = 1 means your friend of the opposite sex (the same sex works too) is willing to accompany you, x2 = 0 means not willing; x3 = 1 means the ticket is cheap, x3 = 0 means the ticket is expensive.

Well, after fixing the meaning of the inputs, the weights below express how much attention you pay to each factor. For example, if the weather matters a great deal to you, set w1 = 6; if the second and third inputs matter less, set w2 = 2 and w3 = 3. As mentioned above, the weight is the importance you attach to an input: the larger the weight, the more important that input. Finally, we need to set a threshold, which we can take to be 5; you could also set it to 3, which would mean you are more eager to attend the concert. The final output of 1 means go, and 0 means don't go. And with that, we have used a perceptron to model the concert decision.
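The concert decision above can be sketched in a few lines of code. This is a minimal illustration, not anything from the book's implementation; the function name and the test inputs are my own, but the weights w1 = 6, w2 = 2, w3 = 3 and the threshold of 5 are the ones chosen in the text.

```python
# A minimal sketch of the concert perceptron: weighted sum vs. threshold.
def perceptron(inputs, weights, threshold):
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

weights = [6, 2, 3]   # weather, company, ticket price
threshold = 5

# Good weather alone (x1 = 1) already clears the threshold: 6 > 5, so go.
print(perceptron([1, 0, 0], weights, threshold))  # 1 (go)
# Bad weather, but company and a cheap ticket: 2 + 3 = 5, not > 5, so stay home.
print(perceptron([0, 1, 1], weights, threshold))  # 0 (don't go)
```

Notice how dominant the weather weight is: with w1 = 6, no combination of the other two factors can outvote bad weather.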

See the following figure:

Notice something different from what we described above? There are extra neurons in the middle. In fact, a more complicated neural network has many layers: we call the leftmost layer the input layer, the middle layers the hidden layers, and the rightmost layer the output layer. We will see more complex networks later.

The output rule above uses a threshold. Here we make a slight modification and rewrite it as follows.

Compared with formula (1), the threshold has been moved to the left-hand side, with b = -threshold. We call b the bias. (It is better not to translate the term; remember the English, it is closer to the original.)
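The bias form can be sketched the same way. This is again just an illustration of mine: with b = -threshold, the perceptron fires when w·x + b > 0, and on the concert example (threshold 5, so b = -5) it gives exactly the same decisions as before.

```python
# The same perceptron rule, rewritten in bias form: fire when w.x + b > 0.
def perceptron_bias(inputs, weights, b):
    total = sum(w * x for w, x in zip(weights, inputs)) + b
    return 1 if total > 0 else 0

b = -5  # b = -threshold for the concert example's threshold of 5
print(perceptron_bias([1, 0, 0], [6, 2, 3], b))  # 1, same decision as before
print(perceptron_bias([0, 1, 1], [6, 2, 3], b))  # 0, same decision as before
```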

If you have read this far, do you feel that the neurons above are a little rigid? You have to set all the values yourself, which is a lot of trouble; the neuron ought to be able to learn its own values. That is exactly the right idea: a neural network should update itself. Next, let's look at another neuron, one that can be updated automatically!

 

Sigmoid neurons:

Suppose we need a neuron capable of learning by itself; the one above obviously cannot meet that requirement. What should we do? For example, what if our input is the scanned pixel matrix of a digit written by a human? What do we do in that situation?

Take a look at the following figure:

If a small change in the weights or bias produces only a small change in the output, we can use this property to adjust the weights and bias so that the network behaves more like we want. Suppose our neural network mistakenly reads an "8" as a "9"; we would then nudge the weights and bias so that it reads the digit correctly.

However, if the output can only be 0 or 1, a small change we make may have no effect at all on the result; flipping the result from 0 to 1 is abrupt. Even if we manage to modify the network so that it reads "9" correctly, the weight and bias changes required may be huge, and those changes can in turn break the behavior on other digits, so each correction triggers further large corrections. Maybe that doesn't sound troublesome, but think carefully: all these changes consume time and computing resources, and huge changes consume a lot of both. We should stop tormenting the computer like this, because a better model is available.

Back to this figure, let's make a change. We still take some inputs and produce an output, but this time the output is no longer just 0 or 1; it can be any value between 0 and 1. We introduce a function σ(w·x + b). What is this σ? It is the sigmoid function, defined as follows:

More specifically, for inputs x1, x2, ..., weights w1, w2, ..., and bias b, the output is:

Maybe you will say: good grief, what is this function? In my view, the main role of the sigmoid function is to squash a value from (-∞, +∞) into the interval (0, 1). It is also monotonic: the larger z is, the larger σ(z) is, and vice versa. For example, as z approaches +∞, σ(z) approaches 1; as z approaches -∞, σ(z) approaches 0. If this is not clear, plot σ(z) and take a look.

Now, with the sigmoid function, we can make minor changes to w and b, and these small changes will lead to correspondingly small changes in the output (think of the continuity of the function). Let Δoutput denote the change in the output, Δwj the changes in the weights, and Δb the change in the bias. From calculus we have the following formula.

∂output/∂wj and ∂output/∂b are partial derivatives. If you are not familiar with partial derivatives, don't worry too much; just remember the expression above: Δoutput is determined by the changes Δwj and Δb. Any calculus book covers this, and since calculus is the foundation of much of what follows, it is worth picking up. From here on we will assume everyone has the background, so there will be no more digressions on the mathematics.
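The formula can be checked numerically. The sketch below (all names and the particular numbers are mine) uses one sigmoid neuron, for which ∂output/∂wj = σ'(z)·xj and ∂output/∂b = σ'(z), with σ'(z) = σ(z)(1 − σ(z)); it compares the change predicted by the partial derivatives against the actual change after perturbing the parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One sigmoid neuron with arbitrary small example values.
x = [0.5, -0.2, 0.1]
w = [0.8, 0.3, -0.5]
b = 0.1

z = sum(wi * xi for wi, xi in zip(w, x)) + b
out = sigmoid(z)
d = out * (1 - out)  # sigma'(z)

# Small changes to the weights and bias...
dw = [0.001, -0.002, 0.0005]
db = 0.001

# ...change predicted by the formula: sum_j (d_output/d_wj) * dwj + (d_output/d_b) * db
predicted = sum(d * xi * dwi for xi, dwi in zip(x, dw)) + d * db

# ...actual change, recomputing the output with perturbed parameters
w2 = [wi + dwi for wi, dwi in zip(w, dw)]
z2 = sum(wi * xi for wi, xi in zip(w2, x)) + b + db
actual = sigmoid(z2) - out

print(abs(predicted - actual) < 1e-6)  # True: the linear approximation holds
```

This is exactly why the sigmoid neuron is friendlier than the perceptron: small parameter nudges produce small, predictable output nudges.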

Now our output is a real number in the range (0, 1), no longer just 0 or 1. Previously, to decide whether a digit image was a "9", an output of 0 meant "not 9" and 1 meant "9". Now we can set a cutoff: for example, if the output is greater than 0.5, the image is judged to be a "9". Having covered the all-important sigmoid neuron, we can begin to design a neural network.

 

Structure of Neural Networks

Next we will introduce neural networks, using digit recognition as the example. (Get ready, here we go!)

First look at the figure below:

As we have discussed above, the leftmost layer is the input layer, containing the input neurons; the rightmost is the output layer, containing the output neurons; and in between are the hidden layers. Of course, there can be many hidden layers. See:

The design of the input and output layers is relatively direct. For example, to determine whether a handwritten character is a "9", we use the image as the input. An image can generally be converted to a grayscale matrix, and using the grayscale matrix as input is convenient and requires less preprocessing. (For some other tasks the raw image may work better; it depends on the specific problem.) If the image is 64 by 64 pixels, there can be 4,096 = 64 × 64 input neurons, and the output can be a single neuron: as mentioned above, an output below 0.5 is judged "not 9", and an output of 0.5 or above is judged "9".

Designing the hidden layers, in contrast, admits many approaches; heuristic methods can be used to trade off hidden-layer size against training time.

Until now, all the networks we have discussed feed the results of one layer forward as the input of the next layer. Such networks are called feedforward neural networks; they contain no loops. There are also other architectures, such as recurrent neural networks, whose structure may contain loops. If you are interested, you can look up more material on them.

 

A simple neural network that recognizes handwritten numbers

Now let's get to work on recognizing handwritten digits. This is a well-developed area, and recognition rates are very high.

As shown in the figure above, digit recognition is generally divided into two parts: segmenting the digits and identifying each digit. There are many effective ways to handle the segmentation, so rather than studying that first problem, let's focus on the harder and more interesting second part.

The following figure shows the structure of the neural network we will use.

Generally, our input data is a 28×28-pixel image, so we use 784 = 28 × 28 neurons as the input layer. Each input is a grayscale value: the smaller the value, the lighter the pixel; the larger the value, the darker.

The second layer is a hidden layer with 15 neurons; this number can be tuned experimentally for the best accuracy.

The last layer is the output layer with 10 neurons, one per digit. Why not 4? (2^4 = 16, so binary encoding with 4 output neurons would suffice.) This makes an interesting puzzle. If you are stuck, leave a comment with, say, an email address, and I can send you an answer; of course, you'd better think about it yourself first!
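The 784-15-10 architecture can be sketched concretely. This is not the book's actual code, just a minimal illustration of mine of how many weights and biases such a network allocates: one weight per connection between adjacent layers, and one bias per non-input neuron, initialized randomly.

```python
import random

# The 784-15-10 architecture: input layer, hidden layer, output layer.
sizes = [784, 15, 10]

# One bias per neuron in the non-input layers.
biases = [[random.gauss(0, 1) for _ in range(n)] for n in sizes[1:]]

# One weight per connection: for each layer, a row of incoming weights per neuron.
weights = [[[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

print(len(weights[0]), len(weights[0][0]))  # 15 hidden neurons, 784 weights each
print(len(weights[1]), len(weights[1][0]))  # 10 output neurons, 15 weights each
print(len(biases[0]), len(biases[1]))       # 15 hidden biases, 10 output biases
```

Counting them up, that is 784×15 + 15×10 = 11,910 weights plus 25 biases; these are exactly the parameters that training must learn.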

Now let's assume we already have training data. x denotes the input: each input is a 28 × 28 = 784-dimensional vector, and each value in the vector is a grayscale value. We use y = y(x) to denote the desired output, where y is a 10-dimensional vector. If an input x corresponds to the digit 6, then y = (0, 0, 0, 0, 0, 0, 1, 0, 0, 0)^T.
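Building such a desired-output vector is simple. A small helper (the function name is mine, invented for illustration): all zeros except a 1 at the position of the digit.

```python
# Build the 10-dimensional desired output y(x) for a digit label:
# all zeros except a 1 at the digit's position.
def one_hot(digit):
    y = [0] * 10
    y[digit] = 1
    return y

print(one_hot(6))  # [0, 0, 0, 0, 0, 0, 1, 0, 0, 0], the vector for the digit 6
```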

Next, let's look at the following function:

w denotes all the weights in the neural network, b all the biases, a the network's output for input x, and y(x) the desired output; the output a therefore depends on w, b, and x. So what does C(w, b) represent? It is the (scaled) sum, over the training data, of the squared distance between the network's output and the desired output. Why the sum of squares? The definition is a human choice; if it bothers you, other measures would also work, but the sum of squares is convenient here. Now think about what it means: the larger C(w, b), the larger the gap between our network's output and the desired output; the smaller C, the smaller the gap. "So what is our goal?" "No cavities!" (er, no error!) As long as C(w, b) is close to 0, the network we have built is an excellent model.
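The cost is easy to compute directly. In this sketch (names and numbers mine) the outputs are supplied by hand rather than produced by a network, just to show the formula C = (1/2n) · Σx ||y(x) − a||²:

```python
# Quadratic cost: average half squared distance between outputs and targets.
def quadratic_cost(outputs, desired):
    n = len(outputs)
    total = 0.0
    for a, y in zip(outputs, desired):
        total += sum((yi - ai) ** 2 for yi, ai in zip(y, a))
    return total / (2 * n)

# A perfect output gives zero cost; any gap makes the cost grow.
print(quadratic_cost([[1.0, 0.0]], [[1.0, 0.0]]))  # 0.0
print(quadratic_cost([[0.8, 0.3]], [[1.0, 0.0]]))  # 0.065 (up to float rounding)
```

Note that C is never negative, and it is 0 exactly when every output matches its target, which is why driving C toward 0 is the training goal.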

Now the question is how to reduce C(w, b). This is what gradient descent is for.

Gradient Descent:

Anyone who has dabbled in machine learning will be familiar with gradient descent. If you are already comfortable with the algorithm, you can skip this section; for an introduction, refer to this article (leftnoteasy). Note, though, that the formulas there cover gradient descent in general; when we apply gradient descent later, the parameter names will not necessarily match, but I will point that out when it happens. (Writing a separate piece on gradients is hard work; I'd rather point to a reference, and that article is well written.)

Application of gradient descent in neural networks:

The purpose of using gradient descent here is to find good values of w and b. From the gradient descent method we obtain the following update rules.

Here η is the step size (learning rate) of gradient descent. Following this rule, we repeatedly update w and b, driving C toward a minimum. The figure below may help your spatial intuition.
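The update rule v → v − η·dC/dv can be watched in action on a toy one-dimensional cost rather than a full network. In this sketch (cost function and numbers mine), C(v) = (v − 3)², whose minimum is at v = 3 and whose derivative is 2(v − 3):

```python
eta = 0.1   # the step size (learning rate)
v = 0.0     # starting point

# Repeatedly step downhill: v -> v - eta * dC/dv
for _ in range(100):
    grad = 2 * (v - 3)   # derivative of C(v) = (v - 3)^2
    v = v - eta * grad

print(round(v, 6))  # 3.0, the minimum of the cost
```

In the neural network, the same rule is applied to every weight and bias simultaneously, with the partial derivatives ∂C/∂w and ∂C/∂b in place of dC/dv.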

In this way, gradient descent is grafted onto the neural network structure. Following the update rules above, we keep making the w and b values in the network more accurate, ultimately achieving a good result.

Stochastic Gradient Descent:

Although this works, you will find a big unsolved problem: if there is a lot of training data, each gradient step requires a pass over every training example, which obviously consumes a lot of time and makes learning very slow.

To overcome this problem, we can use the stochastic gradient descent method: randomly select a small part of the training data, say m examples, denoted x1, x2, ..., xm, and use them to stand in for the whole data set.

∇Cx denotes the gradient of the cost for a single training example x. As shown above, the average over the m sampled examples is used to approximate the average over all n.

Accordingly, we can rewrite the update rules for w and b:

Of course, the choice of m is up to you, and you can adjust the step size based on how training actually goes; in practice, the best step size is usually found through extensive experimentation.
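Mini-batch updating can also be demonstrated on a toy problem instead of a full network. In this sketch (problem and numbers mine), we minimize C(v) = (1/n)·Σx (v − x)², whose minimum is the mean of the data; each step estimates the gradient from a random mini-batch of m examples rather than all n:

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

data = [2.0, 4.0, 6.0, 8.0]   # mean is 5.0, the true minimizer
m = 2                          # mini-batch size
eta = 0.1                      # step size
v = 0.0                        # starting point

for _ in range(500):
    batch = random.sample(data, m)
    # Average gradient over the batch approximates the full-data gradient.
    grad = sum(2 * (v - x) for x in batch) / m
    v = v - eta * grad

print(v)  # hovers near 5.0 (mini-batch noise keeps it from settling exactly)
```

Each step is cheap (it touches only m examples), at the price of a noisy gradient estimate; averaged over many steps, the noise mostly cancels out, which is why SGD learns so much faster on large data sets.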

Okay, that's the general introduction!

 

Implementation:

After reading all this, if you want to practice, great! You can download the MNIST data, which was collected specifically for training.

Of course, Michael Nielsen also has an implementation of everything above, in Python, on his GitHub (link). The program requires three Python libraries: numpy, scikit-learn, and scipy. If you are using 64-bit Windows, I did not find an official 64-bit scipy build online, but unofficial builds can be found.
