Machine Learning Theory and Practice (12) Neural Networks


Neural networks are hot again. Since deep learning has become so popular, we should add an introduction to the traditional neural network, and in particular to the backpropagation (BP) algorithm. It is actually quite simple, so there is nothing complicated about it. The neural network model is shown in Figure 1:

(Figure 1)

The neural network model in Figure 1 is composed of multiple layers of perceptrons. A perceptron is a single-layer neural network (strictly speaking, it should not even be called a network); it has only one output node, as shown in Figure 2:

(Figure 2) Perceptron

A perceptron is equivalent to a linear classifier, and a neural network with multiple hidden nodes is a combination of multiple perceptrons. In effect, several linear classifiers are combined to form a non-linear classifier, as shown in Figure 3 (a minimal code sketch of a single perceptron follows the figure):

(Figure 3)
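To make the perceptron concrete, here is a minimal sketch (my own addition, not part of the original post) of a single perceptron acting as a linear classifier; the step activation and the example weights are illustrative assumptions.

# A single perceptron: a weighted sum followed by a threshold, i.e. a linear classifier.
def perceptron(x, w, b):
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else 0   # step activation

# Example: OR is linearly separable, so a single perceptron can represent it.
w, b = [1.0, 1.0], -0.5
for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(x, '->', perceptron(x, w, b))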

A single layer of perceptrons already has decent fitting capability, and when multiple layers are stacked the fitting capability becomes even stronger. Unfortunately, although the fitting capability is strong, the algorithm for finding accurate fitting parameters is not: it easily falls into local minima, and the BP algorithm is notorious for getting stuck in them. The so-called local minimum is illustrated in Figure 4 (a toy code illustration follows the figure). After the network weights are randomly initialized, the gradient is computed and the parameters are updated with it. If the starting point given by the random initialization is chosen poorly, then the point where the gradient becomes 0 may only minimize the cost J locally rather than globally, and the resulting network weights are naturally not the best. BP has always had this problem, and because the networks are large it is also prone to overfitting. Fortunately, a series of tricks in recent deep learning work has improved matters. For example, greedy layer-wise pre-training is used to improve the initialization, which amounts to finding a good starting point; loosely speaking, the positive and negative phases of pre-training reshape the "terrain" of J (this is my personal understanding). Finally, the traditional BP algorithm is run with the labels to continue searching for the global minimum; in deep learning this role of BP is also called weight fine-tuning. Of course, BP is not the only fine-tuning algorithm. Fine-tuning has only one purpose: to obtain the gradient of the objective function and use it to update the parameters. In addition, deep learning uses sparsity constraints and dropout to prevent overfitting.

(Figure 4)
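As a toy illustration of the local-minimum problem (my own sketch; the cost function below is made up for illustration and is not from the post): plain gradient descent on a non-convex one-dimensional cost can end up in different minima depending on where it starts, which is exactly what a bad random initialization does to the network weights.

# A made-up non-convex cost J(w) with two minima, and its derivative.
def J(w):
    return w**4 - 3*w**2 + w

def dJ(w):
    return 4*w**3 - 6*w + 1

def gradient_descent(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * dJ(w)          # the standard update: move against the gradient
    return w

# Two different starting points settle into two different minima.
print(gradient_descent(-2.0))    # converges near w = -1.30, the global minimum
print(gradient_descent(2.0))     # converges near w = +1.13, only a local minimum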

Having introduced what the BP algorithm is for, let's look at how it works. It is also very simple: we differentiate the objective function with respect to the weights of each layer, because updating the weights requires a gradient, and the gradient is then used to update them. From the input to the final output, the network applies several layers of functions, so the objective is a composite function and its derivative is obtained with the chain rule. To keep the BP algorithm from looking complicated, I picked the simplest possible network, as shown in Figure 5:

(Figure 5)

The network in Figure 5 has only three layers: input layer X, hidden layer H, and output layer y, with two weights W1 and W2 in between, which are randomly initialized at the start. Training a neural network consists of two processes. The first takes the input training samples and computes layer by layer up to the final output y; this is forward propagation. Then the difference between the output y and the true label is computed, and this can serve as a simple objective function. Our goal is to minimize this objective function over the whole training set; to put it bluntly, we need to find the minimum of the objective function, so we compute its gradient and then update the parameters. The next step is therefore to differentiate the objective function with respect to W2 and W1; this process is called back propagation. The following figure shows the two processes; a small worked sketch of both passes follows it:

(Figure 6) Forward and backward propagation
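To make the two passes of Figure 6 concrete, here is a minimal worked sketch for the three-layer network of Figure 5, under my own assumptions (scalar weights, tanh activations, and a squared-error objective, matching the code further below); the gradients for W2 and W1 come straight from the chain rule, and a finite-difference check confirms them.

import math

def forward(x, w1, w2):
    a1 = math.tanh(w1 * x)               # hidden activation (saved for the backward pass)
    a2 = math.tanh(w2 * a1)              # network output y
    return a1, a2

def backward(x, t, w1, w2):
    a1, a2 = forward(x, w1, w2)
    # Objective J = 0.5 * (a2 - t)**2, differentiated with the chain rule:
    delta2 = (a2 - t) * (1 - a2**2)      # dJ / d(pre-activation of the output)
    dW2 = delta2 * a1                    # dJ / dW2
    delta1 = delta2 * w2 * (1 - a1**2)   # error pushed back through W2 to the hidden layer
    dW1 = delta1 * x                     # dJ / dW1
    return dW1, dW2

def cost(x, t, w1, w2):
    return 0.5 * (forward(x, w1, w2)[1] - t)**2

# Finite-difference check of dJ/dW1 at arbitrary example values.
x, t, w1, w2, eps = 0.7, 1.0, 0.3, -0.5, 1e-6
dW1, dW2 = backward(x, t, w1, w2)
num_dW1 = (cost(x, t, w1 + eps, w2) - cost(x, t, w1 - eps, w2)) / (2 * eps)
print(dW1, num_dW1)                      # the two values should agree closely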

Note that the intermediate variables such as a1 and a2 are saved during forward propagation; they are reused during back propagation. The whole principle is very simple: differentiate the objective function with respect to the weights of each layer, and since the objective is a composite function, apply the chain rule. With the gradient in hand, the classic update rule w ← w − η∇J is used to obtain the new weights, where η is a learning rate you set yourself (if it is too large, learning will oscillate) and ∇J is the gradient. In addition, the output layer does not have to use the objective function of Figure 6; you can choose a different objective as needed, and you can even attach a support vector machine at the final output, as long as it is differentiable so that a gradient can be obtained. In fact, one of Hinton's students has been doing exactly that recently. Use your own ingenuity to improve the model ^_^. Incidentally, the parameter update of a convolutional neural network is similar, and it likewise relies on the BP algorithm for the derivatives.
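The update rule itself can be written in a few lines; this sketch uses the same names (learning rate N, momentum M) as the code below, and the momentum term is a detail of that code rather than a requirement of plain gradient descent.

# One weight update: new_w = w + N * change + M * previous_change,
# where 'change' is the negative gradient (in the code below the deltas are
# built from (target - output), which already carries the minus sign).
def update_weight(w, gradient, prev_change, N=0.5, M=0.1):
    change = -gradient
    new_w = w + N * change + M * prev_change
    return new_w, change                 # keep 'change' for the next momentum term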

The following code implements these two processes:

import math
import random
import string

random.seed(0)

# calculate a random number where:  a <= rand < b
def rand(a, b):
    return (b-a)*random.random() + a

# Make a matrix (we could use NumPy to speed this up)
def makeMatrix(I, J, fill=0.0):
    m = []
    for i in range(I):
        m.append([fill]*J)
    return m

# our sigmoid function, tanh is a little nicer than the standard 1/(1+e^-x)
def sigmoid(x):
    return math.tanh(x)

# derivative of our sigmoid function, in terms of the output (i.e. y)
def dsigmoid(y):
    return 1.0 - y**2

class NN:
    def __init__(self, ni, nh, no):
        # number of input, hidden, and output nodes
        self.ni = ni + 1  # +1 for bias node
        self.nh = nh
        self.no = no

        # activations for nodes
        self.ai = [1.0]*self.ni
        self.ah = [1.0]*self.nh
        self.ao = [1.0]*self.no

        # create weights
        self.wi = makeMatrix(self.ni, self.nh)
        self.wo = makeMatrix(self.nh, self.no)
        # set them to random values
        for i in range(self.ni):
            for j in range(self.nh):
                self.wi[i][j] = rand(-0.2, 0.2)
        for j in range(self.nh):
            for k in range(self.no):
                self.wo[j][k] = rand(-2.0, 2.0)

        # last change in weights for momentum
        self.ci = makeMatrix(self.ni, self.nh)
        self.co = makeMatrix(self.nh, self.no)

    def update(self, inputs):
        if len(inputs) != self.ni-1:
            raise ValueError('wrong number of inputs')

        # input activations
        for i in range(self.ni-1):
            # self.ai[i] = sigmoid(inputs[i])
            self.ai[i] = inputs[i]

        # hidden activations
        for j in range(self.nh):
            total = 0.0
            for i in range(self.ni):
                total = total + self.ai[i] * self.wi[i][j]
            self.ah[j] = sigmoid(total)

        # output activations
        for k in range(self.no):
            total = 0.0
            for j in range(self.nh):
                total = total + self.ah[j] * self.wo[j][k]
            self.ao[k] = sigmoid(total)

        return self.ao[:]

    def backPropagate(self, targets, N, M):
        if len(targets) != self.no:
            raise ValueError('wrong number of target values')

        # calculate error terms for output
        output_deltas = [0.0] * self.no
        for k in range(self.no):
            error = targets[k]-self.ao[k]
            output_deltas[k] = dsigmoid(self.ao[k]) * error

        # calculate error terms for hidden
        hidden_deltas = [0.0] * self.nh
        for j in range(self.nh):
            error = 0.0
            for k in range(self.no):
                error = error + output_deltas[k]*self.wo[j][k]
            hidden_deltas[j] = dsigmoid(self.ah[j]) * error

        # update output weights
        for j in range(self.nh):
            for k in range(self.no):
                change = output_deltas[k]*self.ah[j]
                self.wo[j][k] = self.wo[j][k] + N*change + M*self.co[j][k]
                self.co[j][k] = change
                # print(N*change, M*self.co[j][k])

        # update input weights
        for i in range(self.ni):
            for j in range(self.nh):
                change = hidden_deltas[j]*self.ai[i]
                self.wi[i][j] = self.wi[i][j] + N*change + M*self.ci[i][j]
                self.ci[i][j] = change

        # calculate error
        error = 0.0
        for k in range(len(targets)):
            error = error + 0.5*(targets[k]-self.ao[k])**2
        return error

    def test(self, patterns):
        for p in patterns:
            print(p[0], '->', self.update(p[0]))

    def weights(self):
        print('Input weights:')
        for i in range(self.ni):
            print(self.wi[i])
        print()
        print('Output weights:')
        for j in range(self.nh):
            print(self.wo[j])

    def train(self, patterns, iterations=1000, N=0.5, M=0.1):
        # N: learning rate
        # M: momentum factor
        for i in range(iterations):
            error = 0.0
            for p in patterns:
                inputs = p[0]
                targets = p[1]
                self.update(inputs)
                error = error + self.backPropagate(targets, N, M)
            if i % 100 == 0:
                print('error %-.5f' % error)

def demo():
    # Teach network XOR function
    pat = [
        [[0,0], [0]],
        [[0,1], [1]],
        [[1,0], [1]],
        [[1,1], [0]]
    ]

    # create a network with two input, two hidden, and one output nodes
    n = NN(2, 2, 1)
    # train it with some patterns
    n.train(pat)
    # test it
    n.test(pat)

if __name__ == '__main__':
    demo()
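A quick usage note (my own addition): the demo above trains on XOR, but the same NN class can be pointed at any small pattern set, for example the AND function below; the printed error should shrink during training, and test() then prints the network's output for each input.

# Reuse the NN class above to learn the AND function.
and_pat = [
    [[0, 0], [0]],
    [[0, 1], [0]],
    [[1, 0], [0]],
    [[1, 1], [1]]
]
net = NN(2, 2, 1)                        # 2 inputs, 2 hidden nodes, 1 output
net.train(and_pat, iterations=1000, N=0.5, M=0.1)
net.test(and_pat)                        # outputs should approach 0, 0, 0, 1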

Please indicate the source when reprinting: http://blog.csdn.net/marvin521/article/details/9886643

