Stochastic Neural Networks: The Boltzmann Machine


I. Introduction

In machine learning and combinatorial optimization, the most common method is gradient descent. Take the BP network as an example: the more neurons (units) a multilayer perceptron has, the larger its weight matrix, and each weight can be regarded as one degree of freedom, or variable. We know that the more degrees of freedom and variables a model has, the more complex and powerful it is; but the stronger the model, the more easily it overfits and the more sensitive it becomes to noise. On the other hand, when gradient descent searches for the optimal solution, the error surface over many variables resembles rolling mountains: the more variables, the more peaks and valleys, so gradient descent very easily falls into some local valley and stops searching there. This is the familiar local-optimum problem that arises when conventional gradient descent is applied to multi-dimensional optimization.

The cause lies in the search criterion of gradient descent: it always moves in the direction of the negative gradient, blindly pursuing a reduction of the network error or energy function, so the search has only a "downhill" ability and no "hill-climbing" ability. Hill-climbing ability means that when the search falls into a local optimum, it can still cross the surrounding ridges, escape, and continue searching for the global optimum. A vivid metaphor for a system with multiple local minima: imagine a bumpy, multi-dimensional energy surface on a tray. A small ball placed on the surface will roll under gravity into the nearest trough (a local minimum point), but that trough is not necessarily the lowest trough on the surface (the global minimum point). The local-minimum problem can therefore only be solved by improving the algorithm itself. One possible way is to give the algorithm the hill-climbing ability just mentioned, while also guaranteeing that once the search enters the globally optimal "valley", it will not climb back out of it. The stochastic neural networks explained in this series, simulated annealing and the Boltzmann machine, acquire this hill-climbing ability through probability and thereby keep the search from being trapped in local optima.
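The previous post covered simulated annealing in detail; as a refresher, here is a minimal Python sketch (with illustrative names) of the Metropolis acceptance rule that supplies exactly this hill-climbing ability: downhill moves are always taken, while uphill moves are taken with a probability that shrinks as the temperature falls.

```python
import math
import random

def metropolis_accept(delta_e, temperature):
    """Accept a candidate move with the Metropolis criterion.

    Downhill moves (delta_e <= 0) are always accepted; uphill moves
    are accepted with probability exp(-delta_e / T), which is what
    gives the search its "mountain climbing" ability.
    """
    if delta_e <= 0:
        return True
    return random.random() < math.exp(-delta_e / temperature)

# At high temperature almost any uphill move is accepted; as T -> 0
# the rule degenerates into pure greedy descent.
for T in (10.0, 1.0, 0.1):
    rate = sum(metropolis_accept(1.0, T) for _ in range(10_000)) / 10_000
    print(f"T={T:>4}: uphill move (dE=+1) accepted ~{rate:.2f} of the time")
```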
Stochastic neural networks differ from other neural networks in two main ways: ① in the learning stage, a stochastic network does not adjust its weights by a deterministic algorithm, but modifies them according to some probability distribution; ② in the operation stage, the network state does not evolve according to some deterministic network equation; instead, state transitions are governed by a probability distribution. A neuron's net input does not determine whether its state takes 1 or 0; it determines only the probability of the state taking 1 or 0. This is the basic idea of the stochastic neural network algorithm.
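As a minimal sketch of this idea (the exact transfer function is derived in Section 2 below; the logistic form and parameter names here are assumptions in the meantime), a stochastic unit can be updated like this:

```python
import math
import random

def stochastic_state(net_input, temperature):
    """Sample the unit's next state: the net input fixes only the
    probability of the state taking 1, not the state itself."""
    p_one = 1.0 / (1.0 + math.exp(-net_input / temperature))
    return 1 if random.random() < p_one else 0

# A deterministic unit would output the same state for net = 0.5 every
# time; the stochastic unit merely tends toward state 1.
print([stochastic_state(0.5, temperature=1.0) for _ in range(10)])
```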

The previous post, on simulated annealing in stochastic neural networks, introduced the simulated annealing method; this post follows that idea and introduces the Boltzmann machine.

II. Boltzmann Machine

Between 1983 and 1986, G. E. Hinton and others proposed a stochastic neural network called the Boltzmann machine. In this network, neurons have only two output states, the unipolar binary values 0 and 1. The value of the state is determined by probabilistic and statistical laws, and because the expression of this probability law resembles the distribution proposed by the famous statistical physicist L. Boltzmann, the network was named the Boltzmann machine.

1. Principle and structure of the Boltzmann machine

The structure of the BM machine lies between the DHNN and the BP network. In form it resembles the single-layer feedback network DHNN: the weights are symmetric ($w_{ij} = w_{ji}$) with $w_{ii} = 0$. In function it resembles a three-layer BP network, with input nodes, output nodes, and hidden nodes. The input and output nodes are collectively called visible nodes and the hidden nodes invisible nodes; during training, samples are presented at the visible nodes, while the hidden nodes mainly serve to realize the connection between inputs and outputs, so that the training set can be reproduced on the visible units. There is no obvious hierarchy among the three kinds of nodes in a BM machine.


2. Transfer function of neurons

Let the net input of neuron $j$ in the BM machine be:

$$net_j = \sum_{i} w_{ij} x_i - \theta_j$$

where $\theta_j$ is the threshold of neuron $j$.
Unlike in the DHNN, the net input does not pass through a sign function to determine the output state directly. Instead, the output state takes its value with a certain probability; the probability of the state taking a given value is:

$$P_j(1) = \frac{1}{1 + e^{-net_j / T}}$$
The expression above is the probability that the output state of neuron $j$ takes 1; the probability of state 0 is one minus it. Clearly, the larger the net input, the greater the probability of the neuron taking state 1; the smaller the net input, the greater the probability of state 0. Changing the temperature $T$ changes the shape of the probability curve.


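To make the temperature effect concrete, here is a short script (a minimal sketch using the formula above) that prints $P_j(1)$ for the same net inputs at several temperatures:

```python
import math

def p_one(net, T):
    # Probability that the neuron's state takes 1 at temperature T.
    return 1.0 / (1.0 + math.exp(-net / T))

# Same net inputs, different temperatures: high T flattens the curve
# toward 0.5; low T sharpens it toward a 0/1 step function.
for T in (5.0, 1.0, 0.2):
    row = "  ".join(f"{p_one(net, T):.3f}" for net in (-2, -1, 0, 1, 2))
    print(f"T={T:>3}:  {row}   (net = -2, -1, 0, 1, 2)")
```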
From the formula it can be seen that when the temperature $T$ is high, the probability curve changes gently, and for the same net input the probabilities of taking 1 or 0 differ little; when the temperature is low, the curve is steep, and for the same net input the probabilities of taking 1 or 0 differ greatly; and when $T = 0$, the probability function degenerates into a step function, so the neuron's output state is no longer random.

3. Network energy function and the search mechanism

The BM machine uses the same energy function as the DHNN to describe the network state:

$$E = -\frac{1}{2}\sum_{i}\sum_{j} w_{ij} x_i x_j + \sum_{j} \theta_j x_j$$
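A minimal sketch of this energy computation (network size, weights, and thresholds below are illustrative), which also encodes the structural constraints from Section 1, a symmetric weight matrix with a zero diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5
# Symmetric weights with zero diagonal, as required of a BM machine.
W = rng.normal(size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
theta = rng.normal(size=n)  # thresholds

def energy(x, W, theta):
    # E = -1/2 * x^T W x + theta . x  (same form as the DHNN energy)
    return -0.5 * x @ W @ x + theta @ x

x = rng.integers(0, 2, size=n).astype(float)
print("state:", x.astype(int), " energy:", float(energy(x, W, theta)))
```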
Suppose the BM machine operates asynchronously, one neuron $j$ changing state at a time. Each state change alters the energy according to:

$$\Delta E_j = -\Delta x_j \, net_j$$

Two cases are worth examining:

(1) When the net input is greater than 0, the probability of state 1 is greater than 0.5. If the original state is $x_j = 1$, then $\Delta x_j = 0$ and hence $\Delta E = 0$; if the original state is $x_j = 0$, then $\Delta x_j = 1$ and hence $\Delta E < 0$: the energy decreases.

(2) When the net input is less than 0, the probability of state 1 is less than 0.5. If the original state is $x_j = 0$, then $\Delta x_j = 0$ and hence $\Delta E = 0$; if the original state is $x_j = 1$, then $\Delta x_j = -1$ and hence $\Delta E < 0$: the energy again decreases.

From these cases it can be seen that, as the network state evolves, the network energy always moves in the direction of decrease in the probabilistic sense. Although the overall trend of the network energy is downward, it is not ruled out that some neuron occasionally takes its low-probability state, temporarily increasing the network energy. It is precisely this possibility that gives the BM machine the ability to climb out of the trough of a local minimum, and this is the fundamental difference between the energy evolution of the BM machine and that of the DHNN. Because neuron states are sampled probabilistically, the outstanding merit of the BM machine is its ability to keep jumping out of one trough and searching for a lower one. This mode of operation is called the search mechanism: during operation, the network keeps searching for lower energy minima until the global energy minimum is reached. From the principle of simulated annealing, gradually lowering the temperature weakens this hill-climbing ability from strong to weak, which is the effective measure that ensures the BM machine successfully finds the global energy minimum.
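Here is a sketch of the asynchronous search mechanism just described, reusing the energy function above on a small random network (all sizes and parameters illustrative). The printed energies show the downward trend, punctuated by occasional uphill steps:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 8
W = rng.normal(size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)          # symmetric, zero-diagonal weights
theta = rng.normal(size=n)

def energy(x):
    return -0.5 * x @ W @ x + theta @ x

x = rng.integers(0, 2, size=n).astype(float)
T = 1.0
for step in range(2001):
    j = rng.integers(n)                       # asynchronous: one unit at a time
    net_j = W[j] @ x - theta[j]
    p = 1.0 / (1.0 + np.exp(-net_j / T))
    x[j] = 1.0 if rng.random() < p else 0.0   # state is sampled, not computed
    if step % 500 == 0:
        print(f"step {step:4d}  E = {energy(x):.3f}")
# The energy trends downward on average, but individual steps may raise
# it: exactly the small-probability uphill moves discussed above.
```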

4. Boltzmann distribution of the BM machine

Let the network energy be $E_1$ when $x_j = 1$ and $E_0$ when $x_j = 0$. By the preceding analysis, when $x_j$ changes from 1 to 0, $\Delta x_j = -1$, so:

$$E_0 - E_1 = \Delta E = net_j$$

The probabilities of state 1 and state 0 are therefore related by:

$$P_j(1) = \frac{1}{1 + e^{-(E_0 - E_1)/T}}, \qquad \frac{P_j(1)}{P_j(0)} = \frac{e^{-E_1/T}}{e^{-E_0/T}}$$

More generally, the relationship between the probabilities of any two network states and their corresponding energies is:

$$\frac{P_\alpha}{P_\beta} = \frac{e^{-E_\alpha/T}}{e^{-E_\beta/T}} = e^{-(E_\alpha - E_\beta)/T}$$

This is the famous Boltzmann distribution.
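As a sanity check of this relation, one can run a long sequence of stochastic updates on a tiny network (hypothetical weights, small enough to enumerate all states) and compare the empirical state frequencies with $e^{-E/T}$:

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(2)

# A 3-unit toy network: 8 states in total, all enumerable.
W = np.array([[0.0, 1.0, -0.5],
              [1.0, 0.0, 0.8],
              [-0.5, 0.8, 0.0]])
theta = np.array([0.2, -0.1, 0.3])
T = 1.0
STEPS = 200_000

def energy(x):
    return -0.5 * x @ W @ x + theta @ x

# Long run of asynchronous stochastic updates.
x = np.zeros(3)
counts = {}
for _ in range(STEPS):
    j = rng.integers(3)
    p = 1.0 / (1.0 + math.exp(-(W[j] @ x - theta[j]) / T))
    x[j] = 1.0 if rng.random() < p else 0.0
    counts[tuple(x)] = counts.get(tuple(x), 0) + 1

# Empirical frequencies should track exp(-E/T), up to normalization.
states = list(itertools.product((0.0, 1.0), repeat=3))
Z = sum(math.exp(-energy(np.array(s)) / T) for s in states)
for s in states:
    boltz = math.exp(-energy(np.array(s)) / T) / Z
    print(s, f"empirical {counts.get(s, 0) / STEPS:.3f}  Boltzmann {boltz:.3f}")
```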
As can be seen from the formula, the probability of the BM machine being in a given state depends mainly on that state's energy: the lower the energy, the higher the probability. The probability also depends on the temperature parameter $T$: when the temperature is high, the probability difference between different states is small, and the network energy jumps out of local minima more easily; when the temperature is low, the probability difference between states is large, the network energy changes less readily, and the search can converge. This is why the simulated annealing method is used to search for the global minimum.

5. Applications of the BM machine

When the BM machine is used for optimization, the objective function is constructed as the network's energy function. To keep the search from falling into a local optimum, the simulated annealing algorithm described above is applied. At the start the temperature is set very high, so every neuron takes state 1 or state 0 with nearly equal probability, and the network can reach any state, local minima and the global minimum alike. As the temperature decreases, the probabilities of different states diverge: low-energy states become more probable and high-energy states less probable. When the temperature finally drops to 0, each neuron can take only 1 or only 0, and the network state freezes near the global minimum of the objective function; that frozen state is the optimal solution of the optimization problem.
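A sketch of such an optimization run (with an illustrative geometric cooling schedule standing in for a tuned one, and a random symmetric instance standing in for a real objective):

```python
import numpy as np

rng = np.random.default_rng(3)

n = 10
# The objective to minimize is encoded as the network energy.
W = rng.normal(size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
theta = rng.normal(size=n)

def energy(x):
    return -0.5 * x @ W @ x + theta @ x

x = rng.integers(0, 2, size=n).astype(float)
T, T_min, cooling = 10.0, 0.01, 0.95      # illustrative schedule
while T > T_min:
    for _ in range(100):                   # let the network settle at this T
        j = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-(W[j] @ x - theta[j]) / T))
        x[j] = 1.0 if rng.random() < p else 0.0
    T *= cooling                           # gradually freeze the network
print("frozen state:", x.astype(int), " energy:", round(float(energy(x)), 3))
```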
When the BM machine is used for association, the network learns so that the probabilities of its stable states imitate the occurrence probabilities of the training samples. By learning type, BM machines divide into auto-associative and hetero-associative. The visible nodes V of an auto-associative BM machine are like the nodes of a DHNN: each is both an input node and an output node, and the number of hidden nodes h is determined by the needs of learning and can be as small as 0. In a hetero-associative BM machine, the visible nodes V are divided by function into an input node group I and an output node group O.

