Simulated annealing of stochastic neural networks


1. Introduction

In machine learning and combinatorial optimization, the most common approach is gradient descent. Take a BP (back-propagation) neural network as an example: the more neurons (units) a multilayer perceptron has, the larger its weight matrices, and each weight can be regarded as one degree of freedom, or variable. The more degrees of freedom and variables, the more complex and expressive the model. But the more expressive the model, the more easily it overfits and becomes overly sensitive to noise. On the other hand, when gradient descent searches for the optimal solution, the error surface over many variables resembles a range of rolling peaks: the more variables, the more peaks and valleys, so gradient descent very easily falls into a local valley and stops searching there. This is the local-optimum problem so common when conventional gradient descent is applied to multi-dimensional optimization. The cause lies in gradient descent's search criterion: it always moves in the direction of the negative gradient, blindly pursuing a decrease in the network error or energy function, so the search has only a "downhill" ability and no "hill-climbing" ability. The so-called "hill-climbing" ability means that even after the search has entered a local optimum, it can still climb over the surrounding hills, escape the local optimum, and continue searching for the global optimum. An intuitive metaphor for a system with many local minima: imagine a bumpy multi-dimensional energy surface lying on a tray. If a small ball is placed on the surface, gravity will make it roll into the nearest trough (a local minimum point), but that trough is not necessarily the lowest trough on the surface (the global minimum point). The local-minimum problem can therefore only be solved by improving the algorithm.
One possible approach is to give the algorithm the "hill-climbing" ability just mentioned, while also ensuring that once the search enters the global optimum, it does not "climb" its way back out of the best valley. The stochastic neural network methods explained in this series, simulated annealing and the Boltzmann machine, provide a probabilistic "hill-climbing" ability that keeps the search from being trapped in local optima. The contrast can be seen in the figure below:

[Figure: a ball on a bumpy energy surface, contrasting local and global minima]


Stochastic neural networks differ from other neural networks in two main ways: ① in the learning phase, a stochastic network does not adjust its weights by some deterministic algorithm, but modifies them according to a probability distribution; ② in the operation phase, the network's state does not evolve according to a deterministic network equation; instead, its state transitions are governed by a probability distribution. A neuron's net input does not determine whether its state takes the value 1 or 0; it only determines the probability with which the state takes 1 or 0. This is the basic idea of stochastic neural network algorithms.
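The probabilistic state rule above can be sketched in a few lines. This is a minimal illustration, not the post's own code: the sigmoid form of the probability and the temperature parameter are common conventions (as used in Boltzmann machines) and are assumptions here.

```python
import math
import random

def stochastic_unit_state(net_input, temperature):
    """Illustrative stochastic unit: the net input does not fix the state.

    It only sets the probability that the unit takes state 1; the state is
    then sampled. The sigmoid form and temperature T are assumed conventions.
    """
    p_one = 1.0 / (1.0 + math.exp(-net_input / temperature))
    return 1 if random.random() < p_one else 0
```

A strongly positive net input makes state 1 very likely, a strongly negative one makes state 0 very likely, but for moderate inputs either state can occur.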

This post mainly introduces simulated annealing. A follow-up post will introduce the Boltzmann machine and an algorithmic implementation of a classic combinatorial-optimization problem, the traveling salesman problem (TSP), based on a simulated-annealing Boltzmann machine.

2. Simulated annealing

The simulated annealing algorithm is an effective way to address the problem of local energy minima in stochastic networks; its basic idea is to imitate the annealing of metal. Metal annealing works roughly as follows: the object is first heated to a high temperature so that its atoms move at high speed and the object has high internal energy; it is then cooled slowly, and as the temperature drops the atomic motion slows and the internal energy decreases, until finally the whole object reaches the lowest possible energy state. In the tray metaphor, simulated annealing is equivalent to shaking the tray horizontally. A high temperature means a large shaking amplitude, so the ball will certainly jump out of any trough and fall into another one. The new trough (network energy) may be lower than the ball's original trough (the network energy decreases), but it may also be higher (the energy rises). In the latter case, the move looks wrong from a local, short-term point of view; but from a global, long-term point of view, it is precisely this "hill-climbing" ability given to the ball that lets it jump out of local troughs and eventually settle in the global trough. Of course, the shaking must be of appropriate strength, and it must go from strong to weak (the temperature gradually declines), so that the ball does not use its "hill-climbing" ability to climb ever higher.

During learning in a stochastic network, first let the network weights change randomly, then compute the resulting change in the network's energy function. The weight modification follows this rule: if the change lowers the energy, accept it; otherwise, do not reject the change outright, but accept it according to a pre-selected probability distribution. The aim is to give the network a certain "hill-climbing" ability. An effective way to realize this idea is the simulated annealing algorithm based on the acceptance criterion proposed by Metropolis et al.
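The acceptance rule just described can be written down directly. This is a sketch of the standard Metropolis criterion, with the exponential acceptance probability exp(−ΔE/T) assumed from the Boltzmann distribution discussed below:

```python
import math
import random

def metropolis_accept(delta_e, temperature):
    """Metropolis criterion (sketch): always accept a change that lowers
    the energy; accept an uphill change with probability exp(-delta_e / T).
    """
    if delta_e <= 0:
        return True  # energy decreased: accept unconditionally
    # energy increased: accept with a probability that shrinks as T falls
    return random.random() < math.exp(-delta_e / temperature)
```

At high temperature, uphill moves are accepted often (strong "hill-climbing"); as the temperature drops, the same uphill move is accepted less and less frequently.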

Let X denote the microscopic state of a physical system (a set of state variables, such as particle velocities and positions), and let E(X) denote the internal energy of the system in that microscopic state. For a given temperature T, if the system is in thermal equilibrium, the probability of its being in a state of a given energy follows the Boltzmann distribution law. The distribution function is:

P(E) ∝ exp(−E(X)/kT)

Here k is the Boltzmann constant; for convenience of discussion, k is absorbed into T below. At a fixed temperature, the higher the energy of a state, the lower the probability of the system being in that state, so the system tends to evolve toward lower energies. For different temperatures, the curve described by the formula above changes as shown in the figure:


When the temperature T is high, P(E) is not sensitive to the value of the energy E, so the probabilities of the system being in a high-energy or a low-energy state differ little. As T decreases, the probability of the high-energy states falls and the probability of the low-energy states rises; as the temperature approaches zero, essentially only the lowest-energy states retain non-negligible probability. In short, the higher the temperature parameter T, the more easily the state changes. For the system to converge to the low-temperature equilibrium state, annealing should therefore start at a fairly high temperature and then cool gradually, so that the system finally converges to its lowest-energy state with high probability.
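The temperature dependence above can be checked numerically. The small sketch below (with k absorbed into T, as in the text, and illustrative energy values) computes the relative probability of a higher-energy state versus a lower-energy one:

```python
import math

def boltzmann_ratio(e_low, e_high, temperature):
    """Relative probability of the high-energy state vs the low-energy one:
    P(e_high) / P(e_low) = exp(-(e_high - e_low) / T), with k absorbed into T.
    """
    return math.exp(-(e_high - e_low) / temperature)

# At high T the two states are nearly equiprobable; at low T the
# low-energy state dominates overwhelmingly.
for T in (100.0, 10.0, 1.0, 0.1):
    print(f"T = {T:6.1f}: ratio = {boltzmann_ratio(1.0, 2.0, T):.3e}")
```

At T = 100 the ratio is close to 1 (high and low energies almost equally likely); at T = 0.1 it is on the order of 1e−5, matching the convergence-toward-lowest-energy behavior described above.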

When a stochastic neural network is used to solve an optimization problem, the annealing process above is simulated algorithmically. The simulation defines a network temperature to mimic the annealing temperature of the material, and takes the network energy as the objective function to be optimized. Early in the run the temperature is high, and weight adjustments that increase the objective function are also allowed with some probability, so that the network can jump out of local energy minima. As the network temperature drops toward zero, the network state stabilizes at the global minimum of its energy function with probability approaching 1, and the optimal solution is obtained.
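Putting the pieces together, the whole annealing procedure can be sketched as a generic minimizer. This is an illustrative implementation, not the post's own code: the geometric cooling schedule, the parameter defaults, and the `energy`/`neighbor` callables are all assumptions.

```python
import math
import random

def simulated_annealing(energy, neighbor, x0,
                        t_start=10.0, t_end=1e-3,
                        alpha=0.95, steps_per_t=100):
    """Minimize `energy` by simulated annealing (illustrative sketch).

    `energy(x)` is the objective ("network energy"); `neighbor(x)` proposes
    a random change to the current solution. Cooling is geometric: T <- alpha*T.
    """
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            x_new = neighbor(x)
            e_new = energy(x_new)
            delta = e_new - e
            # Metropolis rule: downhill moves are always accepted;
            # uphill moves are accepted with probability exp(-delta / T).
            if delta <= 0 or random.random() < math.exp(-delta / t):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        t *= alpha  # gradually lower the temperature
    return best_x, best_e
```

For example, minimizing the multimodal function f(x) = (x² − 4)² with a small random-step `neighbor` reliably finds one of the two global minima at x = ±2, even from a distant starting point, because the early high-temperature phase lets the search climb out of bad regions.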

******************************

2015-8-8

******************************

Copyright notice: this is an original article by the blog author; please do not reproduce it without the author's permission.

