Pattern Recognition: Design and Implementation of the Simulated Annealing Algorithm
This section records my learning and understanding of the simulated annealing (SA) process. Simulated annealing is a general probabilistic algorithm for finding the optimal solution of a problem in a large search space. Here, both the random simulated annealing algorithm and the deterministic simulated annealing algorithm are programmed on the MATLAB platform to find the minimum-energy configuration of a 6-unit fully connected network.
Reference book: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification.
I. Technical Discussion
1. Random Method
Learning plays a central role in constructing a pattern classifier. A common practice is to assume a single- or multi-parameter model and then estimate the value of each parameter from training samples. When the model is relatively simple and low-dimensional, analytical methods can be used, such as setting the derivative of a criterion function to zero and solving explicitly for the optimal parameters. If the model is more complex, local derivatives can be computed and gradient descent applied, as in artificial neural networks and some other maximum-likelihood methods. For still more complex models, there are often many local extrema, and the methods above often perform unsatisfactorily.
If a problem is more complex, or little prior knowledge and few training samples are available, we must rely more heavily on sophisticated search algorithms that can automatically explore the space of feasible solutions, such as parameter-based random search methods. A common practice is to bias the search toward regions expected to contain the optimum, while allowing a certain degree of random perturbation so that better solutions can still be found.
2. Random Search
Here we mainly study how to search for the optimal solution among a large number of candidate solutions. Assume there are N variables s_i, i = 1, 2, ..., N, each taking one of two discrete values (such as -1 and 1). The optimization problem is described as follows: determine the values of the N variables s_i that minimize the following energy function (also called the cost function):

E = -(1/2) * Σ_i Σ_j w_ij s_i s_j
where w_ij is a real symmetric matrix whose entries may be positive or negative, and the self-feedback weights are set to zero (w_ii = 0), because a non-zero w_ii would only add a constant independent of s_i to the energy E without affecting the nature of the problem. This optimization problem can be represented as a network of nodes (Figure 1), where the links between nodes correspond to the weights w_ij.
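As a concrete illustration, the energy of a given configuration can be computed directly from this definition. Below is a minimal MATLAB sketch; the names w and s simply stand for the weight matrix and a configuration row vector as defined above:

% Energy of a configuration s (a 1-by-N row vector of +/-1 values)
% under the symmetric weight matrix w, computed in vectorized form
E = -0.5 * s * w * s';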
3. Limitations of Greedy Algorithms
As mentioned above, the problem of minimizing the energy E over the N variables s_i usually cannot be solved directly unless N is small, because the number of configurations is as high as 2^N. A greedy search for the optimal configuration proceeds as follows: first, randomly select the initial state of each node; then examine each node in turn, compute the energy for s_i = 1 and s_i = -1, and choose whichever state lowers the energy. Each such decision involves only the adjacent nodes directly connected by non-zero weights.
This greedy search algorithm is unlikely to find the optimal solution, because the system often falls into a local energy minimum or fails to converge at all, so other search methods are needed.
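For reference, the greedy descent just described can be sketched in a few lines of MATLAB (a sketch under the assumptions of this section, with w the weight matrix and s a random initial +/-1 configuration):

% Greedy search: flip a node only when doing so lowers the energy;
% the loop stops at the first sweep in which no flip helps,
% which is typically a local minimum rather than the global one.
changed = true;
while changed
    changed = false;
    for i = 1:length(s)
        dE = 2 * s(i) * (w(i, :) * s');   % energy change if s(i) is flipped
        if dE < 0
            s(i) = -s(i);                 % accept only downhill moves
            changed = true;
        end
    end
end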
4. Simulated Annealing
In thermodynamics, the annealing process of a solid mainly consists of the following three stages:
- Heating: the solid is heated to a sufficiently high temperature so that its particles leave their ordered positions and move freely;
- Isothermal holding: the solid is kept at that temperature until the particles reach thermal equilibrium;
- Slow cooling: the temperature is lowered gradually so that the particles settle into ordered, low-energy states, and the solid finally approaches its ground state.
Metropolis proposed the importance-sampling method in 1953: new states are accepted with a certain probability rather than deterministically. Specifically, at temperature T, suppose the current state i generates a new state j, with energies E_i and E_j respectively. If E_j is smaller than E_i, the new state j is accepted as the current state; otherwise, compute the probability

p(ΔE) = exp(-(E_j - E_i) / (kT))

If p(ΔE) is greater than a random number drawn uniformly from [0, 1], the new state j is still accepted as the current state; otherwise state i is retained. Here k is the Boltzmann constant and T is the system temperature. This importance-sampling process is known as the Metropolis criterion:

P(i → j) = 1                           if E_j < E_i
P(i → j) = exp(-(E_j - E_i) / (kT))    otherwise
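In code, the Metropolis criterion reduces to a few lines. The following is a hedged sketch, where dE = E_j - E_i, T is the current temperature (with the Boltzmann constant absorbed into its scale), and state_i, state_j are placeholder names:

% Metropolis acceptance rule: downhill moves (dE < 0) are always accepted;
% uphill moves are accepted with probability exp(-dE/T)
accept = (dE < 0) || (exp(-dE / T) > rand());
if accept
    state_i = state_j;   % the new state j becomes the current state
end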
In 1983, Kirkpatrick recognized the similarity between combinatorial optimization and physical annealing. Inspired by the Metropolis criterion, he proposed the simulated annealing algorithm. Simulated annealing is a random optimization algorithm based on a Monte Carlo iterative solution strategy; its starting point is the similarity between the physical annealing process and combinatorial optimization. The method starts from a relatively high initial temperature and performs a random search in the solution space using the Metropolis sampling strategy, which allows probabilistic uphill jumps; as the temperature decreases, the sampling process is repeated, and finally the global optimum of the problem is obtained. Compared with greedy algorithms, the main advantage of simulated annealing is that the system can jump out of local minima.
For an optimization problem min f(x) over a solution space S:
- The solution process is compared to the thermal-equilibrium problem of statistical thermodynamics: by simulating the annealing of a high-temperature object, we seek the global optimum (or an approximate global optimum) of the optimization problem;
- As the control parameter is adjusted, the objective function is occasionally allowed to move in the direction of increasing value (corresponding to an occasional energy increase), which helps the search jump out of local minima; as the imaginary temperature decreases (corresponding to the object annealing), the system's activity declines, and it eventually stabilizes in the region where the global minimum lies.
5. Two Simulated Annealing Algorithms
Two simulated annealing algorithms, namely random simulated annealing (Algorithm 1) and deterministic simulated annealing (Algorithm 2), are implemented; their MATLAB implementations are given in Section IV below.
The random simulated annealing algorithm converges slowly, partly because of the discrete nature of the configuration space being searched: the configuration space is an N-dimensional hypercube, every search trajectory can only move along an edge of the hypercube, and states can only fall on its vertices, so the full gradient information is lost. Gradient information can be recovered by allowing continuous state values inside the hypercube. A faster method of this kind is the deterministic simulated annealing algorithm.
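The key difference lies in the node update: instead of a hard +/-1 flip, each state takes a continuous value inside the hypercube. A minimal sketch of one such update, assuming the same w, s, and current temperature T as above:

% Deterministic (mean-field) annealing update for a randomly chosen node:
% the response function tanh(l/T) replaces the discrete +/-1 decision
num = ceil(rand() * length(s));   % pick a node at random
l = w(num, :) * s';               % force exerted on node num by the other nodes
s(num) = tanh(l / T);             % continuous state value in [-1, 1]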
II. Discussion of Experiment Results
Construct a 6-unit fully connected network with the energy function

E = -(1/2) * Σ_i Σ_j w_ij s_i s_j
The network connection weight matrix is as follows:

w = [ 0  5 -3  4  4  1
      5  0 -1  2 -3  1
     -3 -1  0  2  2  0
      4  2  2  0  3 -3
      4 -3  2  3  0  5
      1  1  0 -3  5  0 ]
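Because the network has only 6 units, all 2^6 = 64 configurations can be enumerated to check the global minimum energy of -19 discussed below. A small verification sketch using the matrix w above:

% Exhaustive check: enumerate every +/-1 configuration and keep the lowest energy
E_best = inf;
for k = 0:2^6 - 1
    s = 2 * (double(dec2bin(k, 6)) - '0') - 1;   % k-th +/-1 configuration
    E = -0.5 * s * w * s';                       % energy of this configuration
    if E < E_best
        E_best = E;
        s_best = s;
    end
end
disp(['Global minimum energy: ', num2str(E_best)]);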
The design steps mainly include the following parts:
Write the program [E, s_out] = RandomSimulatedAnnealing (T_max, time_max, c, s, w) to implement the random simulated annealing method of Algorithm 1 above. Set the following parameters: T_max = 10, T(m+1) = c * T(m), c = 0.9. Figure 2 shows how the energy varies as the temperature decreases (because simulated annealing produces random results, the algorithm was run four times for observation), and the final configurations s obtained in the four runs are shown in Figure 3.
Change the parameters: initial temperature T_max = 5, T(m+1) = c * T(m), c = 0.5. The curve of energy versus decreasing temperature is shown in Figure 4, and the final configurations s obtained in the four runs are shown in Figure 5.
Write the program [E, s_out] = DeterministicAnnealing (T_max, time_max, c, s, w) to implement the deterministic simulated annealing method of Algorithm 2 above. Set the following parameters: T_max = 10, T(m+1) = c * T(m), c = 0.9. The curve of energy versus decreasing temperature is shown in Figure 6, and the final configurations s obtained in the four runs are shown in Figure 7.
Change the parameters: initial temperature T_max = 5, T(m+1) = c * T(m), c = 0.5. The curve of energy versus decreasing temperature is shown in Figure 8, and the final configurations s obtained in the four runs are shown in Figure 9.
Conclusion: Figures 2 and 3 show the results of several runs of the random simulated annealing algorithm. The final configuration s is not necessarily the same across runs; the energy E converges to the global minimum of -19 after several gradually damped oscillations. When the parameters are changed to T(1) = 5, c = 0.5, Figure 4 shows that the energy E drops rapidly to a low value, but the intermediate stage of gradual cooling with oscillatory adjustment is missing.
Figures 6 and 7 show the results of several runs of the deterministic simulated annealing algorithm; the final configuration s is the same in every run, and the energy E converges to the global minimum of -19 through a gradual decline, without the violent fluctuations seen in random simulated annealing. When the parameters are changed to T(1) = 5, c = 0.5, Figure 8 shows that the energy E again drops rapidly to a low value, and the stage of gradual cooling and adjustment is missing.
Based on these experimental results, we find that both random and deterministic annealing reach similar final solutions, but on some large-scale practical problems random simulated annealing runs very slowly; by contrast, the deterministic annealing algorithm is much faster, sometimes by two to three orders of magnitude.
In addition, the choice of the initial temperature T(1) and the cooling factor c has a great impact on the algorithm's performance. Both the quality and the efficiency of the optimization should be considered when determining the initial temperature. Common methods include the following (see the sketch after this list):
- Sample a group of states uniformly and use the variance of their objective values as the initial temperature;
- Randomly generate a group of states, determine the maximum objective-value difference |Δ_max| between any two of them, and then derive the initial temperature from this difference through some function, for example T(1) = -Δ_max / ln(p_r), where p_r is the initial acceptance probability;
- Use an empirical formula.
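As an illustration of the second method above, the initial temperature can be estimated from a random sample of states. In this sketch the sample size and the acceptance probability p_r = 0.9 are assumed values, not taken from the text:

% Choose T(1) so that an uphill move of size delta_max is accepted
% with probability p_r at the start of the annealing
p_r = 0.9;                          % initial acceptance probability (assumed)
n_samples = 50;                     % number of random states sampled (assumed)
Es = zeros(1, n_samples);
for k = 1:n_samples
    s = 2 * (rand(1, 6) > 0.5) - 1; % random +/-1 configuration
    Es(k) = -0.5 * s * w * s';      % objective (energy) of the sampled state
end
delta_max = max(Es) - min(Es);      % largest objective difference observed
T1 = -delta_max / log(p_r);         % T(1) = -delta_max / ln(p_r), positive since p_r < 1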
The design of a simulated annealing algorithm involves three important functions: the state generation function, the state acceptance function, and the temperature update function, together with termination rules for the inner and outer loops. The design of these components determines the optimization performance of the simulated annealing algorithm.
III. Experiment Results
(Figures 2-9, referenced in the discussion above, showed the energy curves and final configurations for each parameter setting.)
IV. Simple Code Implementation
%% Random simulated annealing function
% Input parameters:
%   T_max:    initial temperature
%   time_max: maximum number of iterations
%   c:        temperature drop rate
%   s:        initial configuration
%   w:        weight matrix
% Output parameters:
%   E:     energy at each iteration
%   s_out: configuration after each iteration
function [E, s_out] = RandomSimulatedAnnealing(T_max, time_max, c, s, w)
[x, y] = size(s);
time = 1;                                    % iteration counter
T(time) = T_max;                             % initial temperature
while (time < (time_max + 1))                % alternative criterion: T(time) > T_min
    for i = 1:1000                           % poll the nodes at this temperature
        num = ceil(rand(1) * y);             % pick a random node index in 1..y
        for j = 1:y
            e(j) = w(num, j) * s(num) * s(j);
        end
        Ea(time) = -1/2 * sum(e);            % local energy with the current s(num)
        Eb(time) = -Ea(time);                % local energy if s(num) is flipped
        if Eb(time) < Ea(time)
            s(num) = -s(num);                % always accept a downhill flip
        elseif exp(-(Eb(time) - Ea(time)) / T(time)) > rand()
            s(num) = -s(num);                % accept an uphill flip with probability exp(-dE/T)
        end                                  % otherwise keep s(num) unchanged
    end
    % total energy of the current configuration
    E(time) = 0;
    for it = 1:y
        for jt = 1:y
            E(time) = E(time) + w(it, jt) * s(it) * s(jt);
        end
    end
    E(time) = E(time) * (-0.5);
    s_out(time, :) = s;
    time = time + 1;
    T(time) = T(time - 1) * c;               % exponential cooling schedule
end
%% Deterministic simulated annealing function
% Input parameters:
%   T_max:    initial temperature
%   time_max: maximum number of iterations
%   c:        temperature drop rate
%   s:        initial configuration
%   w:        weight matrix
% Intermediate function:
%   tanh(l/T): response function, which has an implicit normalization effect
% Output parameters:
%   E:     energy at each iteration
%   s_out: configuration after each iteration
function [E, s_out] = DeterministicAnnealing(T_max, time_max, c, s, w)
[x, y] = size(s);
time = 1;                                    % iteration counter
T(time) = T_max;                             % initial temperature
while (time < (time_max + 1))
    num = ceil(rand(1) * y);                 % pick a random node index in 1..y
    for j = 1:y
        e(j) = w(num, j) * s(j);
    end
    l(time) = sum(e);                        % force exerted on node num by all others
    s(num) = tanh(l(time) / T(time));        % continuous mean-field update
    % total energy of the current configuration
    E(time) = 0;
    for it = 1:y
        for jt = 1:y
            E(time) = E(time) + w(it, jt) * s(it) * s(jt);
        end
    end
    E(time) = E(time) * (-0.5);
    s_out(time, :) = s;
    time = time + 1;
    T(time) = T(time - 1) * c;               % exponential cooling schedule
end
%% Simulated annealing experiment
clear; close all;
% network connection weight matrix
% (the third row of the original listing lost one entry; the missing
%  w(3,5) = 2 is restored here from the symmetry of w)
w = [ 0  5 -3  4  4  1; ...
      5  0 -1  2 -3  1; ...
     -3 -1  0  2  2  0; ...
      4  2  2  0  3 -3; ...
      4 -3  2  3  0  5; ...
      1  1  0 -3  5  0];
num = 6;                            % number of units
s_in = rand(1, num);                % generate a random sequence of 1 and -1
s_in(s_in >= 0.5) = 1;
s_in(s_in < 0.5) = -1;
disp(['Initial configuration s: ', num2str(s_in)]);

% random simulated annealing
T_max = 10;                         % initial temperature
time_max = 100;                     % maximum number of iterations
c = 0.9;                            % temperature drop rate
[E1, s_out1] = RandomSimulatedAnnealing(T_max, time_max, c, s_in, w);
subplot(221), plot(E1); grid on;
title(['T(1) = ', num2str(T_max), ', c = ', num2str(c), ': random SA energy curve']);
disp(['T(1) = 10, c = 0.9, final configuration of random SA: ', num2str(s_out1(time_max, :))]);

T_max = 5; time_max = 100; c = 0.5;
[E2, s_out2] = RandomSimulatedAnnealing(T_max, time_max, c, s_in, w);
subplot(222), plot(E2); grid on;
title(['T(1) = ', num2str(T_max), ', c = ', num2str(c), ': random SA energy curve']);
disp(['T(1) = 5, c = 0.5, final configuration of random SA: ', num2str(s_out2(time_max, :))]);

% deterministic simulated annealing
T_max = 10; time_max = 100; c = 0.9;
[E3, s_out3] = DeterministicAnnealing(T_max, time_max, c, s_in, w);
subplot(223), plot(E3); grid on;
title(['T(1) = ', num2str(T_max), ', c = ', num2str(c), ': deterministic SA energy curve']);
disp(['T(1) = 10, c = 0.9, final configuration of deterministic SA: ', num2str(s_out3(time_max, :))]);

T_max = 5; time_max = 100; c = 0.5;
[E4, s_out4] = DeterministicAnnealing(T_max, time_max, c, s_in, w);
subplot(224), plot(E4); grid on;
title(['T(1) = ', num2str(T_max), ', c = ', num2str(c), ': deterministic SA energy curve']);
disp(['T(1) = 5, c = 0.5, final configuration of deterministic SA: ', num2str(s_out4(time_max, :))]);
Reference: http://www.cnblogs.com/growing/archive/2010/12/16/1908255.html