Blogger's preface: This article was collected from the web and its original author is unknown. It is the best genetic algorithm tutorial I have seen. If you read it patiently, I believe you will be able to master the basics of genetic algorithms.
Genetic algorithms have many interesting applications, such as pathfinding, the 8-puzzle, the prisoner's dilemma, motion control, the center-finding problem (a suggestion from a foreign netizen: within an irregular polygon, find the center of the largest circle that fits inside the polygon), the TSP (introduced in detail in a later chapter), production scheduling, artificial life simulation and so on. Then I came across a very interesting metaphor and thought of the "kangaroo jumping" problem (let's call it that). It is interesting and intuitive, and it goes straight to the nature of the genetic algorithm, which makes it a good first example for beginners.
The problem and how to solve it
Let's first consider how to solve the following problem.
A function of one variable is given (its curve is the one shown in Figure 2.1).
We are asked to find the maximum value of this function on a given interval (the interval [-1, 2] in the encoding discussion later on).
Local maximum, global maximum, local optimal solution and global optimal solution
Before solving this problem, we need to clarify several concepts that will come up often: local maximum, global maximum, local optimal solution and global optimal solution. Anyone who has studied high-school math knows that around a local maximum, within a small neighborhood, the function increases on the left and decreases on the right; in Figure 2.1 it appears as a "mountain peak". The graph has many such peaks, so this function has many local maxima. The global maximum of a function is the largest of all its local maxima. So a local maximum is a local property, while the global maximum is a global one.
In a genetic algorithm each chromosome corresponds to one candidate solution, and we generally use a fitness function to measure how good that solution is. This forms a mapping from a genome to the fitness of its solution, so we can view the genetic algorithm as a search for the optimum of a multivariate function. This multidimensional surface also has countless "peaks", and they correspond to local optimal solutions. One peak has the highest elevation: that is the global optimal solution. The task of the genetic algorithm is to climb as close as possible to the highest peak rather than get stuck on some small one. (Note that a genetic algorithm does not have to look for the "highest mountain": if the problem is one where a smaller fitness value is better, the global optimum is the minimum of the function, and the algorithm looks for the "deepest valley" instead.) If this is not yet clear, read on. The demo program in this chapter shows the scene vividly.
The "kangaroo jumping" problem
We now picture the function curve as a range of mountains and valleys. Imagine that each solution we obtain is a kangaroo, and that we want the kangaroos to keep jumping until they reach the highest peak (although the kangaroos themselves may not be willing). The search for the maximum thus becomes a "kangaroo jumping" process. Below are several ways of making the kangaroo jump.
Hill climbing, simulated annealing and genetic algorithms
Several common algorithms for finding a maximum:
1. Hill climbing (steepest ascent):
Neighboring points are generated at random from the search space, the one with the best solution is chosen to replace the current individual, and the process repeats. Because only neighboring points are compared, the method is "short-sighted" and often converges only to the local optimum nearest the starting position. For problems with many local optima, the chance of finding the global optimum through such simple iteration is slim. (With hill climbing, the kangaroo will most likely reach the peak nearest its starting point, but there is no guarantee that this peak is Mount Everest, or even a very high mountain, because it only ever goes uphill and never downhill.)
2. Simulated annealing:
This method comes from the heat treatment of metals. When the temperature of a metal exceeds its melting point, its atoms move violently and at random. Like every other physical system, the atoms tend toward the state of lowest energy. At the start of this process the temperature is very high, so the atoms have very high energy. As the temperature drops and the metal cools, the atoms lose energy and eventually settle into the lowest energy state they can reach. Simulated annealing lets the algorithm start with large jumps, so that it has enough "energy" to escape any local optimum it passes through instead of being trapped there; as it gets near the global optimum, the jump size is gradually reduced so that it "settles" into the global optimum. (In simulated annealing the kangaroo is drunk and jumps at random for a long time. With luck it jumps across a valley from one mountain to another and reaches a higher peak. In the end it gradually sobers up and heads for the summit of the mountain it is on.)
3. Genetic algorithm:
Simulates evolution by natural selection: it searches in many directions at once by maintaining a population of potential solutions, and it supports the combination and exchange of information between those directions. A population-based search is more likely to find the global optimum than a search that moves a single point. (In a genetic algorithm there are many kangaroos, dropped at random places in the Himalayas. The kangaroos do not know that their mission is to find Mount Everest. But every few years we shoot some of the kangaroos at low altitude and hope that the survivors are prolific and raise their young where they are.) Later, a netizen gave me a more vivid version of the story: once upon a time a large group of kangaroos was inexplicably scattered and abandoned in the Himalayas, so they had to make a hard living there. The low ground is filled with a colorless, odorless poison gas, and the higher the altitude, the thinner the gas. The poor kangaroos are completely unaware of this and go on hopping about as usual. As a result, kangaroos at lower elevations keep dying, while kangaroos at higher altitudes live longer and are more likely to have children. After many years the kangaroos have, without realizing it, gathered on the mountain peaks. Of all the kangaroos, only those gathered on Mount Everest are brought back to beautiful Australia.
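For comparison with the population-based approach, the first two methods above can be sketched in a few lines of C++. This is only a rough sketch under stated assumptions: the article's formula is not reproduced in the text, so `objective` is a stand-in multi-peaked function on [-1, 2]; `hillClimb`, `anneal` and `clampToRange` are hypothetical names; the hill climber tries one random neighbor per step; and the annealer ties its jump length to the temperature for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Stand-in landscape with many peaks on [-1, 2]. This is an assumption:
// the article's own formula is not reproduced in the text.
double objective(double x) {
    const double pi = 3.14159265358979323846;
    return x * std::sin(10.0 * pi * x) + 2.0;
}

// Keep a position inside the search interval [-1, 2].
double clampToRange(double x) {
    return std::max(-1.0, std::min(2.0, x));
}

// Hill climbing: look at one random neighbor per step and move only uphill.
double hillClimb(double start, double step, int iterations, std::mt19937& rng) {
    assert(step > 0.0 && iterations >= 0);
    std::uniform_real_distribution<double> offset(-step, step);
    double x = start;
    for (int i = 0; i < iterations; ++i) {
        double candidate = clampToRange(x + offset(rng));
        if (objective(candidate) > objective(x)) {
            x = candidate;
        }
    }
    return x;
}

// Simulated annealing: large, forgiving jumps while "hot", small and strict
// ones once cooled. Returns the best position visited.
double anneal(double start, double temperature, double cooling,
              int iterations, std::mt19937& rng) {
    assert(temperature > 0.0 && cooling > 0.0 && cooling < 1.0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    double x = start;
    double best = start;
    for (int i = 0; i < iterations; ++i) {
        // Jump length is tied to the temperature (a simplification).
        double candidate = clampToRange(x + (unit(rng) - 0.5) * temperature);
        double delta = objective(candidate) - objective(x);
        // Uphill moves are always taken; downhill moves are taken with
        // probability exp(delta / temperature), which shrinks as it cools.
        if (delta >= 0.0 || unit(rng) < std::exp(delta / temperature)) {
            x = candidate;
        }
        if (objective(x) > objective(best)) {
            best = x;
        }
        temperature *= cooling;
    }
    return best;
}
```

Calling `hillClimb(0.0, 0.01, 2000, rng)` with a seeded `std::mt19937 rng` tends to stop on the peak next to the start, while `anneal(0.0, 1.0, 0.995, 3000, rng)` can cross valleys early on; neither is guaranteed to find the highest peak.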
The implementation process of a genetic algorithm is introduced next.
The implementation process of genetic algorithms
The process of implementing a genetic algorithm mirrors evolution in nature. First, we look for a scheme that "digitally" encodes the potential solutions of the problem (this establishes the mapping between phenotype and genotype). Then we initialize a population with random numbers (the first batch of kangaroos is scattered at random over the mountains); the individuals of the population are these digital encodings. Next, after a suitable decoding step (obtaining each kangaroo's position coordinates), the fitness function evaluates the fitness of each individual (the higher a kangaroo climbs, the more we like it, so the higher its fitness). A selection function then picks individuals according to some rule of merit (every so often we shoot some low-altitude kangaroos to keep the total number of kangaroos steady). The genes of the selected individuals are crossed and mutated (the kangaroos jump at random) to produce offspring (the kangaroos that survive are prolific and raise their young there). A genetic algorithm does not guarantee that you will get the optimal solution of the problem, but its greatest advantage is that you do not have to know or worry about how to "find" the optimal solution (you do not have to tell the kangaroo where to jump or how far). You simply "reject" the individuals that perform poorly (shoot the kangaroos that keep going downhill). You will gradually come to understand this sentence: it is the essence of the genetic algorithm!
So we can summarize the general steps of a genetic algorithm:
Start the loop and continue until a satisfactory solution is found.
1. Evaluate the fitness of the individual corresponding to each chromosome.
2. Following the principle that higher fitness means a higher probability of selection, choose two individuals from the population as father and mother.
3. Take the chromosomes of both parents and cross them to produce offspring.
4. Mutate the chromosomes of the offspring.
5. Repeat steps 2, 3 and 4 until the new population has been produced.
End of loop.
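The steps above can be sketched as one short C++ function. This is a minimal sketch under several assumptions, not the article's demo program: `objective` is a stand-in landscape on [-1, 2], each chromosome is a single `double`, the loop stops after a fixed number of generations instead of testing for a "satisfactory" solution, crossover is a random blend of the two parents, and the mutation rate and step are arbitrary illustrative values. The names `roulette` and `evolve` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Stand-in landscape (an assumption); it stays positive on [-1, 2], which
// roulette selection needs.
double objective(double x) {
    const double pi = 3.14159265358979323846;
    return x * std::sin(10.0 * pi * x) + 2.0;
}

// Pick an index with probability proportional to its fitness.
std::size_t roulette(const std::vector<double>& fitness, std::mt19937& rng) {
    assert(!fitness.empty());
    double total = 0.0;
    for (double f : fitness) total += f;
    std::uniform_real_distribution<double> spin(0.0, total);
    double slice = spin(rng);
    double soFar = 0.0;
    for (std::size_t i = 0; i < fitness.size(); ++i) {
        soFar += fitness[i];
        if (soFar >= slice) return i;
    }
    return fitness.size() - 1;  // guard against rounding error
}

// Run the loop for a fixed number of generations; return the best x seen.
double evolve(std::size_t popSize, int generations, std::mt19937& rng) {
    assert(popSize > 0 && generations >= 0);
    const double mutationRate = 0.1;  // illustrative value
    const double maxStep = 0.1;       // illustrative value
    std::uniform_real_distribution<double> place(-1.0, 2.0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);

    std::vector<double> pop(popSize);
    for (double& x : pop) x = place(rng);  // scatter the first kangaroos
    double best = pop[0];

    for (int g = 0; g < generations; ++g) {
        // Step 1: evaluate the fitness of every individual.
        std::vector<double> fitness(popSize);
        for (std::size_t i = 0; i < popSize; ++i) {
            fitness[i] = objective(pop[i]);
            if (fitness[i] > objective(best)) best = pop[i];
        }
        std::vector<double> next;
        while (next.size() < popSize) {  // Step 5: fill the new population
            // Step 2: choose father and mother, favoring high fitness.
            double father = pop[roulette(fitness, rng)];
            double mother = pop[roulette(fitness, rng)];
            // Step 3: crossover (here: a random point between the parents).
            double child = father + unit(rng) * (mother - father);
            // Step 4: mutation (a small random jump, applied occasionally).
            if (unit(rng) < mutationRate) child += (unit(rng) - 0.5) * maxStep;
            next.push_back(std::max(-1.0, std::min(2.0, child)));
        }
        pop = next;
    }
    for (double x : pop) {
        if (objective(x) > objective(best)) best = x;
    }
    return best;
}
```

A call such as `evolve(30, 50, rng)` with a seeded `std::mt19937 rng` returns the highest position any kangaroo has visited; as the text says, nothing guarantees that this is the global optimum.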
Next, we dissect each detail of the genetic algorithm in turn.
Chromosome encoding: the genes of a kangaroo
From the previous chapter, the reader already knows that the symbol set of human chromosome encoding consists of two pairings of 4 kinds of bases. That gives 4 cases in total, equivalent to 2 bits of information. This is how human genes are encoded; so how should we handle encoding when we use a genetic algorithm?
Inspired by the structure of human chromosomes, we can imagine having only two "bases", "0" and "1", linked in order into a chain. Each unit carries 1 bit of information, so a long enough chromosome can describe all the characteristics of an individual. This is binary encoding, and a chromosome looks roughly like this:
010010011011011110111110
The encoding above is simple and intuitive, but when the individual's characteristics are complex, a large number of bits is needed to describe them accurately, and the corresponding decoding step (similar to the translation of DNA in biology, it maps the genotype to the phenotype) becomes too complicated. To reduce the computational complexity of the genetic algorithm and improve its efficiency, floating-point encoding was proposed. A chromosome then looks roughly like this:
1.2–3.3–2.0–5.4–2.7–4.3
So how do we use these two encodings for the kangaroo's chromosome? The purpose of encoding is to establish the mapping between phenotype and genotype, and the phenotype is generally understood as the characteristics of the individual. For example, the human genotype is described by 46 chromosomes (a strip about two meters long in total?), yet it decodes into a living person with eyes, ears, a mouth, a nose and other features. So to encode the chromosome of a "kangaroo", we must first decide what the "individual characteristics" of a kangaroo are. Some people might say that a kangaroo has many characteristics, such as sex, height and weight, and perhaps even what it likes to eat. But for the concrete problem we are solving, we should think further: whether the kangaroo is tall or short, fat or thin, it will be shot as long as it is at low altitude; and nothing says that a tall kangaroo jumps farther or a short one jumps less far. What it likes to eat is even less relevant. From beginning to end we care about only one thing: where the kangaroo is. Once we know where the kangaroo is, we can do the two things we have to do:
(1) Look up the kangaroo's altitude on the map of the Himalayas (that is, compute the function value from the independent variable) to decide whether we should shoot it.
(2) Know the new position the kangaroo reaches after it jumps.
If you cannot tell which "individual characteristics" are necessary and which are not, the following way of thinking often helps. Say you believe that what the kangaroo likes to eat is necessary. Then imagine two kangaroos whose other characteristics are exactly the same: one only eats grass, the other only eats fruit. You will immediately see that this has no effect whatsoever on their fate; they should have the same probability of being shot, simply because they are in the same place. (It is worth mentioning that if your encoding does include what the kangaroos like to eat, this does not actually affect their evolution: what the kangaroo that finally reaches Mount Everest likes to eat is completely random, but its location is quite certain.)
This is the kind of thinking that the encoding step of a genetic algorithm usually requires: we must abstract the concrete problem into a mathematical model, highlight the principal contradiction and discard the secondary ones. Only then can the problem be solved concisely and effectively. I hope beginners will think this over carefully.
The kangaroo's position has been chosen as its individual characteristic, and concretely that position is its horizontal coordinate. Next we build the mapping between phenotype and genotype; that is, we decide how to encode the kangaroo's horizontal coordinate. The coordinate is a real number, so we need to encode a real number. Looking back at the two encodings described above, the reader's first thought is probably that binary encoding will be more complex and floating-point encoding more concise. Indeed, with floating-point encoding a single floating-point number is enough. The following describes how to map a binary encoding to a real number.
Obviously, a binary string of a given length can only represent a real number to a given precision. Suppose we want the solution to be accurate to six decimal places. The interval length is $2 - (-1) = 3$, so to guarantee this precision the interval $[-1, 2]$ must be divided into at least $3 \times 10^6$ equal parts. And because

$$2^{21} = 2097152 < 3 \times 10^6 < 2^{22} = 4194304,$$

the encoded binary string needs at least 22 bits.
A binary string $(b_{21} b_{20} \ldots b_0)$, written with the most significant bit first, is converted to the corresponding real value in the interval in two steps.
(1) Convert the binary number represented by the string to a decimal (base-10) number:

$$x' = \sum_{i=0}^{21} b_i \cdot 2^i$$

(2) Map it to the corresponding real number in the interval:

$$x = -1 + x' \cdot \frac{2 - (-1)}{2^{22} - 1}$$
For example, the binary string <1000101110110101000111> has the decimal value 2288967 and represents the real value 0.637197.
The binary strings <0000000000000000000000> and <1111111111111111111111> represent the two endpoints of the interval, -1 and 2, respectively.
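The two decoding steps can be checked with a few lines of C++. This is a small sketch, not part of the original demo; the function name `decode` and its string-based interface are assumptions made for illustration. It reads the bits with the most significant bit first and maps the resulting integer linearly onto the interval.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Decode a bit string (most significant bit first) into a real number in
// [lo, hi]: first binary -> integer, then integer -> point in the interval.
double decode(const std::string& bits, double lo, double hi) {
    assert(!bits.empty() && bits.size() < 63);
    unsigned long long value = 0;
    for (char c : bits) {
        assert(c == '0' || c == '1');
        value = value * 2 + static_cast<unsigned long long>(c - '0');
    }
    // Largest integer representable with this many bits: 2^n - 1.
    unsigned long long maxValue = (1ULL << bits.size()) - 1;
    return lo + static_cast<double>(value) * (hi - lo) /
                    static_cast<double>(maxValue);
}
```

With the 22-bit strings from the text, `decode("1000101110110101000111", -1.0, 2.0)` gives approximately 0.637197, and the all-zero and all-one strings give the endpoints -1 and 2.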
This "kangaroo jumping" problem is solved here with floating-point encoding, because the sample programs in the following chapters almost all use floating-point encoding. The demo code below (the class that holds the genes, and the mutation function) is written for floating-point encoding. (Binary encoding is only introduced briefly here, but the "kangaroo jumping" problem can be solved entirely with binary encoding, and more effectively. Readers may want to try solving it with binary encoding themselves.)
We define a class as the carrier of the kangaroo's genes. (The careful reader may ask: why use a container of floating-point numbers to store the kangaroo's genes? Isn't the kangaroo's gene just one floating-point number? Yes, for this example a single floating-point number would do. We use a container so that the code can later be reused for problems whose encoding needs a whole set of floating-point numbers.)
    #include <vector>
    using std::vector;

    class Genome {
    public:
        friend class Genalg;
        friend class Genengine;

        Genome() : fitness(0) {}

        // Constructor that initializes the genome from its parameters.
        Genome(vector<double> vec, double f) : vecgenome(vec), fitness(f) {}

    private:
        vector<double> vecgenome;  // container that holds the genes
        double fitness;            // fitness score
    };
Well, that concludes our study of the kangaroo's chromosomes; now let's follow the kangaroos on their evolutionary journey.
Natural selection: fitness scoring and the selection function
1. Competition for survival: the fitness function
Competition for survival in nature has two aspects: the struggle among organisms, and the struggle between organisms and their environment. In our example, though, you can imagine that the kangaroos are friendly to one another and do not need to fight each other for the right to survive. Their survival depends largely on your judgment. Since you must decide which kangaroos should be shot and which should not, you need a yardstick. For this problem the yardstick is easy to set: the altitude at which the kangaroo stands (because all you want is for the kangaroos to climb as high as possible). So we use the kangaroo's altitude directly as its fitness score; that is, the fitness function simply returns the function value.
2. Natural selection: the selection function
In nature, the fitter an individual is, the more likely it is to reproduce. But this does not mean that higher fitness guarantees more offspring; it only makes them more probable. (After all, some lucky low-altitude kangaroos escape your eye.) So how do we set up such a probabilistic relationship? A commonly used method is roulette wheel selection. If the population size is $M$ and the fitness of individual $i$ is $f_i$, then the probability that it is selected is:

$$p_i = \frac{f_i}{\sum_{k=1}^{M} f_k}$$

For example, suppose we have 5 chromosomes with fitness scores 5, 7, 10, 13 and 15.
The total fitness is then:

$$5 + 7 + 10 + 13 + 15 = 50$$

So the probability of each individual being selected is:

$$\frac{5}{50} = 10\%, \quad \frac{7}{50} = 14\%, \quad \frac{10}{50} = 20\%, \quad \frac{13}{50} = 26\%, \quad \frac{15}{50} = 30\%$$
Some people will ask why it is called roulette wheel selection. A look at the wheel in Figure 2-2 makes it clear. The wheel is divided into sectors in proportion to each individual's fitness. Imagine spinning the wheel: when it stops, the pointer rests at random on the sector of some individual, and that lucky individual is selected. (Clearly, the higher the fitness score, the greater the probability that the individual is selected.)
Now let's see how to implement roulette wheel selection in code.
    Genome Genalg::Getchromoroulette() {
        // Generate a random number between 0 and the population's total fitness.
        // (totalfitness holds the sum of the fitness scores of the whole
        // population; random() returns a double in [0, 1], as its use in this
        // code implies.)
        double slice = random() * totalfitness;

        // This genome will hold the individual chosen by the wheel.
        Genome thechosenone;

        // Running sum of the fitness scores.
        double fitnesssofar = 0;

        // Walk through every chromosome in the population.
        for (int i = 0; i < popsize; ++i) {
            // Accumulate the fitness score.
            fitnesssofar += vecpop[i].fitness;

            // Once the running sum reaches the random number, pick this individual.
            if (fitnesssofar >= slice) {
                thechosenone = vecpop[i];
                break;
            }
        }

        // Return the individual chosen by the wheel.
        return thechosenone;
    }
Genetic variation: gene recombination (crossover) and gene mutation
These two steps are the root cause of offspring differing from their parents (note that I did not say they are the reason offspring are better than their parents; only after natural selection does a tendency for offspring to be better than their parents appear). Binary encoding and floating-point encoding handle these two genetic operations very differently, and with binary encoding the operations closely resemble what happens in nature. The two are described separately below.
1. Gene recombination / crossover
(1) Binary encoding
Recall the crossover process introduced in the previous chapter: during the pairing of homologous chromosomes, non-sister chromatids (one from each parent) often cross over and exchange parts of their chromosomes, as shown in Figure 2-3. Crossover in binary encoding is very similar: the codes at some randomly chosen, matching positions of the two chromosomes are exchanged to produce new individuals, as shown in Figure 2-4.
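As an illustration of exchanging codes at matching positions, here is a small C++ sketch of single-point crossover, which is one common way of doing it and not necessarily the exact scheme of Figure 2-4. The function name `crossover` and the use of `std::string` for the bit chain are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>

// Single-point crossover: choose one cut position at random and swap the
// tails of two equal-length bit strings, producing two children.
std::pair<std::string, std::string> crossover(const std::string& father,
                                              const std::string& mother,
                                              std::mt19937& rng) {
    assert(father.size() == mother.size() && father.size() >= 2);
    // The cut lies strictly inside the string so both parents contribute.
    std::uniform_int_distribution<std::size_t> cut(1, father.size() - 1);
    std::size_t point = cut(rng);
    std::string child1 = father.substr(0, point) + mother.substr(point);
    std::string child2 = mother.substr(0, point) + father.substr(point);
    return {child1, child2};
}
```

Crossing "00000000" with "11111111" always yields one child that starts with zeros and ends with ones, and one that does the opposite; only the cut position varies from call to call.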
(2) Floating-point encoding
If a gene contains several floating-point numbers, a similar crossover can be used; the only difference is that the basic unit exchanged is a whole floating-point number rather than a binary digit. If the gene is a single floating-point number, other forms of recombination are available, such as intermediate recombination.
In that scheme the offspring's encoded value is generated at random between the father's encoded value and the mother's encoded value.
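A minimal C++ sketch of this idea follows. It assumes the simplest variant, in which each offspring value is drawn uniformly between the two parent values; other variants allow values slightly outside that range. The function name `intermediate` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Intermediate recombination: every offspring gene is a random point on the
// segment between the father's gene and the mother's gene.
std::vector<double> intermediate(const std::vector<double>& father,
                                 const std::vector<double>& mother,
                                 std::mt19937& rng) {
    assert(father.size() == mother.size());
    std::uniform_real_distribution<double> alpha(0.0, 1.0);
    std::vector<double> child(father.size());
    for (std::size_t i = 0; i < father.size(); ++i) {
        // A fresh blending factor is drawn for each gene.
        child[i] = father[i] + alpha(rng) * (mother[i] - father[i]);
    }
    return child;
}
```

For parents {1.0, 4.0} and {2.0, 3.0}, the first offspring gene falls between 1.0 and 2.0 and the second between 3.0 and 4.0.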
Consider the specifics of the "kangaroo jumping" problem: the only individual characteristic of a kangaroo is its position. Kangaroos at the same position have exactly the same genes, and crossing two identical genes does nothing, so we do not use the crossover step in this example. (Of course, it is not impossible to force crossover in: you could bring two kangaroos from different places together, let them mate and produce offspring, and then send the offspring to where they belong.)
2. Gene mutation
(1) Binary encoding
Again, recall the gene mutation process described in the previous chapter: a gene mutation is a change of the gene at some locus of a chromosome. It turns a gene into one of its alleles and usually causes some change in the phenotype. As mentioned above, genetic operations on binary encodings closely resemble their biological counterparts: each "0" or "1" in the gene string has some probability of flipping to "1" or "0". For example, take the following binary string:
101101001011001
After a genetic mutation, it may become the following new code:
001101011011001
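Bit-flip mutation of this kind can be sketched in C++ as follows. The function name `mutateBits` and the per-bit `rate` parameter are assumptions for illustration; each bit is flipped independently with that probability.

```cpp
#include <cassert>
#include <random>
#include <string>

// Bit-flip mutation: each bit independently flips with probability `rate`.
std::string mutateBits(std::string bits, double rate, std::mt19937& rng) {
    assert(rate >= 0.0 && rate <= 1.0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    for (char& c : bits) {
        if (unit(rng) < rate) {
            c = (c == '0') ? '1' : '0';
        }
    }
    return bits;
}
```

A rate of 0 leaves the string unchanged and a rate of 1 inverts every bit; practical rates are usually small, so that only an occasional bit changes, as in the example above where two bits differ.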
(2) Floating-point encoding
In floating-point encoding, mutation usually adds a small random number to, or subtracts one from, the original floating-point value. For example, take the following string of floating-point numbers:
1.2, 3.4, 5.1, 6.0, 4.5
After mutation, we might get a string of floating-point numbers like this:
1.3, 3.1, 4.9, 6.3, 4.4
Of course, this small random number can itself be larger or smaller; we generally call its size the "step". (In the "kangaroo jumping" problem, the step is how far a kangaroo can jump.) In general, a larger step makes evolution faster at the start but makes it harder to converge precisely later on, while a small step converges to a point more precisely. So, to speed up evolution early on and still converge accurately to the optimum later, the step size is often changed dynamically. This is similar to the simulated annealing process described earlier, which the reader may wish to recall.
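One simple way to change the step dynamically is to shrink it by a constant factor every generation. The C++ sketch below assumes such a geometric schedule; the function name `stepAt` and the parameters `initialStep` and `decay` are hypothetical, and other schedules (linear, or adaptive to the population's progress) would serve the same purpose.

```cpp
#include <cassert>
#include <cmath>

// Mutation step for a given generation: starts at initialStep and shrinks
// by the factor `decay` each generation (large jumps early, fine ones late).
double stepAt(int generation, double initialStep, double decay) {
    assert(generation >= 0 && initialStep > 0.0);
    assert(decay > 0.0 && decay <= 1.0);
    return initialStep * std::pow(decay, generation);
}
```

The value returned for the current generation would take the place of a fixed maximum perturbation in the mutation step.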
The mutation function for floating-point encoding is written as follows:
    void Genalg::mutate(vector<double> &chromo) {
        // Mutate each gene according to the preset mutation probability.
        for (size_t i = 0; i < chromo.size(); ++i) {
            // If a mutation occurs...
            if (random() < mutationrate) {
                // ...increase or decrease the value by a small random amount.
                chromo[i] += ((random() - 0.5) * maxperturbation);

                // Make sure the kangaroo does not jump out of the nature reserve.
                // (This boundary check is not part of generic mutation code; it
                // is only here to keep the encoded value valid.)
                if (chromo[i] < leftpoint) {
                    chromo[i] = rightpoint;
                } else if (chromo[i] > rightpoint) {
                    chromo[i] = leftpoint;
                }
            }
        }
    }
It is worth noting that gene mutation in genetic algorithms shares the characteristics of gene mutation in biology mentioned in the previous chapter. To recall:
1. Gene mutations occur randomly, and the mutation frequency is very low. (Only some applications need a high mutation probability.)
2. Most gene mutations are harmful to the organism itself.
3. Gene mutations are undirected.
Well, so far we have implemented gene encoding, fitness evaluation, selection and mutation; what remains is to assemble these "parts" of the genetic process.
To be continued in the next part, "Genetic Algorithm: From Introduction to Mastery (II)"
http://blog.csdn.net/emiyasstar__/article/details/6938715
Genetic Algorithm: From Introduction to Mastery (I)