[Repost] Genetic Algorithms: From Introduction to Mastery

Source: Internet
Author: User

Genetic algorithms have many interesting applications: pathfinding, the 8-puzzle, the prisoner's dilemma, motion control, the center-finding problem (a suggestion from a netizen abroad: within an irregular polygon, find the center of the largest circle that fits inside the polygon), the TSP (described in detail in a later chapter), production scheduling, artificial-life simulation, and so on. Eventually I came across a very interesting metaphor and thought of the "kangaroo jumping" problem (let's call it that for now). It is fun and intuitive, and it gets straight to the nature of genetic algorithms, which makes it a good first example for beginners.

Posing the problem and its solution

Let's first consider how to solve the following problem.

A function of one variable is given:

We are asked to find the maximum value of this function on a given interval (the encoding discussion below works with the interval [-1, 2]).

Local maxima, the global maximum, local optima and the global optimum

Before solving the problem, we need to clarify several concepts that will come up often: local maxima, the global maximum, local optimal solutions and the global optimal solution. Anyone who has studied high-school math knows that around a local maximum, within a small neighborhood, the function is increasing on the left and decreasing on the right; in Figure 2.1 this looks like a "mountain peak". The graph has many such peaks, so this function has many local maxima. The global maximum of a function is the largest of all its local maxima. So a local maximum is a local property, while the global maximum is a global one.

Because each chromosome in a genetic algorithm corresponds to one candidate solution, we generally use a fitness function to measure how good that solution is. This gives a mapping from a genome to the fitness of its solution, so the process of a genetic algorithm can be viewed as searching for the optimum of a multivariate function. This multidimensional surface also has countless "peaks", and these peaks correspond to local optimal solutions. One peak has the highest elevation, and that one is the global optimal solution. The task of a genetic algorithm is to climb as close as possible to the highest peak rather than get stuck on some small one. (Note that a genetic algorithm does not have to look for the "highest mountain": if lower fitness scores are better for the problem at hand, the global optimum is the minimum of the function, and the genetic algorithm looks for the "deepest valley" instead.) If this is still unclear, read on. The example program in this chapter shows the scene very vividly.

"Kangaroo jumping" problem

Since we picture the function curve as a range of peaks and valleys, we can imagine each solution we obtain as a kangaroo, and we want the kangaroos to keep jumping until they reach the highest peak (even though a kangaroo may not want to). Finding the maximum thus becomes a "kangaroo jumping" process. Below are several ways for the kangaroos to jump.

Hill climbing, simulated annealing and genetic algorithms

Several common algorithms for finding a maximum:

1. Hill climbing (steepest ascent):

A neighboring point is generated at random from the search space; if its solution is better, it replaces the current individual, and the process repeats. Because only neighboring points are compared, the method is rather short-sighted and usually converges to a local optimum close to the starting position. For problems with many local optima, the chance of finding the global optimum with this simple iteration is very slim. (With hill climbing, the kangaroo will most likely reach the peak closest to where it started, but there is no guarantee that this peak is Mount Everest, or even a very high mountain, because the kangaroo only ever goes uphill, never downhill.)
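The hill-climbing idea can be sketched in a few lines of C++. This is only a minimal sketch, not code from the article: the function name `hillClimb`, its parameters and the uniform neighbor sampling are all assumptions made for illustration.

```cpp
#include <functional>
#include <random>

// Hill climbing on a one-variable function: repeatedly sample a nearby
// point and move there only if it is higher. All names are hypothetical.
double hillClimb(const std::function<double(double)>& f,
                 double start, double lo, double hi,
                 double step, int iterations, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> offset(-step, step);
    double x = start;
    for (int i = 0; i < iterations; ++i) {
        double candidate = x + offset(rng);
        if (candidate < lo || candidate > hi) continue;  // stay inside the interval
        if (f(candidate) > f(x)) x = candidate;          // only ever go uphill
    }
    return x;
}
```

If the starting point sits on a low peak and `step` is smaller than the width of the surrounding valley, the kangaroo in this sketch can never leave that peak, which is exactly the short-sightedness described above.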

2. Simulated annealing:

This method is inspired by the heat treatment of metals. When the temperature of a metal exceeds its melting point, its atoms move violently and at random. As in any physical system, this movement tends toward the state of lowest energy. At the beginning the temperature is very high, so the atoms have high energy. As the temperature drops and the metal cools, the atoms have less and less energy and eventually settle into the lowest energy state they can reach. Simulated annealing works the same way: the algorithm starts with large jumps, so that it has enough "energy" to escape the local optima it passes along the way instead of being trapped by them, and as it approaches the global optimum it gradually reduces the jump size so that it can "settle" onto the global optimum. (In simulated annealing, the kangaroo is drunk and jumps around randomly for a long time. With luck, it jumps across a valley from one mountain to another, higher one. In the end it gradually sobers up and jumps toward the summit of the mountain it is on.)
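A rough sketch of the drunk kangaroo might look as follows. It is written under assumptions that do not come from the article: the name `anneal`, the geometric cooling schedule (`temp *= cooling`), the acceptance rule `exp(delta / temp)` and the coupling of jump size to temperature are common textbook choices picked here purely for illustration.

```cpp
#include <cmath>
#include <functional>
#include <random>

// Simulated annealing for maximization on [lo, hi]. The cooling schedule
// and the jump-size rule are illustrative assumptions.
double anneal(const std::function<double(double)>& f,
              double start, double lo, double hi,
              double initialTemp, double cooling, int iterations,
              unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    double x = start;
    double best = x;
    double temp = initialTemp;
    for (int i = 0; i < iterations; ++i) {
        // The jump size shrinks together with the temperature.
        double range = (hi - lo) * temp / initialTemp;
        double candidate = x + (unit(rng) - 0.5) * range;
        if (candidate >= lo && candidate <= hi) {
            double delta = f(candidate) - f(x);
            // Uphill moves are always accepted; downhill moves are accepted
            // with probability exp(delta / temp), which falls as temp falls.
            if (delta > 0 || unit(rng) < std::exp(delta / temp)) x = candidate;
            if (f(x) > f(best)) best = x;  // remember the highest point seen
        }
        temp *= cooling;
    }
    return best;
}
```

Unlike the hill-climbing kangaroo, this one may accept a downhill jump while the temperature is high, which is what gives it a chance to cross a valley.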

3. Genetic algorithm:

This method simulates evolution by natural selection. It searches in many directions at once by maintaining a population of potential solutions, and it supports the combination and exchange of information between those directions. A population-based search is more likely to find the global optimum than a search that follows a single point. (In a genetic algorithm, many kangaroos are parachuted onto arbitrary places in the Himalayas. These kangaroos do not know that their mission is to find Mount Everest. But every few years, we shoot some of the kangaroos at low altitudes and hope that the survivors are prolific and raise their young where they are. Later, a netizen gave me a more vivid story: once upon a time, a large group of kangaroos was inexplicably scattered and abandoned in the Himalayas, and they had to struggle to survive there. The low-lying areas were filled with a colorless, odorless poison gas, and the higher the altitude, the thinner the gas. The poor kangaroos were completely unaware of this and kept hopping around as usual. As a result, kangaroos at lower elevations kept dying, while kangaroos at higher altitudes lived longer and were more likely to have offspring. After many years, the kangaroos had unknowingly gathered on the mountain peaks, but of all the kangaroos, only those that gathered on Mount Everest were brought back to beautiful Australia.)

The implementation process of a genetic algorithm is described below.

Implementing a genetic algorithm

Implementing a genetic algorithm is much like the evolutionary process in nature. First, we look for a scheme that "digitally" encodes potential solutions of the problem (that is, we establish a mapping between phenotype and genotype). Then we initialize a population with random numbers (the first batch of kangaroos is scattered randomly over the mountains); the individuals in the population are these digital encodings. Next, after an appropriate decoding step (obtaining the kangaroo's position coordinates), the fitness function evaluates the fitness of each individual's genes (the higher a kangaroo climbs, the more we like it, so the higher its fitness). A selection function then chooses among the individuals according to a rule that favors the fitter ones (at regular intervals we shoot some of the low-altitude kangaroos to keep the total number of kangaroos steady). The genes of the individuals are crossed and mutated (the kangaroos jump randomly), and offspring are produced (the kangaroos that survive are prolific and raise their young where they are). A genetic algorithm does not guarantee that you will get the optimal solution to the problem, but its biggest advantage is that you do not need to understand or worry about how to "find" the optimal solution (you do not have to tell the kangaroos which way to jump or how far). You only need to "reject" the individuals that perform poorly (shoot the kangaroos that keep heading downhill). You will come to understand this sentence later: it is the essence of the genetic algorithm!

So the general steps of a genetic algorithm can be summarized as follows:

Start the loop and repeat until a satisfactory solution is found.

1. Evaluate the fitness of each individual's chromosome.

2. Select two individuals from the population as the father and the mother, following the principle that the higher the fitness, the higher the probability of being chosen.

3. Take the chromosomes of both parents and cross them to produce offspring.

4. Mutate the chromosomes of the offspring.

5. Repeat steps 2, 3 and 4 until the new population has been created.

End the loop.

Next, we examine each part of the genetic algorithm process in detail.
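The five numbered steps can be sketched roughly as one generation of a loop. This is a minimal sketch under assumptions not taken from the article: the names `Chromosome` and `nextGeneration` are hypothetical, the standard library's `std::discrete_distribution` stands in for the fitness-proportional choice of parents, and crossover copies each gene from one parent at random, which is just one possible choice.

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using Chromosome = std::vector<double>;  // hypothetical alias

// One generation of the loop. `fitness` must return non-negative scores,
// at least one of them positive, and all chromosomes must have equal length.
std::vector<Chromosome> nextGeneration(
        const std::vector<Chromosome>& pop,
        const std::function<double(const Chromosome&)>& fitness,
        double mutationRate, double maxPerturbation, std::mt19937& rng) {
    // Step 1: evaluate the fitness of every chromosome.
    std::vector<double> scores;
    for (const Chromosome& c : pop) scores.push_back(fitness(c));

    // Step 2: parents are drawn with probability proportional to fitness.
    std::discrete_distribution<std::size_t> pick(scores.begin(), scores.end());
    std::uniform_real_distribution<double> unit(0.0, 1.0);

    std::vector<Chromosome> next;
    while (next.size() < pop.size()) {  // Step 5: repeat steps 2 to 4
        const Chromosome& mother = pop[pick(rng)];
        const Chromosome& father = pop[pick(rng)];
        // Step 3: cross the parents (each gene is copied from one of them).
        Chromosome child(mother.size());
        for (std::size_t g = 0; g < child.size(); ++g)
            child[g] = unit(rng) < 0.5 ? mother[g] : father[g];
        // Step 4: mutate the child by a small random perturbation.
        for (double& gene : child)
            if (unit(rng) < mutationRate)
                gene += (unit(rng) - 0.5) * maxPerturbation;
        next.push_back(child);
    }
    return next;
}
```

Calling `nextGeneration` repeatedly, and stopping once the best score is good enough, gives the outer "start the loop ... end the loop" structure.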

Chromosome encoding -- the kangaroo's genes

From the previous chapter, the reader already knows that the symbol set of human chromosome encoding consists of the pairings of 4 kinds of bases. There are 4 cases, equivalent to 2 bits of information. That is how human genes are encoded. So how do we handle encoding when we use a genetic algorithm?

Inspired by the structure of human genes, we can imagine that there are only two bases, "0" and "1", and that we likewise string them together into a chain. Each unit carries 1 bit of information, so a long enough chromosome can describe all the characteristics of an individual. This is the binary encoding method. A chromosome looks roughly like this:

010010011011011110111110

The encoding above is simple and intuitive, but when an individual's characteristics are complex, a very long code is needed to describe them accurately, and the corresponding decoding process (similar to the translation of DNA in biology: the process of mapping genotype to phenotype) becomes too complicated. To reduce the computational cost of the genetic algorithm and improve its efficiency, floating-point encoding was proposed. A chromosome looks roughly like this:

1.2–3.3–2.0–5.4–2.7–4.3

So how do we use these two encodings to encode a kangaroo's chromosome? The purpose of encoding is to establish a mapping between phenotype and genotype, and the phenotype is generally understood as the characteristics of the individual. For example, the human genotype is written in 46 chromosomes (about two meters in total length), yet it decodes into living people who differ in eyes, ears, mouth, nose and other features. So to encode the kangaroo's chromosome, we must first consider what the kangaroo's "individual characteristics" are. Some might say a kangaroo has many characteristics, such as sex, body length and weight, and perhaps even what it likes to eat. But for this particular problem we should think further: whether a kangaroo is long or short, fat or thin, it will be shot if it is at low altitude, and nothing says that a long kangaroo jumps farther or a short one jumps less far. What it likes to eat is even less relevant. From beginning to end we care about only one thing: where is the kangaroo? Once we know where a kangaroo is, we can do the two things we need to do:

(1) Look up the kangaroo's altitude on the map of the Himalayas (that is, compute the function value from the independent variable) to decide whether to shoot it.

(2) Know which new position the kangaroo will jump to.

If we cannot tell for sure which "individual characteristics" are necessary and which are not, a useful way of thinking is this: suppose you believe that what the kangaroo likes to eat is necessary. Then imagine two kangaroos whose other characteristics are exactly the same, except that one loves grass and the other loves fruit. You will quickly find that this has not the slightest effect on their fate: they have the same probability of being shot, simply because they are in the same place. (It is worth mentioning that if your gene encoding does include what the kangaroo likes to eat, this does not really affect the kangaroos' evolution: what the kangaroo that finally climbs Mount Everest likes to eat is completely random, but its position is fully determined.)

This is the kind of thinking one often goes through when designing the encoding for a genetic algorithm: we must abstract the concrete problem into a mathematical model, keeping the principal aspects and discarding the secondary ones. Only then can the problem be solved concisely and effectively. Beginners should think this over carefully.

Now that the kangaroo's position has been chosen as its individual characteristic, and the position is specifically its horizontal coordinate, the next step is to build the mapping between phenotype and genotype, that is, to decide how to encode the kangaroo's horizontal coordinate. Since the horizontal coordinate is a real number, what we have to encode is a real number. Recalling the two encoding methods described above, readers will first think that binary encoding will be more complicated here, while floating-point encoding will be more concise. Indeed, with floating-point encoding a single floating-point number is enough. The following describes how to map a binary encoding to a real number.

Clearly, a binary string of a given length can only represent a floating-point number to a given precision. Suppose we want the solution to be accurate to six decimal places. Since the interval length is $2 - (-1) = 3$, the interval $[-1, 2]$ must be divided into at least $3 \times 10^6$ equal parts to guarantee that precision. And because

$2^{21} = 2097152 < 3 \times 10^6 < 2^{22} = 4194304$

the encoded binary string needs at least 22 bits.

A binary string $(b_{21} b_{20} \dots b_0)$ is converted to the corresponding real value in the interval in the following two steps.

(1) Convert the binary number represented by the string to a decimal number:

$x' = \sum_{i=0}^{21} b_i \cdot 2^i$

(2) Map it to the corresponding real number in the interval:

$x = -1 + x' \cdot \frac{2 - (-1)}{2^{22} - 1}$

For example, the binary string <1000101110110101000111> represents the real value 0.637197.

The binary strings <0000000000000000000000> and <1111111111111111111111> represent the two endpoints of the interval, -1 and 2, respectively.
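The bit count and the two decoding steps can be checked with a short sketch. The function names `bitsNeeded` and `decode` are hypothetical, and the sketch assumes the usual linear mapping in which the all-zero string maps to the left endpoint and the all-one string to the right endpoint, consistent with the endpoint examples above.

```cpp
#include <cmath>
#include <cstdint>
#include <string>

// Smallest number of bits n such that 2^n is at least the number of
// equal parts needed for the requested number of decimal places.
int bitsNeeded(double intervalLength, int decimals) {
    double parts = intervalLength * std::pow(10.0, decimals);
    int n = 0;
    while (std::pow(2.0, n) < parts) ++n;
    return n;
}

// Decode a bit string (most significant bit first, 1 to 63 characters)
// into a real number in [lo, hi].
double decode(const std::string& bits, double lo, double hi) {
    std::uint64_t value = 0;
    for (char c : bits) value = value * 2 + (c == '1' ? 1 : 0);      // step (1)
    std::uint64_t maxValue = (std::uint64_t{1} << bits.size()) - 1;  // 2^n - 1
    return lo + static_cast<double>(value) * (hi - lo)
                    / static_cast<double>(maxValue);                 // step (2)
}
```

With these definitions, `bitsNeeded(3.0, 6)` gives 22, and `decode("1000101110110101000111", -1.0, 2.0)` gives approximately 0.637197, since the string is the integer 2288967 and $-1 + 2288967 \cdot 3 / 4194303 \approx 0.637197$.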

This "kangaroo jumping" problem is solved here with floating-point encoding, since almost all of the example programs in the following chapters use floating-point encoding. The example code below (including the class that holds the genes and the mutation function) is written for floating-point encoding. (Binary encoding is only introduced briefly here, but the "kangaroo jumping" problem can also be solved with binary encoding, and more effectively, so the reader may want to try it.)

We define a class as the carrier of the kangaroo's genes. (A careful reader may ask: why use a container of floating-point numbers to store the kangaroo's genes? Isn't a single floating-point number enough to express them? Yes, for this example a single floating-point number would do. We use a container so that the code can later be reused for problems whose encoding needs a whole series of floating-point numbers.)

    #include <vector>

    // Carrier of a kangaroo's genes.
    class Genome {
    public:
        friend class Genalg;
        friend class Genengine;

        Genome() : fitness(0) {}

        // Constructor that initializes the genes and the fitness score.
        Genome(std::vector<double> vec, double f) : vecgenome(vec), fitness(f) {}

    private:
        std::vector<double> vecgenome;  // container holding the genes
        double fitness;                 // fitness score
    };

  

So far we have studied the kangaroo's chromosome thoroughly; now let's follow the kangaroos on their evolutionary journey.

Natural selection -- fitness scoring and the selection function

1. Competition among organisms -- the fitness function

Competition in nature usually has two sides: the struggle among organisms, and the struggle between organisms and their environment. In our example, though, you can imagine that the kangaroos are very friendly to one another and do not need to fight each other for the right to survive. Their survival depends on your judgment. Because you have to decide which kangaroos to kill and which to spare, you must set a standard of measurement. For this problem the standard is easy to set: the height at which the kangaroo stands (because all you want is for the kangaroos to climb as high as possible). So we use the kangaroo's altitude directly as its fitness score. That is, the fitness function simply returns the value of the function.

2. Natural selection -- the selection function

In nature, the fitter an individual is, the more likely it is to produce offspring. But we cannot say that higher fitness always means more offspring, only that it is more probable (after all, some of the lower-altitude kangaroos are lucky enough to escape your notice). So how do we build this probabilistic relationship? Below we introduce a common selection method: roulette wheel selection. Suppose the population size is $N$ and the fitness of individual $i$ is $f_i$; then the probability that it is selected is:

$P_i = \frac{f_i}{\sum_{j=1}^{N} f_j}$

For example, suppose we have 5 chromosomes whose fitness scores are 5, 7, 10, 13 and 15.

The total fitness is then:

$F = 5 + 7 + 10 + 13 + 15 = 50$

So the probability of each individual being selected is:

$P_1 = 5/50 = 10\%$, $P_2 = 7/50 = 14\%$, $P_3 = 10/50 = 20\%$, $P_4 = 13/50 = 26\%$, $P_5 = 15/50 = 30\%$

Some people will ask why this is called roulette wheel selection. Just look at the roulette wheel in Figure 2-2. The wheel is divided into sectors in proportion to each individual's fitness. When we spin the wheel and it stops, the pointer lands at random on the sector of one individual, and that lucky individual is selected. (Obviously, the higher the fitness score, the greater the probability that the individual is selected.)

Now let's look at how to implement roulette wheel selection in code.

    // Assumes random() is a helper that returns a uniform double in [0, 1).
    Genome Genalg::Getchromoroulette()
    {
        // Generate a random number between 0 and the sum of the fitness
        // scores of the whole population (totalfitness holds that sum).
        double slice = random() * totalfitness;
        // This genome will hold the individual chosen by the wheel.
        Genome thechosenone;
        // Running sum of fitness scores.
        double fitnesssofar = 0;
        // Traverse every chromosome in the population.
        for (int i = 0; i < popsize; ++i)
        {
            // Accumulate the fitness score.
            fitnesssofar += vecpop[i].fitness;
            // Once the running sum reaches the random number, select this individual.
            if (fitnesssofar >= slice)
            {
                thechosenone = vecpop[i];
                break;
            }
        }
        // Return the individual chosen by the wheel.
        return thechosenone;
    }
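To try the roulette wheel outside the class, here is a self-contained sketch of the same cumulative-sum idea. The names `rouletteIndex` and `spin` are hypothetical, and `std::mt19937` stands in for whatever random helper the original program uses.

```cpp
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Index of the sector that contains `slice`, where `slice` lies between
// 0 and the total fitness. Mirrors the running-sum loop above.
std::size_t rouletteIndex(const std::vector<double>& fitness, double slice) {
    double soFar = 0.0;
    for (std::size_t i = 0; i < fitness.size(); ++i) {
        soFar += fitness[i];
        if (soFar >= slice) return i;
    }
    // Guard against rounding: fall back to the last individual.
    return fitness.empty() ? 0 : fitness.size() - 1;
}

// Spin the wheel once with a random slice (total fitness must be positive).
std::size_t spin(const std::vector<double>& fitness, std::mt19937& rng) {
    double total = std::accumulate(fitness.begin(), fitness.end(), 0.0);
    std::uniform_real_distribution<double> dist(0.0, total);
    return rouletteIndex(fitness, dist(rng));
}
```

With the scores 5, 7, 10, 13 and 15, the cumulative sums are 5, 12, 22, 35 and 50, so a slice of 12.5 falls in the third sector (index 2) and a slice of 30 in the fourth (index 3).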

  

Genetic variation -- gene recombination (crossover) and gene mutation

These two steps are the root cause of offspring being different from their parents (note that I did not say they are the reason offspring are better than their parents; only after natural selection does a tendency for offspring to surpass their parents appear). Binary encoding and floating-point encoding handle these two genetic operations very differently. The genetic operations on binary encoding resemble the processes in nature; the two encodings are described separately below.

1. Gene recombination/crossover (Recombination/crossover)

(1) Binary encoding

Recall the crossover process introduced in the previous chapter: during the pairing of homologous chromosomes, non-sister chromatids (one from each parent) often cross over and exchange parts of their chromosomes, as shown in Figure 2-3. Gene crossover under binary encoding is very similar: several bits at the same positions are swapped between two strings to produce new individuals, as shown in Figure 2-4.
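One possible reading of "swapping several bits at the same positions" is to exchange a contiguous segment between the two parents. The sketch below does that; the name `crossSegment` is hypothetical, and in a real run the segment boundaries would be drawn at random.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Swap the bits in positions [from, to) between two parent strings and
// return the two resulting children. Positions past either end are ignored.
std::pair<std::string, std::string> crossSegment(
        std::string a, std::string b, std::size_t from, std::size_t to) {
    for (std::size_t i = from; i < to && i < a.size() && i < b.size(); ++i)
        std::swap(a[i], b[i]);
    return {a, b};
}
```

For example, crossing "11111111" with "00000000" over positions 2 to 4 produces the children "11000111" and "00111000".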

(2) Floating-point encoding

If a gene contains multiple floating-point numbers, crossover can be done in a similar way; the difference is that the basic unit of crossover is a floating-point number rather than a binary digit. If a gene is a single floating-point number, other kinds of recombination are used, such as intermediate recombination:

$\text{child} = \text{parent}_1 + \alpha \cdot (\text{parent}_2 - \text{parent}_1)$, where $\alpha$ is a random scaling factor in $[0, 1]$

So it is enough to randomly generate a value between the two parents' encoded values and use it as the offspring's encoded value.

Consider the specifics of the "kangaroo jumping" problem: a kangaroo's only individual characteristic is its position. Kangaroos at the same position have exactly the same genes, and crossing two identical genes accomplishes nothing, so we do not use the crossover operation in this example. (Of course, it is not impossible to include it: you can bring together two kangaroos from different places, let them mate and produce offspring, and then send the offspring to where they belong.)
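Intermediate recombination for a single floating-point gene can be sketched as follows, assuming the common form child = parent1 + alpha * (parent2 - parent1) with alpha drawn uniformly from [0, 1] so that the child lies between the parents. The function names are hypothetical.

```cpp
#include <random>

// Child value for a given scaling factor alpha; with alpha in [0, 1]
// the result lies between the two parents.
double intermediateRecombine(double parent1, double parent2, double alpha) {
    return parent1 + alpha * (parent2 - parent1);
}

// Same operation with alpha drawn at random from [0, 1).
double randomIntermediate(double parent1, double parent2, std::mt19937& rng) {
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    return intermediateRecombine(parent1, parent2, unit(rng));
}
```

With parents 1.0 and 3.0, alpha = 0.5 yields 2.0, while alpha = 0 and alpha = 1 reproduce the two parents.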

2. Gene mutation (Mutation)

(1) Binary encoding

Again, recall the gene mutation process described in the previous chapter: a gene mutation is a change of the gene at some locus of a chromosome. A mutation turns a gene into one of its alleles and usually causes some change in the phenotype. As mentioned above, genetic operations on binary encoding closely resemble those in biology: each "0" or "1" in the gene string has some probability of flipping to "1" or "0". For example, take the following binary string:

101101001011001

After a genetic mutation, it may become the following new code:

001101011011001
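The example mutation corresponds to flipping the bits at positions 0 and 7 (counting from the left, starting at 0). A small sketch, with hypothetical function names, shows both a deterministic flip and a random version in which each bit flips independently with a given probability:

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Flip the bits at the given positions ('0' becomes '1' and vice versa).
std::string flipBits(std::string bits, const std::vector<std::size_t>& positions) {
    for (std::size_t p : positions)
        if (p < bits.size()) bits[p] = (bits[p] == '0') ? '1' : '0';
    return bits;
}

// Flip every bit independently with probability `rate`.
std::string mutateBits(std::string bits, double rate, std::mt19937& rng) {
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    for (char& c : bits)
        if (unit(rng) < rate) c = (c == '0') ? '1' : '0';
    return bits;
}
```

Here `flipBits("101101001011001", {0, 7})` returns "001101011011001", the mutated string shown above.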

(2) Floating-point encoding

Mutation under floating-point encoding usually adds a small random number to, or subtracts it from, the original floating-point number. For example, suppose the original floating-point string is:

1.2, 3.4, 5.1, 6.0, 4.5

After the mutation, you may get the following floating-point string:

1.3, 3.1, 4.9, 6.3, 4.4

This small random number has a magnitude, which we generally call the "step" (in the "kangaroo jumping" problem, the step is how far the kangaroo jumps). In general, the larger the step, the faster evolution proceeds at the start, but the harder it is to converge to an exact point, whereas a small step converges to a point more precisely. So, to speed up evolution early on while still converging accurately to the optimal solution later, the step is often changed dynamically. This is similar to the simulated annealing process described earlier, which the reader may want to review.

Here is the gene mutation function for floating-point encoding:
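One simple way to change the step dynamically is to let it shrink geometrically with the generation number. This is only an illustrative assumption, not the schedule used by the article; the name `stepForGeneration` and the `decay` parameter are hypothetical.

```cpp
#include <cmath>

// Step size for a given generation: starts at initialStep and is
// multiplied by `decay` (between 0 and 1) once per generation.
double stepForGeneration(double initialStep, double decay, int generation) {
    return initialStep * std::pow(decay, generation);
}
```

With an initial step of 1.0 and a decay of 0.5, the step is 1.0 in generation 0 and 0.125 in generation 3, so early generations explore widely and later ones refine the position.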

    // Assumes random() is a helper that returns a uniform double in [0, 1).
    void Genalg::mutate(std::vector<double> &chromo)
    {
        // Mutate each gene with the preset mutation probability.
        for (std::size_t i = 0; i < chromo.size(); ++i)
        {
            // If a mutation occurs
            if (random() < mutationrate)
            {
                // Increase or decrease the gene by a small random value.
                chromo[i] += (random() - 0.5) * maxperturbation;
                // Make sure the kangaroo does not jump out of the nature reserve:
                // leaving on one side puts it at the opposite end.
                if (chromo[i] < leftpoint)
                {
                    chromo[i] = rightpoint;
                }
                else if (chromo[i] > rightpoint)
                {
                    chromo[i] = leftpoint;
                }
                // The bounds check above is not part of generic mutation code;
                // it only keeps the gene encoding valid for this problem.
            }
        }
    }

  

It is worth mentioning that gene mutation in genetic algorithms has characteristics very similar to those of gene mutation in biology mentioned in the previous chapter. Let's review them:

1. Gene mutations occur randomly, and the mutation frequency is very low. (However, some applications require a high mutation probability.)

2. Most genetic mutations are harmful to the organism itself.

3. Gene mutations are not directed.

So far, gene encoding, fitness evaluation, selection and mutation have all been implemented; what remains is to assemble these "parts" of the genetic process.

To be continued in the next part, "Genetic algorithm introduction to mastering (II)":

http://blog.csdn.net/emiyasstar__/article/details/6938715
