A case study on genetic algorithms: an automatic test paper generation system based on a genetic algorithm (theory part)

Source: Internet
Author: User

Introduction to genetic algorithms

1.1 Overview of genetic algorithms

A genetic algorithm (GA) is a randomized search method derived from the laws of biological evolution (survival of the fittest and genetic inheritance). It was first proposed by Professor J. Holland of the United States in 1975. A genetic algorithm is a computational model that simulates Darwinian natural selection and the genetic mechanisms of evolution. It searches for an optimal solution by imitating the natural evolutionary process and is commonly used for optimization problems with multiple constraints.

A genetic algorithm starts from a population that represents a set of potential solutions to the problem. A population consists of a certain number of individuals, each genetically encoded. Each individual is an entity characterized by its chromosome. The chromosome is the main carrier of genetic material, that is, a collection of genes, and its internal representation (the genotype) determines the individual's external traits (the phenotype). The first task is therefore encoding: mapping from phenotype to genotype. Because faithfully imitating biological gene encoding is very complex, it is usually simplified, for example to binary encoding. Once the initial population has been generated, it evolves generation by generation, following the principle of survival of the fittest, toward better and better approximate solutions. In each generation, individuals are selected according to their fitness in the problem domain, and the genetic operators borrowed from natural genetics (crossover and mutation) are applied to produce a population representing a new set of solutions. As in natural evolution, each new generation tends to be better adapted to its environment than the previous one. The best individual in the final population is decoded and can be taken as an approximately optimal solution to the problem.

Genetic algorithms provide a general framework for optimizing complex systems. They do not depend on the specific domain of the problem and are robust across problem types, so they are widely used in many disciplines. The main application areas are function optimization, combinatorial optimization, production scheduling, automatic control, robotics, image processing and pattern recognition, artificial life, genetic programming, and machine learning.

1.2 Basic operation and procedure of genetic algorithm

(1) Initialization. Set the generation counter, set the maximum number of generations, and randomly generate n individuals as the initial population.

(2) Fitness evaluation. Calculate the fitness of each individual in the current population, beginning with the initial population.

(3) Selection. Selection determines which individuals take part in recombination (crossover) and how many offspring each selected individual produces. Parents are selected according to the fitness values computed above. Available algorithms include roulette wheel selection, stochastic universal sampling, local selection, truncation selection, and tournament selection.

(4) Crossover. Recombination combines information from the paired parents to produce new individuals. Depending on how individuals are encoded, the following algorithms can be used. For real-valued encodings: discrete recombination, intermediate recombination, linear recombination, and extended linear recombination. For binary encodings: single-point crossover, multi-point crossover, uniform crossover, shuffle crossover, and reduced surrogate crossover.

(5) Mutation. After crossover, the offspring undergo mutation: their genes are perturbed with a small probability. Depending on how individuals are encoded, real-valued mutation or binary mutation can be used.
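The five steps above can be sketched as a short Python loop. This is only an illustration on a toy problem, not the system described in this article: the fitness function (counting 1-bits), the use of tournament selection and single-point crossover, and all parameter values are assumptions chosen to keep the example small.

```python
import random

def one_max(bits):
    """Toy fitness function (an assumption for this sketch): the number of 1-bits."""
    return sum(bits)

def tournament(population, k, rng):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=one_max)

def single_point_crossover(a, b, rng):
    """Exchange the tails of two parents after a random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def bit_flip_mutation(bits, rate, rng):
    """Flip each gene independently with a small probability."""
    return [1 - g if rng.random() < rate else g for g in bits]

def run_ga(n=30, length=20, max_generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # (1) Initialization: n random individuals.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    best = max(population, key=one_max)
    for _ in range(max_generations):
        offspring = []
        while len(offspring) < n:
            # (2) + (3) Fitness evaluation and selection of two parents.
            p1 = tournament(population, 3, rng)
            p2 = tournament(population, 3, rng)
            # (4) Crossover produces two children.
            c1, c2 = single_point_crossover(p1, p2, rng)
            # (5) Mutation perturbs the children.
            offspring.append(bit_flip_mutation(c1, mutation_rate, rng))
            offspring.append(bit_flip_mutation(c2, mutation_rate, rng))
        population = offspring[:n]
        # Remember the best individual seen so far.
        best = max(population + [best], key=one_max)
    return best

if __name__ == "__main__":
    solution = run_ga()
    print(solution, one_max(solution))
```

Calling run_ga() returns the best bit string found. Because the best individual is tracked across generations, the reported fitness never decreases as max_generations grows for a fixed seed.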

1.3 Features of genetic algorithms

(1) A genetic algorithm starts its search from a set of candidate solutions (strings) rather than from a single solution. This is a major difference from traditional optimization algorithms, which iterate from a single initial value and can easily get trapped in a local optimum. Because a genetic algorithm searches from a set of strings, it covers more of the search space, which favors finding the global optimum.

(2) Many traditional search algorithms are single-point searches, which easily fall into local optima. A genetic algorithm processes multiple individuals in the population simultaneously, that is, it evaluates multiple solutions in the search space at once. This reduces the risk of getting stuck in a local optimum and makes the algorithm easy to parallelize.

(3) A genetic algorithm needs essentially no knowledge of the search space or other auxiliary information. It evaluates individuals using only the fitness function value and performs the genetic operations on that basis. The fitness function is not required to be continuous or differentiable, and its domain can be defined arbitrarily. This greatly widens the range of problems to which genetic algorithms can be applied.

(4) A genetic algorithm does not use deterministic rules. It uses probabilistic transition rules to guide its search direction.

(5) Self-organization, self-adaptation, and self-learning. When a genetic algorithm uses the information gained during evolution to organize its own search, individuals with higher fitness have a higher probability of survival and acquire gene structures better adapted to the environment.

1.4 A few additional notes on genetic algorithms (questions beginners may have)

1. During selection, how many times should we select? Does selection shrink the population? What should be done if the same individual is selected twice?

A: There is no fixed limit on the number of selections. Selection inevitably leaves some individuals unselected, so it does reduce the population size. If an individual is selected a second time, discard that pick and select again. The number of selections should be smaller than the population size: because duplicates are not allowed, selecting as many times as there are individuals would select everyone, which defeats the purpose of selection. Duplicates are disallowed because they do not add diversity to the population (in the extreme case where all selected individuals are identical, crossover only produces copies and is meaningless).

2. Since the fitness of every individual in the population is calculated anyway, why not simply keep the individuals with high fitness and discard those with low fitness, instead of using a separate selection algorithm?

A: Individuals with low fitness may still carry high-quality genes. A real-life analogy: two unremarkable parents can have a gifted child.

3. Is crossover performed at random or pairwise, and how many crossovers are appropriate?

A: Either random or pairwise crossover works. The number of crossovers should be greater than or equal to half the number of individuals in the initial population. Each crossover produces two new individuals, and the mutation step does not produce any additional individuals, so at least (initial population size)/2 crossovers are needed to keep the population from shrinking (negative population growth).

Application of the genetic algorithm to automatic test paper generation

Automatic test paper generation searches the question bank for questions whose attributes match the given constraints (total number of questions, total score, knowledge-point distribution, difficulty coefficient, and proportion of each question type), in order to extract the best combination of questions. Automatic test paper generation is therefore a combinatorial optimization problem with multiple constraints.

The traditional genetic algorithm suffers from low search efficiency in later generations and from premature convergence. To fit the requirements of this particular problem, the genetic algorithm is modified slightly: it uses real-number encoding, segmented crossover, constrained generation of the initial population, and an added fitness check during selection and crossover. The specific solution is as follows.

2.1 Chromosome encoding and initial population design

To solve a problem with a genetic algorithm, the solution space must first be mapped onto a set of code strings; this is the chromosome encoding problem. Traditional genetic algorithms use binary encoding. With binary encoding, every question in the question bank occupies one position in the bit string, where 1 means the question is selected and 0 means it is not. Such a bit string is long, and during crossover and mutation it is hard to control the number of questions of each type. This system uses a real-number encoding instead: a test paper maps to a chromosome, and each question on the paper is a gene whose value is the question number itself. The question numbers of each type are grouped together, so the chromosome is divided into segments by question type, and the genetic operators are later applied segment by segment, which keeps the number of questions of each type constant. For example, for a "C language Program Design" test paper with 6 single-choice, 4 multiple-choice, 5 true/false, 5 fill-in-the-blank, and 3 short-answer questions, the chromosome is encoded as:

(1, 6, 13, 12, 10, 4 | 18, 22, 25, 28 | 52, 36, 67, 11, 123 | 31, 35, 32, 47, 44 | 99, 85, 45)

Segments from left to right: single-choice | multiple-choice | true/false | fill-in-the-blank | short-answer

The initial population of test papers is not generated completely at random. Instead, it is generated randomly subject to the requirements on the total number of questions, the proportion of question types, and the total score, so that every individual in the initial population already satisfies the question-count and total-score constraints. This speeds up convergence of the genetic algorithm and reduces the number of iterations. Segmented real-number encoding avoids the overly large search space and overly long chromosomes of binary encoding, eliminates the time spent decoding individuals, and speeds up the solution.
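A minimal Python sketch of the segmented encoding and the constrained initial population follows. The question bank, its ID ranges, and the names QUESTION_BANK and PAPER_LAYOUT are hypothetical. The sketch also assumes that every question of a given type carries the same score, so fixing the number of questions per type also fixes the total score; the article does not state how the score constraint is enforced.

```python
import random

# Hypothetical question bank: question type -> available question numbers.
# The ID ranges are made up for illustration only.
QUESTION_BANK = {
    "single_choice": list(range(1, 41)),
    "multiple_choice": list(range(41, 71)),
    "true_false": list(range(71, 111)),
    "fill_in_blank": list(range(111, 141)),
    "short_answer": list(range(141, 161)),
}

# Number of questions per type, as in the "C language Program Design" example.
PAPER_LAYOUT = {
    "single_choice": 6,
    "multiple_choice": 4,
    "true_false": 5,
    "fill_in_blank": 5,
    "short_answer": 3,
}

def random_chromosome(bank, layout, rng):
    """One test paper: a list of segments, one segment per question type.

    rng.sample draws without replacement, so a segment never contains
    the same question number twice.
    """
    return [rng.sample(bank[qtype], count) for qtype, count in layout.items()]

def initial_population(bank, layout, size, seed=None):
    """Generate `size` test papers that all satisfy the per-type question counts."""
    rng = random.Random(seed)
    return [random_chromosome(bank, layout, rng) for _ in range(size)]

if __name__ == "__main__":
    for paper in initial_population(QUESTION_BANK, PAPER_LAYOUT, size=3, seed=1):
        print(" | ".join(", ".join(map(str, segment)) for segment in paper))
```

Keeping each question type in its own list makes it straightforward to apply the later operators segment by segment.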

2.2 Design of the fitness function

The fitness function judges how good each individual (test paper) in the population is. The genetic algorithm uses fitness values to guide the search direction and does not require the fitness function to be continuous or differentiable, nor any other auxiliary information. Because the number of questions, the total score, and similar requirements are already satisfied when the population is initialized, only the knowledge-point distribution and the difficulty coefficient need to be considered here. The fitness function therefore depends on the difficulty coefficient and the knowledge-point distribution of the test paper.

Test paper difficulty coefficient: P = ∑(Di × Si) / ∑Si, where i = 1, 2, ..., N, N is the number of questions on the test paper, and Di and Si are the difficulty coefficient and the score of question i.

The knowledge-point distribution is measured by an individual's knowledge-point coverage: if the paper is expected to cover n knowledge points and the union of the knowledge points of all questions in an individual contains m of them (m <= n), the coverage is m/n.

The smaller the difference between the user's expected difficulty coefficient EP and the test paper's difficulty coefficient P, the better; the greater the knowledge-point coverage, the better. The fitness function is therefore:

F = 1 - (1 - m/n) × f1 - |EP - P| × f2

Here f1 is the weight of the knowledge-point distribution and f2 is the weight of the difficulty coefficient. When f1 = 0, the function reduces to constraining only the difficulty coefficient of the test paper; when f2 = 0, it reduces to constraining only the knowledge-point distribution.
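The formulas above translate directly into code. The following Python sketch assumes each question is a dictionary with the hypothetical keys "difficulty", "score", and "points" (a set of knowledge-point names). It counts only the knowledge points that the paper is expected to cover, so that m <= n always holds; that reading is an assumption.

```python
def paper_difficulty(questions):
    """Score-weighted difficulty: P = sum(Di * Si) / sum(Si)."""
    total_score = sum(q["score"] for q in questions)
    return sum(q["difficulty"] * q["score"] for q in questions) / total_score

def knowledge_coverage(questions, expected_points):
    """Coverage m/n: the share of expected knowledge points hit by the paper."""
    covered = set().union(*(q["points"] for q in questions)) & set(expected_points)
    return len(covered) / len(expected_points)

def fitness(questions, expected_points, expected_difficulty, f1, f2):
    """F = 1 - (1 - m/n) * f1 - |EP - P| * f2."""
    coverage = knowledge_coverage(questions, expected_points)
    difficulty = paper_difficulty(questions)
    return 1 - (1 - coverage) * f1 - abs(expected_difficulty - difficulty) * f2

if __name__ == "__main__":
    paper = [
        {"difficulty": 0.4, "score": 2, "points": {"pointers"}},
        {"difficulty": 0.8, "score": 6, "points": {"loops", "arrays"}},
    ]
    expected = {"pointers", "loops", "arrays", "structs"}
    # P = (0.4*2 + 0.8*6) / 8 = 0.7 and coverage = 3/4,
    # so F = 1 - 0.25*0.5 - 0.1*0.5, approximately 0.825.
    print(fitness(paper, expected, expected_difficulty=0.6, f1=0.5, f2=0.5))
```

With weights such that f1 + f2 <= 1, and difficulty values in [0, 1], F stays within [0, 1], which is convenient for fitness-proportional selection.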

2.3 Design of the genetic operators

(1) Selection operator. The selection operator decides, based on each individual's fitness, whether that individual is eliminated or copied into the next generation. Through selection, individuals with higher fitness have a greater chance of survival. This system uses the roulette wheel method, one of the most commonly used and classic selection methods in genetic algorithms. Concretely: for a population of size M whose individuals have fitness values {a1, a2, ..., aM}, the probability that individual i is selected is ai / ∑aj, where the sum runs over j = 1 to M.

(2) Crossover operator. Because the encoding is a segmented real-number encoding, crossover is performed as a single-point crossover within each segment (segments correspond to question types), so on the whole chromosome it behaves like a multi-point crossover. The crossover procedure is as follows: the chromosomes in the population are paired off arbitrarily; for each pair, a random number r in [0, N-2] is generated as the crossover point; the genes after position r are exchanged between the two chromosomes (in a way that keeps the total score unchanged) to obtain the next generation. Offspring produced by crossover may be invalid because a question number appears twice. In that case, the duplicated question number in the segment is replaced with a question number that does not yet appear in that segment, yielding a valid offspring.

(3) Mutation operator. In genetic algorithms the mutation probability is generally small. Mutation is not applied segment by segment; it changes only a single gene within one segment. The mutation procedure is as follows: randomly generate a mutation position p in the range [1, n], then select a replacement gene from the question bank according to the following rule: the replacement must be of the same question type as the original gene, have the same score, and share at least one valid knowledge point with the original question (valid meaning that the expected test paper also includes this knowledge point).
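The three operators can be sketched in Python as follows. The data layout is an assumption: a chromosome is a list of segments (one list of question numbers per question type), and `bank` is a hypothetical dictionary mapping each question number to its "type", "score", and "points". The sketch interprets N in [0, N-2] as the length of the segment being crossed, assumes all questions of one type share the same score (so exchanging genes keeps the total score), and assumes the bank holds enough unused questions of each type for the repair step.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability a_i / sum(a_j); fitness must be >= 0."""
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

def repair(segment, pool, rng):
    """Replace duplicated question numbers with unused ones from the same pool."""
    unused = [qid for qid in pool if qid not in segment]
    rng.shuffle(unused)
    seen, fixed = set(), []
    for qid in segment:
        if qid in seen:
            qid = unused.pop()
        seen.add(qid)
        fixed.append(qid)
    return fixed

def crossover(parent_a, parent_b, bank, rng):
    """Single-point crossover inside every segment, followed by duplicate repair."""
    child_a, child_b = [], []
    for seg_a, seg_b in zip(parent_a, parent_b):
        qtype = bank[seg_a[0]]["type"]
        pool = [qid for qid, q in bank.items() if q["type"] == qtype]
        if len(seg_a) < 2:  # nothing to exchange in a one-gene segment
            new_a, new_b = list(seg_a), list(seg_b)
        else:
            r = rng.randint(0, len(seg_a) - 2)  # genes after index r are exchanged
            new_a = seg_a[:r + 1] + seg_b[r + 1:]
            new_b = seg_b[:r + 1] + seg_a[r + 1:]
        child_a.append(repair(new_a, pool, rng))
        child_b.append(repair(new_b, pool, rng))
    return child_a, child_b

def mutate(chromosome, bank, expected_points, rng):
    """Swap one randomly chosen gene for a compatible question from the bank."""
    mutated = [list(segment) for segment in chromosome]
    positions = [(s, i) for s, seg in enumerate(mutated) for i in range(len(seg))]
    s, i = rng.choice(positions)
    old = bank[mutated[s][i]]
    shared = set(old["points"]) & set(expected_points)
    candidates = [
        qid for qid, q in bank.items()
        if qid not in mutated[s]
        and q["type"] == old["type"]
        and q["score"] == old["score"]
        and shared & set(q["points"])
    ]
    if candidates:  # leave the gene unchanged when no compatible question exists
        mutated[s][i] = rng.choice(candidates)
    return mutated
```

Here mutate performs one mutation unconditionally; a caller would invoke it only with the small mutation probability mentioned above, for example `if rng.random() < 0.05: child = mutate(child, bank, expected_points, rng)`, where 0.05 is an arbitrary illustrative rate.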

2.4 Implementation Flowchart

Algorithm implementation flow:

Figure 1 Algorithm implementation flowchart

2.5 Programming

For the program design, see the next article: A case study on genetic algorithms: an automatic test paper generation system based on a genetic algorithm (practice part)


