Dynamic Programming (DP)

Source: Internet
Author: User

I. Concept and Significance

Dynamic programming is a branch of operations research and a mathematical method for optimizing decision processes. In the early 1950s, the American mathematician R. E. Bellman and others, while studying the optimization of multistage decision processes, proposed the famous principle of optimality, which converts a multistage process into a series of single-stage problems that are solved one by one, thereby creating a new method for solving such process-optimization problems: dynamic programming. In 1957 he published his masterpiece "Dynamic Programming," the first book in this field.
Since its advent, dynamic programming has been widely used in economic management, production scheduling, engineering, and optimal control. Problems such as shortest paths, inventory management, resource allocation, equipment replacement, sorting, and loading are often more convenient to solve with dynamic programming than with other methods.
Although dynamic programming is mainly used to optimize dynamic processes divided into stages over time, some static problems unrelated to time (such as linear and nonlinear programming) can also be treated as multistage decision processes by artificially introducing a time factor, and can then be solved conveniently with dynamic programming as well.
Dynamic programming is an approach to solving optimization problems rather than a specific algorithm. Unlike the search and numerical methods described above, it has no single standard mathematical formulation and no ready-made solution procedure. Because problems differ in nature, the conditions that determine an optimal solution also differ, so the design of a dynamic programming solution varies from problem to problem; there is no universal dynamic programming algorithm that solves every optimization problem. Therefore, beyond a correct grasp of the basic concepts and methods, readers must analyze each concrete problem on its own terms, build a model with imagination, and apply creative techniques to solve it. We will also analyze and discuss dynamic programming algorithms for several representative problems, so as to learn and master this design method step by step.

II. Basic Model

The basic model is the optimization of a multistage decision process.
In real life there is a class of activity processes which, because of their particular structure, can be divided into several interrelated stages, with a decision required at each stage so that the whole process achieves the best effect. Of course, the decision at each stage is not made arbitrarily: it depends on the current state and in turn affects future development. When the decision at every stage has been made, the decisions form a decision sequence, which determines an activity route for the entire process. (See the accompanying figure.)
Such a problem is viewed as a multistage process with a chain structure, called a multistage decision process; the corresponding problem is called a multistage decision problem.


III. Memoized Search

Here is a digital triangle:
1
2 3
4 5 6
7 8 9 10
Find a path from the first layer to the last layer that minimizes or maximizes the sum of the weights along it.

Whether you are a newcomer or a veteran, you will be familiar with this problem. It is easy to write the state transition equation: f(i, j) = a[i, j] + max{f(i-1, j), f(i-1, j+1)} (replace max with min to minimize), where f(i, j) is the best sum obtainable at cell (i, j) and layer i-1 is the layer processed before layer i.
To solve this problem with dynamic programming, we can easily write a loop-based implementation from the state transition equation and the direction of state transfer. However, when the states and transitions are complex, writing the loop-based dynamic program can become difficult.
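As a concrete illustration, here is a minimal loop-based sketch in Python. The row-major, top-first indexing and the helper name `max_triangle_path` are choices made for this example (the original uses Pascal-style pseudocode and a different layer numbering):

```python
# Iterative (loop-based) dynamic programming for the digital triangle.
# triangle[i][j] is the weight of the j-th cell in the i-th layer (0-indexed,
# top layer first). f[i][j] holds the best (maximum) path sum from the top
# down to cell (i, j); each cell is reached from the one or two cells above.

def max_triangle_path(triangle):
    f = [row[:] for row in triangle]          # f[0][0] = triangle[0][0]
    for i in range(1, len(triangle)):
        for j in range(len(triangle[i])):
            parents = []
            if j < len(triangle[i - 1]):      # parent directly above
                parents.append(f[i - 1][j])
            if j > 0:                         # parent above and to the left
                parents.append(f[i - 1][j - 1])
            f[i][j] = triangle[i][j] + max(parents)
    return max(f[-1])                         # best sum over the last layer

triangle = [[1], [2, 3], [4, 5, 6], [7, 8, 9, 10]]
print(max_triangle_path(triangle))  # → 20 (path 1 → 3 → 6 → 10)
```

Replacing `max` with `min` in both places yields the minimization version.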
Solution:

Let us analyze the problem from the forward direction. For the example above, it is not hard to come up with a very simple recursive procedure:
F1 := f(i-1, j+1); F2 := f(i-1, j);
if F1 > F2 then F := F1 + A[i, j] else F := F2 + A[i, j];
Obviously, this is the simplest search algorithm. Its time complexity is O(2^n), which clearly times out. Analyzing the search process shows that many calls are unnecessary: states whose optimal values have already been computed are computed again. To avoid this waste, we store an array OPT: each time an f(i, j) is computed, its value is put into OPT[i, j]; whenever f(i, j) is called again, OPT[i, j] is used directly. This expresses the state transition equation of dynamic programming in an intuitive way, which saves thinking effort and reduces the programming skill required; the running time differs from the loop version only by a constant factor. In many cases, recursion with memoization avoids waste well and is very practical in competitions.
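The memoized search described above can be sketched in Python as follows; the OPT table plays exactly the role just described. Writing the recursion top-down from the apex (so transitions go to layer i+1 rather than i-1) is an indexing choice made for this example:

```python
# Memoized (top-down) search for the digital triangle: the plain recursion
# revisits states exponentially often; caching each f(i, j) in OPT the first
# time it is computed makes every later lookup O(1).

def best_path(triangle):
    n = len(triangle)
    OPT = {}                                  # OPT[(i, j)] stores f(i, j)

    def f(i, j):
        # f(i, j): best sum from cell (i, j) down to the last layer.
        if (i, j) in OPT:
            return OPT[(i, j)]                # reuse the stored value
        if i == n - 1:
            value = triangle[i][j]
        else:
            value = triangle[i][j] + max(f(i + 1, j), f(i + 1, j + 1))
        OPT[(i, j)] = value
        return value

    return f(0, 0)

triangle = [[1], [2, 3], [4, 5, 6], [7, 8, 9, 10]]
print(best_path(triangle))  # → 20
```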
IV. States and Decisions

Decision Making:

A decision connects the current state back to a previous state; decisions are, in effect, the bridges between states. The previous state, together with a decision, determines the current state. In the digital triangle, the decision is to choose the better of the two adjacent states in the previous layer.

State:

In dynamic programming we generally use arrays to store the optimal value of each state. Starting from the core notion of dynamic programming, the "state," we can come to understand dynamic programming step by step. Sometimes, once the current state is determined, the previous state is determined as well, and no enumeration is required.

V. Applications of Dynamic Programming Algorithms

I. The Concept of Dynamic Programming
In recent years, more and more competition problems involve dynamic programming; almost every year's NOI problems require it. Competitions demand ever more of contestants' dynamic programming skills and are no longer stuck at simple recursion and modeling.
To understand the concept of dynamic programming, we must first know what a multistage decision problem is.
1. Multi-stage decision-making
If an activity process can be divided into several interrelated stages, with a decision (a measure taken) required at each stage, and the decision made at one stage often affects the decision at the next, so that the decisions together completely determine the activity route of the process, then the problem is called a multistage decision problem.
The decisions of all stages constitute a decision sequence, called a strategy. Since each stage has several decisions to choose from, there are many strategies available. Each strategy determines an activity effect, which can be measured quantitatively; different strategies give different effects. The multistage decision problem is to select, among the available strategies, an optimal strategy that achieves the best effect under a predefined criterion.
2. Terminology in Dynamic Programming
Stage:
To solve the problem conveniently, the process is divided into several interrelated stages; the division differs from process to process. The variable describing the stage is called the stage variable. In most cases the stage variable is discrete and denoted by k. Stage variables may also be continuous: if a decision can be made at any moment, and infinitely many decisions can be made between any two distinct moments, then the stage variable is continuous.
In the previous example, the first stage is at point A; the second stage goes from A to B; the third from B to C; and the fourth from C to D.
State: The state represents the natural or objective conditions at the beginning of each stage. It does not change with anyone's subjective will, and is also called an uncontrollable factor. In the example above, the state is the starting position of a stage: it is both the starting point of a path in this stage and the end point of a branch in the previous stage.
In the previous example, the first stage has one state, A; the second stage has two states, B1 and B2; the third stage has three states, C1, C2, and C3; and the fourth stage has one state, D.
The state of a process can usually be described by one or several values, called state variables. In general the state is discrete, but sometimes it is treated as continuous for convenience. Of course, in real life, because of the limits under which variables can be observed, all states are in fact discrete; yet from an analytical point of view, treating the state as continuous can be of great benefit. A state may also have several components (the multidimensional case), in which case it is represented by a vector; the state dimension may differ from stage to stage.
As the process develops in all possible ways, the state variable of each segment ranges over a certain set; the set of its possible values is called the state set.
No aftereffect: We require the state to have the following property: if the state of some stage is given, the development of the process after that stage is not affected by the states of the stages before it; once the states of all stages are determined, the whole process is determined. In other words, each realization of the process can be represented by a state sequence. In the example above, the state of each stage is the starting point of a line segment; once the sequence of these points is fixed, the whole route is completely fixed. For the route after a certain stage, once its starting point is fixed, it is not affected by the earlier route (the points already passed through). This property means that the history of the process can influence its future development only through the current state, and it is called "no aftereffect."
Decision: After the state of a stage is given, a choice (action) that moves from this state to some state of the next stage is called a decision. In optimal control it is also called a control. In many problems a decision can naturally be represented by a number or a group of numbers; different decisions correspond to different values. A variable describing a decision is called a decision variable. Because the state has no aftereffect, when choosing a decision at each stage we need only consider the current state, not the history of the process.
The range of the decision variable is called the set of permitted decisions.
Strategy: A sequence composed of the decisions of every stage is called a strategy. For each actual multistage decision process, there is a certain range of strategies to choose from, called the set of permitted strategies. A strategy in this set that achieves the optimal effect is called an optimal strategy.
Given the value of the state variable x(k) of stage k, once the decision variable u(k) of this stage is determined, the state variable x(k+1) of stage k+1 is also determined; that is, the value of x(k+1) varies with x(k) and u(k). We can regard this as a correspondence between (x(k), u(k)) and x(k+1), written x(k+1) = T_k(x(k), u(k)). This rule of transfer from stage k to stage k+1 is called the state transition equation.
Principle of optimality:
An optimal strategy for the whole process satisfies the following: whatever the earlier decisions were, the remaining decisions must constitute an optimal sub-strategy with respect to the state formed by those earlier decisions.
The principle of optimality in effect requires that every sub-strategy of an optimal strategy be itself optimal. Let us analyze the preceding example further to illustrate this. From A to D, suppose the shortest path is A → B1 → C2 → D; the choice of these points constitutes the optimal strategy for this example. By the principle of optimality, every sub-strategy of this strategy should be optimal: A → B1 → C2 is the shortest path from A to C2, and B1 → C2 → D is also the shortest path from B1 to D... This is exactly the case, so the example satisfies the principle of optimality.
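To make the terminology concrete, here is a small Python sketch of the staged shortest-path example. Since the original figure is not reproduced, the edge weights below are invented for illustration (chosen so that A → B1 → C2 → D comes out shortest, matching the text); the states are the nodes of each stage, the decision at each stage picks the predecessor, and the relaxation step applies the state transition described above:

```python
# Multistage shortest path A -> {B1, B2} -> {C1, C2, C3} -> D.
# dist[s] is the length of the shortest path from A to state s. Each stage's
# states are computed only from the previous stage's states (no aftereffect).

# Hypothetical edge weights: the original article's figure is not shown.
edges = {
    "A":  {"B1": 2, "B2": 4},
    "B1": {"C1": 3, "C2": 1, "C3": 6},
    "B2": {"C1": 5, "C2": 3, "C3": 2},
    "C1": {"D": 4}, "C2": {"D": 3}, "C3": {"D": 5},
}
stages = [["A"], ["B1", "B2"], ["C1", "C2", "C3"], ["D"]]

def shortest_path():
    dist = {"A": 0}
    for prev, cur in zip(stages, stages[1:]):
        for v in cur:
            # Decision: which predecessor u in the previous stage to come from.
            dist[v] = min(dist[u] + edges[u][v] for u in prev)
    return dist["D"]

print(shortest_path())  # → 6 (via A → B1 → C2 → D)
```

Note how the loop embodies the principle of optimality: dist[v] for a stage is built only from the already-optimal dist values of the previous stage.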
