1. Basic Idea of the Divide and Conquer Method

The computing time required for any problem that can be solved by a computer is related to its size n: the smaller the problem, the easier it is to solve directly and the less computing time it takes. For example, to sort n elements: if n = 1, no work is needed; if n = 2, one comparison puts the elements in order; if n = 3, at most three comparisons suffice; and so on. When n is large, however, the problem is not so easy to handle, and solving a large instance directly can be quite difficult.

The principle of the divide and conquer method is to divide a big problem that is hard to solve directly into smaller subproblems of the same kind, so that each can be conquered separately.

If the original problem can be divided into k subproblems (1 < k ≤ n), and these subproblems can be solved, then their solutions can be combined into a solution of the original problem, and the method is feasible. The subproblems produced by the divide and conquer method are often smaller instances of the original problem, which makes recursion a natural tool: the subproblems have the same type as the original problem, but their size keeps shrinking until they can be solved directly. This naturally leads to a recursive process. Divide and conquer and recursion are like twins; they are often applied together in algorithm design, and many efficient algorithms arise this way.

2. Conditions for Applying the Divide and Conquer Method

Problems that the divide and conquer method solves well generally have the following characteristics:

(1) The problem becomes easy to solve once its size is reduced to a certain point;

(2) The problem can be divided into several smaller instances of the same problem; that is, the problem has the optimal substructure property;

(3) The solutions of the subproblems can be merged into a solution of the original problem;

(4) The subproblems are independent of each other; that is, they do not share common subproblems.

Most problems satisfy the first characteristic, because the computational complexity of a problem generally grows with its size. The second characteristic is the premise for applying divide and conquer, and it reflects the use of recursive thinking. The third characteristic is the key: whether divide and conquer can be used depends on it. If a problem has the first and second characteristics but not the third, consider greedy algorithms or dynamic programming instead. The fourth characteristic concerns the efficiency of divide and conquer: if the subproblems are not independent, divide and conquer does a lot of unnecessary work by solving common subproblems repeatedly. In that case, although divide and conquer is still applicable, dynamic programming is usually the better choice.
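As a concrete illustration (a minimal sketch, not from the text), merge sort satisfies all four characteristics: tiny arrays are trivially sorted, the two halves are independent subproblems of the same kind, and two sorted halves merge easily into a sorted whole:

```python
def merge_sort(a):
    """Sort a list by divide and conquer."""
    if len(a) <= 1:                      # characteristic (1): small cases are easy
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])           # characteristics (2) and (4):
    right = merge_sort(a[mid:])          # two independent half-size subproblems
    merged, i, j = [], 0, 0              # characteristic (3): merge the solutions
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```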

Problem: Round-robin schedule

Problem description: There are n = 2^k players in a round-robin tennis tournament. A schedule meeting the following requirements must be designed:

(1) Each player must compete against each of the other n-1 players exactly once;

(2) Each player plays at most one match per day;

(3) The round robin finishes within n-1 days.

According to these requirements, build a table with n rows and n-1 columns: the entry in row i, column j is the player whom player i plays on day j, where 1 ≤ i ≤ n and 1 ≤ j ≤ n-1.

Splitting strategy: we can divide the players into two halves, so that the schedule for n players can be determined from the schedules for n/2 players. Recursively split the players with this strategy until only two players remain; then preparing the schedule becomes trivial, since the two players simply play each other.

(1) n = 2:

1 2
2 1

(2) n = 4:

1 2 3 4
2 1 4 3
3 4 1 2
4 3 2 1

(3) n = 8:

1 2 3 4 5 6 7 8
2 1 4 3 6 5 8 7
3 4 1 2 7 8 5 6
4 3 2 1 8 7 6 5
5 6 7 8 1 2 3 4
6 5 8 7 2 1 4 3
7 8 5 6 3 4 1 2
8 7 6 5 4 3 2 1

Figure 1 Schedules for two, four, and eight players (column 1 lists the players; columns 2 to n give the opponents on days 1 to n-1)

The square table (3) in Figure 1 is the schedule for 8 players. The two small blocks in the upper-left and lower-left corners are the schedules of players 1 to 4 and of players 5 to 8 for the first three days. Copying all the numbers in the upper-left block to the lower-right block, keeping their relative positions, and all the numbers in the lower-left block to the upper-right block, fills in the schedules of players 1 to 4 and of players 5 to 8 for the remaining four days. In this way the schedule is easily extended to tournaments with any number n = 2^k of players.
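The copying strategy described above can be sketched in Python; `round_robin` and its block-copy loops are one possible rendering of the construction, not the text's own code:

```python
def round_robin(k):
    """Build the n x n schedule table for n = 2**k players.
    Entry [i][j] (0-based) is the opponent of player i+1 on day j;
    column 0 holds the player's own number (day 0 = no match)."""
    n = 2 ** k
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        a[i][0] = i + 1                  # base case: column of player numbers
    m = 1
    while m < n:                         # double the solved block each round
        for i in range(0, n, 2 * m):     # each 2m x 2m block of rows
            for r in range(m):
                for c in range(m):
                    a[i + r][m + c] = a[i + m + r][c]      # lower-left -> upper-right
                    a[i + m + r][m + c] = a[i + r][c]      # upper-left -> lower-right
        m *= 2
    return a
```

For k = 3 this reproduces table (3) of Figure 1 row by row.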

3. Basic Steps of the Divide and Conquer Method

Each level of the recursion involves three steps:

(1) Divide: split the original problem into several subproblems that are smaller, independent of each other, and of the same form as the original problem;

(2) Conquer: if a subproblem is small enough to be solved easily, solve it directly; otherwise, solve it recursively;

(3) Merge: combine the solutions of the subproblems into a solution of the original problem.

Its general algorithm design pattern is as follows:

Divide_and_conquer(P)
    if |P| ≤ n0
        then return Adhoc(P)
    divide P into smaller subproblems P1, P2, ..., Pk
    for i ← 1 to k
        do yi ← Divide_and_conquer(Pi)    ▷ solve Pi recursively
    T ← Merge(y1, y2, ..., yk)            ▷ merge the subproblem solutions
    return T

Here |P| denotes the size of problem P, and n0 is a threshold: when the size of P does not exceed n0, the problem is easy enough to solve directly and need not be decomposed further. Adhoc(P) is the basic subalgorithm of the divide and conquer method, used to solve a small instance P directly; thus, when the size of P does not exceed n0, the algorithm Adhoc(P) is applied.

The algorithm Merge(y1, y2, ..., yk) is the merging subalgorithm of the divide and conquer method. It combines the solutions y1, y2, ..., yk of the subproblems P1, P2, ..., Pk into a solution of P.
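The pattern can be rendered in Python as a higher-order function. The parameter names `small_enough`, `adhoc`, `split`, and `merge` are hypothetical stand-ins for the test |P| ≤ n0, Adhoc, the dividing step, and Merge; as a minimal sketch it is instantiated with finding the maximum of a list:

```python
def divide_and_conquer(p, small_enough, adhoc, split, merge):
    """Generic divide-and-conquer pattern: solve small instances directly,
    otherwise split, recurse on each subproblem, and merge the solutions."""
    if small_enough(p):
        return adhoc(p)
    ys = [divide_and_conquer(q, small_enough, adhoc, split, merge)
          for q in split(p)]
    return merge(ys)

# Instantiation: maximum of a non-empty list, with k = 2 subproblems.
def list_max(a):
    return divide_and_conquer(
        a,
        small_enough=lambda p: len(p) == 1,
        adhoc=lambda p: p[0],
        split=lambda p: [p[:len(p) // 2], p[len(p) // 2:]],
        merge=max,
    )
```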

According to the dividing principle of the divide and conquer method, how many subproblems should the original problem be divided into, and what sizes should they have? These questions are hard to answer in general. However, extensive practice shows that when designing algorithms with divide and conquer, it is usually best to make the subproblems roughly the same size: dividing a problem into k subproblems of equal size is effective, and for many problems k = 2 suffices. This approach of equalizing subproblem sizes comes from the idea of balancing subproblems, and it is almost always better than dividing into subproblems of widely varying sizes.

The merging step of the divide and conquer method is the key to the algorithm. For some problems the merging method is obvious; for others it is complicated, or there are several merging schemes, or no scheme is apparent. There is no uniform pattern for merging; it must be analyzed case by case.

[Problem] Big integer multiplication

Problem description:

When analyzing the computational complexity of an algorithm, addition and multiplication are usually treated as basic operations: the time to execute one addition or multiplication is taken to be a constant that depends only on the speed of the computer hardware.

This assumption is reasonable only when the hardware can directly represent and process the integers involved. In some applications, however, we must handle integers too large to be represented directly by the hardware. Representing such a number in floating point only approximates it, and the result of a computation then carries a limited number of significant digits. To represent a large integer exactly and obtain all the digits of the result, arithmetic on large integers must be implemented in software.

Design an efficient algorithm for multiplying two n-bit big integers.

Let X and Y both be n-bit binary integers, and compute their product XY. The method learned in primary school can of course be turned into an algorithm, but it involves too many elementary steps and is inefficient: if each multiplication or addition of two 1-bit numbers counts as one step, this method needs O(n^2) steps to obtain XY. Below we use the divide and conquer method to design a more efficient big-integer multiplication algorithm.

Figure 6-3 Splitting the big integers X and Y into two segments

We split the n-bit binary integers X and Y into two segments of n/2 bits each (for simplicity, assume n is a power of 2), as shown in Figure 6-3.

Then X = A·2^(n/2) + B and Y = C·2^(n/2) + D, and the product of X and Y is:

XY = (A·2^(n/2) + B)(C·2^(n/2) + D) = AC·2^n + (AD + BC)·2^(n/2) + BD    (1)

Computing XY by formula (1) requires four multiplications of n/2-bit integers (AC, AD, BC, and BD), three additions of integers of at most n bits (one for each plus sign in formula (1)), and two shifts (the multiplications by 2^n and 2^(n/2)). All of these additions and shifts together take O(n) steps. Let T(n) be the total number of operations needed to multiply two n-bit integers; then formula (1) gives:

T(n) = O(1)             if n = 1
T(n) = 4T(n/2) + O(n)   if n > 1        (2)

From this we obtain T(n) = O(n^2), so computing the product of X and Y by formula (1) is no more efficient than the primary-school method. To improve the complexity of the algorithm, we must reduce the number of multiplications. To that end, we rewrite XY in another form:

XY = AC·2^n + [(A - B)(D - C) + AC + BD]·2^(n/2) + BD    (3)

Although formula (3) looks more complex than formula (1), it requires only three multiplications of n/2-bit integers (AC, BD, and (A - B)(D - C)), six additions and subtractions, and two shifts. Therefore:

T(n) = O(1)             if n = 1
T(n) = 3T(n/2) + O(n)   if n > 1        (4)

Solving this recurrence immediately gives T(n) = O(n^(log2 3)) = O(n^1.59). Using formula (3), and taking into account the effect of the signs of X and Y on the result, we obtain the complete big-integer multiplication algorithm mult:

function mult(X, Y, n)   {X and Y are integers with absolute value less than 2^n; returns the product XY}
begin
    S = sign(X) * sign(Y);   {S is the sign of the product of X and Y}
    X = abs(X);
    Y = abs(Y);              {take the absolute values of X and Y}
    if n = 1 then
        if (X = 1) and (Y = 1) then return(S)
        else return(0)
    else begin
        A = the left n/2 bits of X;
        B = the right n/2 bits of X;
        C = the left n/2 bits of Y;
        D = the right n/2 bits of Y;
        m1 = mult(A, C, n/2);
        m2 = mult(A - B, D - C, n/2);
        m3 = mult(B, D, n/2);
        S = S * (m1 * 2^n + (m1 + m2 + m3) * 2^(n/2) + m3);
        return(S);
    end;
end;
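A runnable Python rendering of the mult algorithm may look as follows. This is a sketch: `divmod` by a power of two plays the role of splitting each number into left and right halves, and the products m1, m2, m3 correspond to AC, (A-B)(D-C), and BD in formula (3). (Real big-integer code would operate on digit arrays, since Python's built-in `*` is already arbitrary-precision.)

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def mult(x, y, n):
    """Product of two integers of absolute value < 2**n (n a power of 2),
    using three half-size multiplications as in formula (3)."""
    s = sign(x) * sign(y)
    x, y = abs(x), abs(y)
    if n == 1:                          # base case: 1-bit operands
        return s if x == 1 and y == 1 else 0
    half = n // 2
    a, b = divmod(x, 1 << half)         # X = A*2^(n/2) + B
    c, d = divmod(y, 1 << half)         # Y = C*2^(n/2) + D
    m1 = mult(a, c, half)               # AC
    m2 = mult(a - b, d - c, half)       # (A-B)(D-C)
    m3 = mult(b, d, half)               # BD
    return s * ((m1 << n) + ((m1 + m2 + m3) << half) + m3)
```

Note that m1 + m2 + m3 = AC + (A-B)(D-C) + BD = AD + BC, recovering the middle term of formula (1) with one fewer multiplication.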

The preceding big-integer multiplication in binary can also be applied to large decimal integers, improving efficiency by reducing the number of multiplications.

[Problem] Closest pair of points

Problem description:

In applications, simple geometric objects such as points and circles are commonly used to represent entities in the real world. Problems about these objects often require information about other objects in their neighborhood. For example, in air traffic control, if each airplane is modeled as a moving point in space, then the two planes at greatest risk of collision are the closest pair of points in that space. This kind of problem is one of the basic problems of computational geometry. Below we focus on the closest-pair problem in the plane.

The closest-pair problem asks: given n points in the plane, find a pair of points whose distance is the smallest among all pairs of the n points.

Strictly speaking, there may be more than one closest pair; for simplicity, we only need to find one of them.

This problem is easy to understand and easy to solve naively: just compute the distance between every point and the other n-1 points and take the pair at minimum distance. But this brute-force approach is inefficient, requiring O(n^2) computing time. Can we find an O(n log n) algorithm for this problem?

This problem clearly satisfies the first and second conditions of the divide and conquer method. Consider dividing the set S of n given points into two subsets S1 and S2, each with about n/2 points, and recursively finding the closest pair in each subset. The key question is how to implement the merge step, that is, how to obtain the closest pair of S from the closest pairs of S1 and S2, since the closest pair of S1 or of S2 is not necessarily the closest pair of S. If both points of the closest pair of S lie in S1, or both in S2, the problem is easily solved. But if they lie one in S1 and one in S2, then for any point p in S1, all n/2 points of S2 are candidates for its closest partner, so determining the closest pair of S this way would require n^2/4 distance computations and comparisons. The merge step would then take O(n^2) time, and the total computing time T(n) of the algorithm would satisfy:

T(n) = 2T(n/2) + O(n^2)

whose solution is T(n) = O(n^2); that is, the running time is of the same order as the merge step alone, no better than brute force. From the formula for solving such recurrences we can see that the merge step consumes too much time. This suggests concentrating our effort on the merge step.

To make the problem easier to understand and analyze, first consider the one-dimensional case. Then the n points of S degenerate to n real numbers x1, x2, ..., xn on the x-axis, and the closest pair is the two numbers with the smallest difference. Obviously, we can sort x1, x2, ..., xn and then find the closest pair with one linear scan. The time of this method is dominated by sorting, so, as shown for sorting algorithms, it is O(n log n). However, this method does not generalize directly to two dimensions, so we still try to solve the one-dimensional case by divide and conquer, hoping to extend the approach to two dimensions.
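The sort-then-scan method for one dimension can be sketched as follows (`closest_pair_1d` is an assumed helper name):

```python
def closest_pair_1d(xs):
    """Closest pair among real numbers: sort in O(n log n),
    then one linear scan over the gaps between adjacent values."""
    s = sorted(xs)
    i = min(range(len(s) - 1), key=lambda i: s[i + 1] - s[i])
    return s[i], s[i + 1]
```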

Suppose we divide S into two subsets S1 and S2 by some point m on the x-axis, so that S1 = {x ∈ S | x ≤ m} and S2 = {x ∈ S | x > m}. Then p < q for all p ∈ S1 and q ∈ S2.

Recursively find the closest pairs {p1, p2} of S1 and {q1, q2} of S2, and set delta = min{|p1 - p2|, |q1 - q2|}. The closest pair of S is then {p1, p2}, {q1, q2}, or some pair {p3, q3} with p3 ∈ S1 and q3 ∈ S2, as shown in Figure 1.

Figure 1 Divide and conquer in the one-dimensional case

Note that if the closest pair of S is {p3, q3}, i.e., |p3 - q3| < delta, then both p3 and q3 are within distance delta of m: |p3 - m| < delta and |q3 - m| < delta, that is, p3 ∈ (m - delta, m] and q3 ∈ (m, m + delta). Every half-open interval of length delta contains at most one point of S1 (otherwise two points of S1 would be at distance less than delta), and m is the split point between S1 and S2; therefore (m - delta, m] contains at most one point of S. Similarly, (m, m + delta) contains at most one point of S. As Figure 1 shows, if (m - delta, m] contains a point of S, that point is the largest point of S1; similarly, if (m, m + delta) contains a point of S, that point is the smallest point of S2. Therefore p3 and q3 can be found in linear time: they are simply max(S1) and min(S2). Hence the solutions for S1 and S2 can be merged into a solution for S in linear time, i.e., the merge step of this divide and conquer strategy runs in O(n) time. Does this now give an efficient algorithm?

One more issue needs careful consideration: the choice of the split point m and the resulting division into S1 and S2. A basic requirement on m is that it induce a genuine partition of S, that is, S = S1 ∪ S2 with S1 ∩ S2 = ∅, S1 ≠ ∅, S2 ≠ ∅, S1 ⊆ {x | x ≤ m}, and S2 ⊆ {x | x > m}. It is easy to see that choosing m = [max(S) + min(S)]/2 satisfies this requirement, and after choosing m we can split S into S1 = {x ∈ S | x ≤ m} and S2 = {x ∈ S | x > m} in O(n) time. However, this choice of m may produce badly unbalanced subsets S1 and S2: in the worst case, |S1| = 1 and |S2| = n - 1, and the worst-case computing time T(n) of the resulting divide and conquer algorithm satisfies the recurrence:

T(n) = T(n - 1) + O(n)

whose solution is T(n) = O(n^2). This loss of efficiency can be cured by balancing the subproblems, as is usual in the divide and conquer method: choose the split point m so that S1 and S2 contain roughly equal numbers of points. Naturally, we think of using the median of the n points of S as the split point. The linear-time median selection algorithm introduced with the selection problem lets us determine a balanced split point m in O(n) time.
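The text refers to a linear-time median selection algorithm. As a stand-in sketch, here is a randomized quickselect, which runs in expected (not worst-case) linear time; the worst-case-linear median-of-medians algorithm would replace the random pivot choice. The names `quickselect` and `median` are assumptions for illustration:

```python
import random

def quickselect(a, k):
    """Return the k-th smallest element of a (0-based), expected O(n) time."""
    pivot = random.choice(a)
    lo = [x for x in a if x < pivot]
    eq = [x for x in a if x == pivot]
    if k < len(lo):
        return quickselect(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return quickselect([x for x in a if x > pivot], k - len(lo) - len(eq))

def median(a):
    """Lower median of a, usable as a balanced split point m."""
    return quickselect(a, (len(a) - 1) // 2)
```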

We can now design an algorithm pair for finding the closest pair of a one-dimensional point set S:

float pair(S)
{
    if (|S| = 1) delta = ∞;
    else if (|S| = 2) delta = |x[2] - x[1]|;   /* x[1..n]: coordinates of the points in S */
    else
    {
        m = the median of the coordinates of the points in S;
        construct S1 and S2 so that S1 = {x ∈ S | x ≤ m}, S2 = {x ∈ S | x > m};
        delta1 = pair(S1);
        delta2 = pair(S2);
        p = max(S1);
        q = min(S2);
        delta = min(delta1, delta2, q - p);
    }
    return (delta);
}
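A runnable Python version of pair is sketched below. For brevity it finds the split point by sorting at each level (O(n log n) per level, O(n log^2 n) overall) rather than by linear-time selection, so it does not attain the O(n log n) bound of the text's analysis; the guard for an empty S2 handles duplicate coordinates equal to the median:

```python
import math

def pair(s):
    """Smallest distance between any two of the numbers in s,
    by divide and conquer around a median split point."""
    if len(s) == 1:
        return math.inf
    if len(s) == 2:
        return abs(s[1] - s[0])
    m = sorted(s)[(len(s) - 1) // 2]       # split point (lower median)
    s1 = [x for x in s if x <= m]
    s2 = [x for x in s if x > m]
    if not s2:                             # duplicates equal to the median
        return 0.0
    # closest pair is within S1, within S2, or the straddling pair (max S1, min S2)
    return min(pair(s1), pair(s2), min(s2) - max(s1))
```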

The analysis above shows that both the dividing and the merging steps of the algorithm take O(n) time. Therefore the computing time T(n) of the algorithm satisfies the recurrence:

T(n) = 2T(n/2) + O(n)

Solving this recurrence gives T(n) = O(n log n).