Algorithm design and data structure learning _ 3 (data structure and problem solving)

Last Update:2018-12-06 Source: Internet

Author: User


　　Preface:

These notes cover the second part of Data Structures and Algorithm Analysis in C++ (second edition): algorithm complexity analysis, an introduction to the Standard Template Library, recursion and recursive algorithms, common sorting algorithms and their analysis, and random number generators and randomized algorithms.

　　Chap6:

The running time of an algorithm depends on the size of the input, the algorithm itself, how well the compiler optimizes, and the hardware of the machine it runs on. For these reasons, even when the running-time functions F(N) and G(N) of two algorithms are known, we cannot claim that F(N) <= G(N) or F(N) >= G(N) holds for every N. First, when N is small the difference between the two is too small to notice. Second, only when N is sufficiently large does the dominant term of F(N) and G(N) (for a polynomial, the term with the highest exponent) decide the comparison; the lower-order terms then contribute only a small error. Third, compiler quality, CPU performance, and other external factors also play a part.

In practice, O(N^3) algorithms are generally usable only for inputs of up to a few thousand items, and O(N^2) algorithms are generally not usable for inputs of more than about 10 million items.

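Common complexities, ordered from slowest-growing to fastest-growing: O(1) (constant), O(log N) (logarithmic), O(N) (linear), O(N log N), O(N^2) (quadratic), O(N^3) (cubic), O(2^N) (exponential).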

Algorithm complexity counts the number of basic operations, such as additions, subtractions, multiplications, and divisions, that the algorithm performs.

Even if the worst-case time complexity of algorithm A is smaller than that of algorithm B, the average-case complexity of A is not necessarily smaller than that of B, although it is in most cases.

The maximum contiguous subsequence sum problem can be solved in O(N^3) or O(N^2) with straightforward nested loops, in O(N log N) with divide and conquer, and in O(N) with a single linear scan.
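As a minimal sketch of the O(N) variant, the function below keeps a running sum and resets it whenever it drops below zero. The name maxSubsequenceSum and the convention that an all-negative input yields 0 (the empty subsequence) are assumptions made for this illustration.

```cpp
#include <vector>

// Linear-time maximum contiguous subsequence sum.
// Returns 0 for an empty or all-negative input (the empty subsequence).
long long maxSubsequenceSum(const std::vector<int>& a) {
    long long best = 0;     // best sum seen so far
    long long running = 0;  // sum of the current candidate subsequence
    for (int x : a) {
        running += x;
        if (running > best) {
            best = running;
        } else if (running < 0) {
            running = 0;    // a negative prefix can never help; start over
        }
    }
    return best;
}
```

Each element is visited exactly once, so the running time grows linearly with the input size.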

Logarithms commonly arise in three situations: the number of bits needed to represent consecutive integers, repeated doubling, and repeated halving.

Searching is the problem of locating an item in a collection. A static search algorithm does not modify the data while searching. Common static search algorithms are sequential search (used when the data is unsorted) and binary search (which requires sorted data and pays off only when the search range is large; for a small range, sequential search is the better choice).
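A minimal sketch of binary search on a sorted vector follows. The function name binarySearch and the -1 "not found" convention are assumptions for this example, not the book's interface.

```cpp
#include <vector>

// Returns the index of target in the sorted vector a, or -1 if it is absent.
int binarySearch(const std::vector<int>& a, int target) {
    int low = 0;
    int high = static_cast<int>(a.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  // avoids overflow of low + high
        if (a[mid] < target) {
            low = mid + 1;                 // discard the lower half
        } else if (a[mid] > target) {
            high = mid - 1;                // discard the upper half
        } else {
            return mid;
        }
    }
    return -1;
}
```

The search range is halved on every iteration, which is where the O(log N) running time comes from.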

There are two common ways to check the complexity of an algorithm empirically. The first is to multiply the input size by a factor k and observe how the running time grows: if it also grows by roughly a factor of k, the algorithm is probably linear or O(N log N); if it grows by k^2, it is probably O(N^2); if by k^3, probably O(N^3). The second is the ratio method: divide the measured running time T(N) by a candidate F(N) such as N, N log N, N^2, or N^3 while increasing N. The ratio tends to 0, diverges, or converges to a positive constant. If it converges to a constant, the divisor F(N) is the time complexity.

　　Chap7:

Data structures generally allow insertion, but some allow insertion only at specific positions (queues and stacks, for example). The other common data structure operations are deletion and search.

In a stack, the insert, remove, and find operations correspond to push, pop, and top respectively. Stacks are used heavily in compiler design. Other common applications include balanced-symbol checking and the evaluation of arithmetic expressions.

The queue operations are enqueue, dequeue, and getFront, which insert at the back, delete at the front, and read the front, respectively.

An STL iterator is used like a raw pointer rather than like an object with named member functions: instead of methods, it provides its functionality through overloaded operators such as *, ++, ==, and !=.

Common containers include vector, list, map, and set. Some containers allow duplicate elements and some do not. In the container interface described here, every container provides empty(), begin(), end(), size(), and getIterator(). The iterator returned by getIterator() must offer hasNext() and next(), and it holds a pointer to its container together with a position counter. (The standard STL containers provide begin() and end() but have no getIterator() member.)

Operations that access container elements generally return a reference. Whether this is a constant reference or an ordinary reference depends on whether the container itself is const. For this reason containers usually have two iterator types: const_iterator and iterator.

An iterator is a bridge between containers and algorithms. A container should provide the most capable iterator it can reasonably support, while an algorithm should require only the least capable iterator it needs.

By default, STL sort orders elements from smallest to largest. A predicate is a function object whose call operator returns bool and does not modify the object's data. A predicate that takes a single argument is called a unary predicate.

The find_if() algorithm returns an iterator to the first element that satisfies the predicate passed as its third argument.
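To illustrate, here is a small sketch that passes a unary predicate to std::find_if. The predicate IsGreaterThan and the helper firstGreaterIndex are hypothetical names introduced only for this example.

```cpp
#include <algorithm>
#include <vector>

// Unary predicate: a function object whose call operator returns bool
// and does not modify the object's data (note the const).
struct IsGreaterThan {
    int threshold;
    bool operator()(int x) const { return x > threshold; }
};

// Returns the index of the first element greater than threshold, or -1.
int firstGreaterIndex(const std::vector<int>& v, int threshold) {
    auto it = std::find_if(v.begin(), v.end(), IsGreaterThan{threshold});
    return it == v.end() ? -1 : static_cast<int>(it - v.begin());
}
```

When no element matches, find_if returns the end iterator, which the helper maps to -1.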

A binder adapter converts a two-argument function object into a one-argument function object by fixing one of the arguments.

C++ offers an alternative for almost every use of raw pointers: references replace passing by pointer, vector replaces arrays, string replaces char *, and STL containers replace manual dynamic allocation.

A vector iterator can be a plain pointer. A list iterator is not a pointer but a class object that uses operator overloading so that it can be used like one.

The common sequence implementations in the STL are the array-based vector and the linked list. Both provide push_back, insert, pop_back, and similar methods, and each has its own performance advantages.

In the STL, stack and queue are adapters: they are implemented by forwarding to the appropriate member functions of an underlying sequence container (deque by default; list also works, and vector works for stack).

set is an ordered container whose elements cannot be duplicated. multiset allows repeated elements.

map is also an ordered container. Its elements are key-value pairs: each key corresponds to one value, and keys cannot repeat. multimap allows duplicate keys. map supports the usual find, erase, and insert operations; insert takes a key-value pair, while find needs only the key (it still returns an iterator to a pair). map also has one operation of its own: indexing with operator[].

A priority queue does not hand out elements in insertion order. Instead, each object in the queue carries a priority value; for example, the smaller the value, the more important the object. A priority queue therefore only allows access to the object with the smallest priority value, and it needs the deleteMin and findMin operations. A priority queue allows duplicate elements, so it cannot be implemented with set. It could be implemented with multiset, but multiset supports many operations that a priority queue does not need, which wastes resources. In practice, priority queues are implemented with a binary heap.
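The findMin/deleteMin behavior can be sketched with std::priority_queue, which is a max-heap by default and becomes a min-heap when given std::greater. The helper name serveInPriorityOrder is an assumption made for this example.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Drains a min-priority queue and returns the elements in the order served.
std::vector<int> serveInPriorityOrder(const std::vector<int>& items) {
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq(
        items.begin(), items.end());
    std::vector<int> served;
    while (!pq.empty()) {
        served.push_back(pq.top());  // findMin
        pq.pop();                    // deleteMin
    }
    return served;
}
```

Duplicates are allowed: if the same value is inserted twice, it is served twice.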

The STL algorithms are spread over the headers <algorithm>, <numeric>, and <functional>. <algorithm> is the largest and implements many common algorithms, such as comparison, swapping, searching, traversal, and sorting. <numeric> is very small and contains only a few function templates that perform mathematical operations on sequences, such as sums and products. <functional> defines templates for declaring function objects, as well as utilities such as bind.

Container categories:

Sequence containers: vector, deque, list, forward_list, array;

Associative containers: set, multiset, map, multimap;

Unordered containers: unordered_set, unordered_multiset, unordered_map, unordered_multimap;

Because associative containers keep their elements ordered, they have no push_back, pop_back, or similar positional operations. Unordered containers are built on hash functions.

The elements of a vector are stored contiguously in memory.

The elements of set/multiset and the keys of map/multimap cannot be assigned to in place.

Common container adapters include stack, queue, and priority queue.

Pointers into array-based containers (raw pointers, iterators, and references alike) can be invalidated: after an insertion the storage may have been reallocated, so they may point to arbitrary memory.

Besides input and output iterators, there are forward iterators and bidirectional iterators, which can move both forward and backward.

Common iterator adapters: insert iterators (back_insert_iterator, front_insert_iterator), stream iterators, reverse iterators, and move iterators (C++11).

A template parameter can stand not only for a data type but also for a value. A value parameter must be replaced with an actual constant when the template is used.

If a class implements a functor (an object that is used like a function), the return type goes before the keyword operator, just as with any operator overload, for example: void operator()(string str) {}. If the class implements a type conversion, the keyword operator comes first and no return type is written, for example: operator string() const { return "X"; }
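The two forms can be seen side by side in the following sketch. The class Greeter is hypothetical and exists only to show where the keyword operator goes in each case.

```cpp
#include <string>
#include <utility>

// Objects of this class can be called like a function and converted to string.
class Greeter {
public:
    explicit Greeter(std::string name) : name_(std::move(name)) {}

    // Function call operator: the return type comes before the keyword operator.
    std::string operator()(const std::string& greeting) const {
        return greeting + ", " + name_;
    }

    // Conversion operator: starts with the keyword operator, no return type.
    operator std::string() const { return name_; }

private:
    std::string name_;
};
```

With Greeter g("Bob"), the call g("Hello") yields "Hello, Bob", and std::string s = g uses the conversion operator to yield "Bob".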

Common function objects in the STL include less, greater, greater_equal, less_equal, not_equal_to, logical_and, logical_not, logical_or, multiplies, minus, plus, divides, modulus, and negate.

Template arguments must be known at compile time.

Parameter binding:

Suppose we need to multiply the elements in a set by 10 and save them to a vector:

set<int> myset = {1, 2, 3, 4, 5};

vector<int> vec;

We can use the function template:

OutputIterator transform(InputIterator first1, InputIterator last1, OutputIterator result, UnaryOperation op);

The first two parameters are iterators to the beginning and end of the source range, the third is an iterator to the destination, and op is the unary operation. To implement the task above we can use the multiplies function object, which is used like this: int x = multiplies<int>()(2, 3). The problem is that it takes two arguments, while the op parameter of transform accepts only a function of one argument. We therefore need the parameter binding mechanism to turn the two-argument function into a one-argument function, using the bind template. Its form is:

bind(Fn&& fn, Args&&... args);

fn is the function, args are the argument values or placeholders, and the return value is itself a function object. The number of arguments it takes depends on the placeholders used.

The solution for multiplying the preceding set by a number is (back_inserter is an insert iterator ):

transform(myset.begin(), myset.end(), back_inserter(vec), bind(multiplies<int>(), placeholders::_1, 10));

bind(multiplies<int>(), placeholders::_1, 10) returns a one-argument function object. bind is only available from C++11 on; earlier versions (such as C++03) offer only the simpler bind1st and bind2nd. These binders operate on function objects. To wrap an ordinary function as a function object, use the function template (C++11; in C++03 the equivalent is ptr_fun).

Eg:

double Pow(double x, double y) {
    return pow(x, y);  // forwards to pow from <cmath>
}

auto f = function<double(double, double)>(Pow);
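Putting the pieces of the set-times-10 example together, a self-contained version might look like the sketch below. The wrapper function scaleAll is a hypothetical name; the calls to transform, back_inserter, bind, and multiplies are the standard ones discussed above and require C++11.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <set>
#include <vector>

// Multiplies every element of a set by factor and collects the results.
std::vector<int> scaleAll(const std::set<int>& source, int factor) {
    using namespace std::placeholders;  // brings _1 into scope
    std::vector<int> result;
    // bind fixes the second argument of multiplies, leaving a unary function.
    std::transform(source.begin(), source.end(), std::back_inserter(result),
                   std::bind(std::multiplies<int>(), _1, factor));
    return result;
}
```

Because a set iterates in sorted order, the output vector is sorted as well.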

　　Chap8:

Recursion is not circular logic. When designing a recursive algorithm we must avoid circular definitions, because they amount to an infinite loop.

Some functions are expressed more naturally by recursion than by an explicit mathematical formula.

The four rules of recursion: 1. Base case: there must be at least one base case that can be solved without recursion. 2. Make progress: every recursive call must move toward a base case. 3. Always assume that the recursive call works. We do not need to trace every step to understand a recursive program, because that is tedious and error-prone; it is enough to verify the algorithm once by mathematical induction and leave the bookkeeping to the computer. 4. Compound interest rule: never duplicate work by solving the same instance of a problem in separate recursive calls.

The disadvantage of recursion is that it uses a relatively large amount of memory, because the state of every pending call must be saved. Still, some problems are naturally suited to recursion, and the recursive solution usually takes only slightly longer than the non-recursive one. The advantage is that recursive code is more compact.

Printing a number a digit by digit in decimal form can be solved recursively, starting from the high-order digits: first print the number with its last digit removed (a / 10), then print the last digit (a % 10). When only one digit is left, print it directly.
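A minimal sketch of this digit-printing recursion follows. To keep it easy to check, it appends the digits to a string instead of writing to the screen; the names appendDigits and toDecimal are assumptions for this example.

```cpp
#include <string>

// Appends the decimal digits of n to out, most significant digit first.
void appendDigits(unsigned long long n, std::string& out) {
    if (n >= 10) {
        appendDigits(n / 10, out);  // recursive case: everything but the last digit
    }
    out.push_back(static_cast<char>('0' + n % 10));  // then the last digit
}

// Convenience wrapper that returns the digits as a new string.
std::string toDecimal(unsigned long long n) {
    std::string out;
    appendDigits(n, out);
    return out;
}
```

The base case is a single-digit number, and each recursive call makes progress toward it by dropping one digit.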

Driver routines are widely used with recursion. A driver routine checks the validity of certain values before the first recursive call. Because those values do not change during the recursion, checking them once is enough; after the check, the driver calls the recursive function directly.

Although recursion is effective, it should not be overused because of its overhead. Work that a simple loop can do should not be implemented recursively.

Three examples of recursion: 1. N factorial; 2. binary search; 3. drawing the marks of a ruler.

If two numbers A and B leave the same remainder when divided by N, they are congruent modulo N, written A ≡ B (mod N).

Modular exponentiation is the computation of X^N (mod P).
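One common way to compute X^N (mod P) is repeated squaring, sketched recursively below. The function name modPow is an assumption, and the sketch assumes 0 < p < 2^32 so that intermediate products fit in 64 bits.

```cpp
#include <cstdint>

// Computes x^n mod p by repeated squaring.
// Assumes 0 < p < 2^32 so that x * x cannot overflow 64 bits.
std::uint64_t modPow(std::uint64_t x, std::uint64_t n, std::uint64_t p) {
    if (n == 0) {
        return 1 % p;                                    // base case
    }
    x %= p;
    std::uint64_t result = modPow(x * x % p, n / 2, p);  // (x^2)^(n/2)
    if (n % 2 != 0) {
        result = result * x % p;                         // odd n: one extra factor of x
    }
    return result;
}
```

The exponent is halved on every call, so only O(log N) multiplications are needed instead of N - 1.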

On a CPU, performing many multiplications of small numbers usually takes more time than performing a few multiplications of large numbers.

Extended Euclidean Algorithm:

For integers A and B that are not both 0, gcd(A, B) denotes their greatest common divisor. There always exists a pair of integers X and Y (which may be negative) such that gcd(A, B) = AX + BY.

Multiplicative inverse problem:

Find the X that satisfies AX ≡ 1 (mod N), where A and N are known; X is the multiplicative inverse of A modulo N. This is equivalent to solving AX + NY = 1, which can be done by applying the extended Euclidean algorithm recursively.
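The following sketch shows one way to implement the extended Euclidean algorithm recursively and to derive the multiplicative inverse from it. The names extendedGcd and modInverse, and the -1 return value when no inverse exists, are assumptions for this illustration; inputs are assumed non-negative with n > 0.

```cpp
#include <cstdint>

// Extended Euclid: returns gcd(a, b) and sets x, y so that a*x + b*y == gcd(a, b).
// Assumes a >= 0 and b >= 0, not both zero.
std::int64_t extendedGcd(std::int64_t a, std::int64_t b,
                         std::int64_t& x, std::int64_t& y) {
    if (b == 0) {
        x = 1;
        y = 0;
        return a;  // a*1 + 0*0 == a
    }
    std::int64_t x1 = 0, y1 = 0;
    std::int64_t g = extendedGcd(b, a % b, x1, y1);
    // Here b*x1 + (a % b)*y1 == g, and a % b == a - (a / b) * b.
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Returns the inverse of a modulo n in [0, n), or -1 if gcd(a, n) != 1.
std::int64_t modInverse(std::int64_t a, std::int64_t n) {
    std::int64_t x = 0, y = 0;
    if (extendedGcd(a % n, n, x, y) != 1) {
        return -1;  // no inverse exists
    }
    return (x % n + n) % n;  // normalize a possibly negative x
}
```

For example, the inverse of 3 modulo 11 is 4, because 3 * 4 = 12 ≡ 1 (mod 11).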

A simplified run of the RSA algorithm, assuming Alice wants to send an encrypted message to Bob:

First Bob selects two large primes p and q (more than 100 digits each) and computes N = pq and N' = (p-1)(q-1). He then finds a number e that is relatively prime to N' (many values of e qualify) and computes d, the multiplicative inverse of e modulo N'. Finally Bob tells Alice the two numbers N and e, and keeps p, q, and d to himself.

Suppose Alice wants to send the number M. She actually sends R = M^e (mod N), and Bob decodes what he receives by computing R^d (mod N).
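The exchange can be tried out with deliberately tiny numbers. The sketch below assumes the toy key p = 61, q = 53 (so N = 3233 and N' = 3120) with e = 17 and d = 2753; these values are chosen purely for illustration and are far too small to be secure. The names powMod, rsaEncrypt, and rsaDecrypt are likewise assumptions.

```cpp
#include <cstdint>

// Computes base^exp mod m by repeated squaring (assumes 0 < m < 2^32).
std::uint64_t powMod(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) {
            result = result * base % m;  // use this bit of the exponent
        }
        base = base * base % m;          // square for the next bit
        exp >>= 1;
    }
    return result;
}

// Toy key: p = 61, q = 53, N = 3233, N' = 60 * 52 = 3120,
// e = 17 (relatively prime to N'), d = 2753 (17 * 2753 = 15 * 3120 + 1).
const std::uint64_t kN = 3233;
const std::uint64_t kE = 17;
const std::uint64_t kD = 2753;

// Alice's side: uses only the public values N and e.
std::uint64_t rsaEncrypt(std::uint64_t m) { return powMod(m, kE, kN); }

// Bob's side: uses the private exponent d.
std::uint64_t rsaDecrypt(std::uint64_t r) { return powMod(r, kD, kN); }
```

With this key, the message 65 is sent as 2790, and decrypting 2790 gives back 65.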

RSA is slow at encrypting and decrypting. It is a public-key method; private-key methods, such as the DES algorithm, are generally faster. The two can be combined: encrypt the plaintext with DES, then encrypt the DES key with RSA for transmission. The receiver first decrypts the DES key and then uses it to decrypt the ciphertext.

Divide and conquer is an efficient recursive technique that consists of two parts: 1. Divide: smaller subproblems are solved recursively. 2. Conquer: the solution to the original problem is formed from the solutions to the subproblems. The subproblems must not overlap. When a divide-and-conquer algorithm splits the problem into two halves and adds linear extra work, its running time is O(N log N).

The change-making problem asks for the minimum number of coins needed to make a given amount. It can be solved with dynamic programming (dynamic programming is usually discussed alongside greedy and recursive approaches).

Dynamic programming divides the original problem into simpler subproblems. Each subproblem is solved only once and its result is stored; the final answer is obtained by combining the stored results. Dynamic programming is therefore especially suitable when the subproblems overlap.
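As an illustration, the change-making problem can be solved bottom-up with a table of stored subproblem results. The function name minCoins and the -1 return value for amounts that cannot be made are assumptions for this sketch; the amount is assumed to be non-negative.

```cpp
#include <limits>
#include <vector>

// Minimum number of coins needed to make amount, or -1 if it cannot be made.
// Assumes amount >= 0; non-positive coin values are ignored.
int minCoins(const std::vector<int>& coins, int amount) {
    const int kInf = std::numeric_limits<int>::max();
    std::vector<int> best(amount + 1, kInf);  // best[v]: fewest coins for value v
    best[0] = 0;
    for (int value = 1; value <= amount; ++value) {
        for (int coin : coins) {
            if (coin > 0 && coin <= value && best[value - coin] != kInf &&
                best[value - coin] + 1 < best[value]) {
                best[value] = best[value - coin] + 1;  // reuse the stored subproblem
            }
        }
    }
    return best[amount] == kInf ? -1 : best[amount];
}
```

With coins {1, 5, 10, 21, 25} and amount 63, the answer is 3 (three 21s), which a greedy strategy that always takes the largest coin would miss.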

Recursion underlies three classic techniques: divide and conquer, dynamic programming, and backtracking.

　　Chap9:

Many computer programs use sorting directly or indirectly, and the sorting algorithm often largely determines the program's running time.

Any sorting algorithm that works by exchanging adjacent elements needs quadratic time on average, that is, on the order of N^2.

Proving a lower bound on complexity is much harder than proving an upper bound, because a lower bound is an abstract statement that must hold for every possible algorithm.

Shellsort uses insertion sort within each group. Choosing the gap (increment) sequence is an important part of Shellsort. Any sequence works as long as the final gap is 1. The algorithm first sorts the elements that are a given gap apart, then repeats with smaller gaps, and finally sorts with gap 1. With gap 1 the algorithm is ordinary insertion sort, which guarantees that the data ends up sorted. Shellsort is also called diminishing gap sort. The initial gap is usually half the length of the array, and the gap is then repeatedly halved (dividing by 2.2 is the more common choice).

Shellsort has the property that an array that is h(k)-sorted remains h(k)-sorted after a subsequent h(k-1)-sort.
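A compact sketch of Shellsort with the divide-by-2.2 gap sequence is shown below. The function name shellsort and the choice of int elements are assumptions made to keep the example short.

```cpp
#include <cstddef>
#include <vector>

// Shellsort: starts with gap = size / 2 and divides the gap by 2.2 each round.
// A gap of 2 is followed by 1 so that the final pass is always a plain insertion sort.
void shellsort(std::vector<int>& a) {
    for (std::size_t gap = a.size() / 2; gap > 0;
         gap = (gap == 2) ? 1 : static_cast<std::size_t>(gap / 2.2)) {
        // Insertion sort on the elements that are gap positions apart.
        for (std::size_t i = gap; i < a.size(); ++i) {
            int tmp = a[i];
            std::size_t j = i;
            for (; j >= gap && tmp < a[j - gap]; j -= gap) {
                a[j] = a[j - gap];  // shift larger elements one gap to the right
            }
            a[j] = tmp;
        }
    }
}
```

The special case for gap == 2 matters because 2 / 2.2 truncates to 0, which would otherwise skip the final pass with gap 1.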

Quicksort is the fastest sorting algorithm in practical use. It is recursive, with an average running time of O(N log N). The worst case is O(N^2), but statistically this worst case is very unlikely. The performance of quicksort depends on the choice of pivot. In practice, do not always pick the first or last element as the pivot; pick the middle element or use the median-of-three method.

During partitioning, the scan should stop when it meets an element equal to the pivot. During the recursion, once the number of remaining elements is small (for example, fewer than 10), insertion sort is the better choice.

To select the k-th smallest element of an array, we could sort first and then pick, but this costs more than necessary, because sorting produces far more information than we need. A small change to quicksort gives the quickselect algorithm, whose average running time is O(N).
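The idea can be sketched as follows. For brevity this version partitions with the simple Lomuto scheme around the middle element rather than with the two-sided scan described for quicksort, so it is an illustration of the technique, not the book's implementation. The name quickSelect and the 1-based k are assumptions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns the k-th smallest element of a (k is 1-based).
// a is taken by value because selection rearranges it.
// Assumes 1 <= k <= a.size().
int quickSelect(std::vector<int> a, std::size_t k) {
    std::size_t low = 0;
    std::size_t high = a.size() - 1;
    const std::size_t target = k - 1;  // index the answer has in sorted order
    while (low < high) {
        // Partition a[low..high] around the middle element (Lomuto scheme).
        std::swap(a[low + (high - low) / 2], a[high]);
        const int pivot = a[high];
        std::size_t store = low;
        for (std::size_t i = low; i < high; ++i) {
            if (a[i] < pivot) {
                std::swap(a[i], a[store++]);
            }
        }
        std::swap(a[store], a[high]);  // pivot is now in its final position
        if (target == store) {
            return a[store];
        }
        if (target < store) {
            high = store - 1;          // answer lies in the left part
        } else {
            low = store + 1;           // answer lies in the right part
        }
    }
    return a[low];
}
```

Unlike quicksort, only the side that contains the target index is processed further, which is what brings the average cost down to O(N).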

Any comparison-based sorting algorithm needs at least log(N!) comparisons in the worst case, which is approximately N log N - 1.44N.

Indirect sorting reduces the copying overhead that arises when a sorting template is used with large objects. It introduces an array of pointers and rearranges the pointers instead of copying the data.

From this chapter we can see that insertion sort is suitable for small inputs, Shellsort is suitable for moderate inputs, and quicksort is the most widely used, although implementing it well involves many coding details.

　　Chap10:

Random number generators are used in modern cryptography, simulation systems, search and sorting algorithms, and program testing.

Many distributions can be derived from the uniform distribution, so the uniform distribution is very important.

Uniformly distributed numbers must satisfy important statistical properties: the sum of two consecutive numbers is equally likely to be even or odd, and some values may repeat. Linear congruential generators are often used to produce uniformly distributed numbers: X' = AX % M. A = 48271 with M = 2^31 - 1 is a common combination because it suits most applications.
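A minimal sketch of a linear congruential generator with A = 48271 and modulus 2^31 - 1 (the same parameters as std::minstd_rand) is given below. The class name LinearCongruential and its interface are assumptions for this example; the product is computed in 64 bits to avoid overflow.

```cpp
#include <cstdint>

// Linear congruential generator x' = A * x % M with A = 48271, M = 2^31 - 1.
class LinearCongruential {
public:
    explicit LinearCongruential(std::uint32_t seed = 1)
        : state_(static_cast<std::uint32_t>(seed % kM)) {
        if (state_ == 0) {
            state_ = 1;  // a state of 0 would stay 0 forever
        }
    }

    // Returns the next value in [1, M - 1].
    std::uint32_t next() {
        // The 64-bit product cannot overflow: both factors are below 2^31.
        state_ = static_cast<std::uint32_t>(state_ * kA % kM);
        return state_;
    }

private:
    static constexpr std::uint64_t kA = 48271;
    static constexpr std::uint64_t kM = 2147483647;  // 2^31 - 1
    std::uint32_t state_;
};
```

Starting from seed 1, the first value is 48271 and the second is 48271^2 % (2^31 - 1) = 182605794.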

The Poisson distribution is a discrete distribution that describes the number of occurrences of a random event per unit of time. Poisson samples can be generated from the uniform distribution. To generate a sample from a Poisson distribution with mean m, keep generating uniform numbers in (0, 1) until the logarithm of their product is less than or equal to -m; the number of uniform values generated before the one that crosses this threshold is the Poisson sample.

To generate a random permutation of N distinct numbers, first generate the N numbers in order, then for each position in turn swap the element there with a randomly chosen element at or before that position.
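This shuffle can be sketched in a few lines. The function name randomPermutation is an assumption, and the example uses std::mt19937 from the <random> header as its source of randomness rather than the generator discussed above.

```cpp
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Returns a random permutation of 1..n.
std::vector<int> randomPermutation(int n, std::mt19937& rng) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 1);  // 1, 2, ..., n in order
    for (std::size_t i = 1; i < a.size(); ++i) {
        // Pick a position at or before i, all equally likely.
        std::uniform_int_distribution<std::size_t> pick(0, i);
        std::swap(a[i], a[pick(rng)]);
    }
    return a;
}
```

Every value from 1 to n appears exactly once in the result, because the algorithm only ever swaps elements.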

The worst-case time complexity of a randomized algorithm is the same as that of the deterministic algorithm. Randomization can be combined with sorting: for example, quicksort can choose its pivot at random, which prevents anyone from deliberately constructing a bad input.

Trial division tests whether an odd number N is prime: if no odd number between 3 and sqrt(N) divides N, then N is prime.
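A minimal sketch of trial division follows. The function name isPrime is an assumption; the loop condition d <= n / d is used instead of calling sqrt so that the test stays in integer arithmetic.

```cpp
// Trial division: n is prime if no odd d with 3 <= d <= sqrt(n) divides it.
bool isPrime(unsigned long long n) {
    if (n < 2) {
        return false;
    }
    if (n % 2 == 0) {
        return n == 2;  // 2 is the only even prime
    }
    for (unsigned long long d = 3; d <= n / d; d += 2) {  // d <= n / d means d*d <= n
        if (n % d == 0) {
            return false;
        }
    }
    return true;
}
```

The loop performs on the order of sqrt(N) / 2 divisions, which is fine for small numbers but far too slow for primes with hundreds of digits.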

Fermat's theorem gives a method for deciding whether a number is prime:

If every B that satisfies 1 < B < P also satisfies

B^(P-1) ≡ 1 (mod P)

then P must be a prime number. Conversely, a single B that fails the congruence proves that P is not prime.

This tells us that, to decide whether P is prime, there is no need to test every natural number smaller than P; it is enough to test the prime numbers smaller than P. In fact, only the primes up to the square root of P need to be tested.

When working with random numbers in C++, the following two headers are generally included:
#include <cstdlib>
#include <ctime>
The random number generator is then seeded with srand(time(0)), and random numbers are obtained by calling rand().

　　References:

Data Structures and algorithm analysis in C ++ (second edition), Mark Allen Weiss.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
