Python data structures and algorithms: algorithm analysis
In computer science, algorithm analysis (analysis of algorithms) is the process of determining the amount of computing resources, such as computing time and memory usage, required to execute a given algorithm. The efficiency or complexity of an algorithm is expressed theoretically as a function. Its domain is the length of the input data, and its range is usually the number of execution steps (time complexity) or the number of memory locations (space complexity). Algorithm analysis is an important part of computational complexity theory.
An interesting question often comes up: given two seemingly different programs, which one is better?
To answer this question, we must remember that there is an important difference between a program and the algorithm that the program represents. An algorithm is a generic, step-by-step list of instructions for solving a problem. It is a method for solving any instance of the problem such that, given a particular input, the algorithm produces the desired result. A program, on the other hand, is an algorithm that has been encoded in some programming language. There may be many programs for the same algorithm, depending on the programmer and the programming language being used.
To explore this difference further, consider the following function. It solves a simple problem: computing the sum of the first n natural numbers. The solution iterates over the n integers, adding each one to an accumulator that starts at zero.
def sumOfN(n):
    theSum = 0
    for i in range(1, n + 1):
        theSum = theSum + i
    return theSum

print(sumOfN(10))
Next, look at the following code. It looks strange at first glance, but on closer inspection you will find that this function does the same work as the one above. It is less obvious because the code is poorly written: the variable names are not meaningful, which hurts readability, and there is an extra assignment statement that is not needed.
def foo(tom):
    fred = 0
    for bill in range(1, tom + 1):
        barney = bill
        fred = fred + barney
    return fred

print(foo(10))
Which code is better? The answer depends on your criteria. If you only care about readability, the function sumOfN is certainly better than foo. In fact, you have probably seen many examples in your introductory programming course whose goal was to teach you to write programs that are readable and easy to understand. Here, however, we are also interested in the algorithm itself.
Besides space requirements, we can analyze and compare algorithms based on the time they need to execute. This measure is sometimes called the "execution time" or "running time" of the algorithm. One way to measure the execution time of sumOfN is to do a benchmark analysis. In Python, we can benchmark a function by noting the start and end time on the system we are using. The time module has a function called time that returns the current system clock time in seconds. By calling this function twice, at the beginning and at the end, and then computing the difference, we get the execution time in seconds.
Listing 1
import time

def sumOfN2(n):
    start = time.time()

    theSum = 0
    for i in range(1, n + 1):
        theSum = theSum + i

    end = time.time()

    return theSum, end - start
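Listing 1 shows the summation function, here named sumOfN2, with timing calls embedded before and after the summation. The function returns a tuple containing the result and the time in seconds required for the calculation. The test results are as follows: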
>>> for i in range(5):
...     print("Sum is %d required %10.7f seconds" % sumOfN2(10000))
...
Sum is 50005000 required 0.0018950 seconds
Sum is 50005000 required 0.0018620 seconds
Sum is 50005000 required 0.0019171 seconds
Sum is 50005000 required 0.0019162 seconds
Sum is 50005000 required 0.0019360 seconds
The times are quite consistent, averaging about 0.0019 seconds per run. What if we increase n to 100,000?
>>> for i in range(5):
...     print("Sum is %d required %10.7f seconds" % sumOfN2(100000))
...
Sum is 5000050000 required 0.0199420 seconds
Sum is 5000050000 required 0.0180972 seconds
Sum is 5000050000 required 0.0194821 seconds
Sum is 5000050000 required 0.0178988 seconds
Sum is 5000050000 required 0.0188949 seconds
Again, each run takes longer but the times are very consistent, and the average is about 10 times larger. Now we increase n to 1,000,000:
>>> for i in range(5):
...     print("Sum is %d required %10.7f seconds" % sumOfN2(1000000))
...
Sum is 500000500000 required 0.1948988 seconds
Sum is 500000500000 required 0.1850290 seconds
Sum is 500000500000 required 0.1809771 seconds
Sum is 500000500000 required 0.1729250 seconds
Sum is 500000500000 required 0.1646299 seconds
In this case, the average execution time once again turns out to be close to 10 times the previous one.
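The same kind of benchmark can also be written with time.perf_counter, a standard-library clock intended for measuring short durations. The sketch below is only an illustration: the helper name time_sum_of_n and the chosen values of n are assumptions made here, and the printed timings will differ from machine to machine.

```python
import time

def time_sum_of_n(n):
    """Return (sum of 1..n, elapsed seconds) for the iterative summation.

    Hypothetical helper, not part of the original listings.
    """
    start = time.perf_counter()
    the_sum = 0
    for i in range(1, n + 1):
        the_sum = the_sum + i
    elapsed = time.perf_counter() - start
    return the_sum, elapsed

# Grow n by a factor of 10 each time and watch how the time changes.
for n in (10_000, 100_000, 1_000_000):
    total, seconds = time_sum_of_n(n)
    print("n = %9d  sum = %d  time = %.7f seconds" % (n, total, seconds))
```

If the loop really does work proportional to n, each line should report roughly 10 times the time of the line before it.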
Now consider Listing 2, which takes a different approach to the summation problem. This function, sumOfN3, uses the closed-form equation $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ to compute the sum of the first n natural numbers without iterating.
Listing 2
def sumOfN3(n):
    # integer division keeps the result an int in Python 3
    return (n * (n + 1)) // 2

print(sumOfN3(10))
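For readers who want to see where the closed form used in sumOfN3 comes from, here is a short derivation. It is the standard pairing argument, written out as a sketch; the symbol $S$ is introduced here only for convenience.

```latex
\begin{align*}
S  &= 1 + 2 + \cdots + (n-1) + n \\
S  &= n + (n-1) + \cdots + 2 + 1 \\
2S &= \underbrace{(n+1) + (n+1) + \cdots + (n+1)}_{n \text{ terms}} = n(n+1) \\
S  &= \frac{n(n+1)}{2}
\end{align*}
```

Writing the sum forwards and backwards and adding the two rows term by term gives n copies of n + 1, which is why no loop is needed.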
If we run the same benchmark on sumOfN3 with five different values of n (10,000, 100,000, 1,000,000, 10,000,000, and 100,000,000), we get the following results:
Sum is 50005000 required 0.00000095 seconds
Sum is 5000050000 required 0.00000191 seconds
Sum is 500000500000 required 0.00000095 seconds
Sum is 50000005000000 required 0.00000095 seconds
Sum is 5000000050000000 required 0.00000119 seconds
There are two things to notice about this output. First, the times recorded here are shorter than in any of the previous examples. Second, they are very consistent no matter what the value of n is.
But what does this benchmark really tell us? Intuitively, we can see that the iterative solution seems to be doing more work, since some program steps are repeated. This is probably why it takes longer. As we increase n, the time required by the iterative solution also increases. However, there is a problem. If we run the same function on a different computer or use a different programming language, we are likely to get different results. On an older computer, even sumOfN3 could take longer to execute.
We need a better way to characterize the execution time of these algorithms. The benchmark technique computes the actual time to execute. It does not really give us a useful measurement, because it depends on the particular machine, the time of day, the compiler, and the programming language. Instead, we would like a characterization that is independent of the program or computer being used. Such a measure judges the algorithm alone and can be used to compare algorithms across implementations.
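One machine-independent way to compare the two summation approaches is to count basic operations instead of seconds. The sketch below assumes that each assignment to the accumulator counts as one step. That unit is a choice made here for illustration and is not defined in the text above, and the function names are likewise hypothetical.

```python
def count_steps_loop(n):
    """Iterative summation; returns (sum, number of assignment steps)."""
    steps = 0
    the_sum = 0
    steps += 1                  # the initial assignment the_sum = 0
    for i in range(1, n + 1):
        the_sum = the_sum + i
        steps += 1              # one assignment per loop iteration
    return the_sum, steps

def count_steps_formula(n):
    """Closed-form summation; a single assignment regardless of n."""
    the_sum = n * (n + 1) // 2
    return the_sum, 1

for n in (10, 100, 1000):
    print(n, count_steps_loop(n)[1], count_steps_formula(n)[1])
```

Under this counting assumption the loop takes n + 1 steps while the formula takes one, and neither count depends on the machine that runs the code.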
An anagram detection example
A good example for showing algorithms with different orders of magnitude is the classic anagram detection problem for strings. One string is an anagram of another if the second is simply a rearrangement of the letters of the first. For example, 'heart' and 'earth' are anagrams. The strings 'python' and 'typhon' are anagrams as well. To simplify the discussion, we assume that the two strings have the same length and are made up only of the 26 lowercase English letters. Our goal is to write a boolean function that takes two strings and returns whether they are anagrams.
Method 1: checking off
The first solution to the anagram problem checks whether each character of the first string occurs in the second. If every character can be "checked off", the two strings must be anagrams. Checking off a character is done by replacing it with the special Python value None. However, because strings in Python are immutable, the first step is to convert the second string to a list. See the following code:
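def anagramSolution1(s1, s2):
    alist = list(s2)    # strings are immutable, so work on a list copy
    pos1 = 0
    stillOK = True
    while pos1 < len(s1) and stillOK:
        pos2 = 0
        found = False
        # scan the list for the current character of s1
        while pos2 < len(alist) and not found:
            if s1[pos1] == alist[pos2]:
                found = True
            else:
                pos2 = pos2 + 1
        if found:
            alist[pos2] = None    # check the character off
        else:
            stillOK = False
        pos1 = pos1 + 1
    return stillOK

print(anagramSolution1('abcd', 'dcba'))

Each of the n characters in s1 may require scanning up to n positions in the list, so in the worst case the number of comparisons is the sum of the integers from 1 to n, which makes this solution O(n^2).
Method 2: sort and compare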
Another solution relies on the fact that even though s1 and s2 are different, they are anagrams only if they consist of exactly the same characters. So if we first sort the characters of each string alphabetically, two anagrams will end up as identical sequences. In Python, we can convert each string to a list and use the built-in list method sort. See the following code:
def anagramSolution2(s1, s2):
    alist1 = list(s1)
    alist2 = list(s2)

    alist1.sort()
    alist2.sort()

    pos = 0
    matches = True
    while pos < len(s1) and matches:
        if alist1[pos] == alist2[pos]:
            pos = pos + 1
        else:
            matches = False
    return matches

print(anagramSolution2('abcde', 'edcba'))
At first glance, you may think that this algorithm is O(n), because there is only one simple loop that compares n characters after sorting. However, the two calls to Python's sort method are not free. As we will see later, sorting is typically either O(n^2) or O(n log n), so the sorting operations dominate the loop. In the end, this algorithm has the same order of magnitude as the sorting process.
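The sort-and-compare idea can be written more compactly with the built-in sorted function, which returns a new sorted list from any iterable. This is a minimal sketch with a hypothetical function name, not a replacement for the listing above, and its cost is still dominated by the two sorts.

```python
def is_anagram_sorted(s1, s2):
    """Return True if s1 and s2 contain the same characters with the same counts."""
    # sorted() accepts a string and returns a sorted list of its characters,
    # so two anagrams produce equal lists.
    return sorted(s1) == sorted(s2)

print(is_anagram_sorted('abcde', 'edcba'))  # True
print(is_anagram_sorted('abcde', 'abcdd'))  # False
```

Comparing the two sorted lists with == replaces the explicit while loop, and it also handles strings of different lengths without an extra check.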
Method 3: brute force
A brute force technique for solving a problem tries to exhaust all possibilities. For the anagram problem, we can generate every possible string from the characters of s1 and then check whether s2 is among them. However, this approach has a difficulty. When generating all possible strings from s1, there are n choices for the first character, n-1 choices for the second position, n-2 for the third, and so on. The total number of candidate strings is n * (n-1) * (n-2) * ... * 3 * 2 * 1 = n!. It turns out that n! grows very quickly: as n gets large, n! grows even faster than 2^n.
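To make the brute force idea concrete, the sketch below uses itertools.permutations from the standard library to try every ordering of s1, and then prints n! next to 2^n for a few values of n. The function name is hypothetical and the code is meant only for very short strings, since the number of orderings is n!.

```python
import math
from itertools import permutations

def is_anagram_brute_force(s1, s2):
    """Try every ordering of s1 and check whether one of them equals s2."""
    for candidate in permutations(s1):
        # permutations yields tuples of characters, so join them into a string
        if ''.join(candidate) == s2:
            return True
    return False

print(is_anagram_brute_force('abcd', 'dcba'))  # True

# Compare the growth of n! with 2**n for a few values of n.
for n in (5, 10, 20):
    print(n, math.factorial(n), 2 ** n)
```

For n = 20 there are already 2,432,902,008,176,640,000 orderings, compared with 1,048,576 for 2^20, which is why this method is not practical.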
Method 4: count and compare
The last solution takes advantage of the fact that any two anagrams have the same number of a's, the same number of b's, the same number of c's, and so on. To decide whether two strings are anagrams, we first count the number of times each character occurs. Because there are only 26 possible characters, we can use a list of 26 counters, one for each possible character. Each time we see a particular character, we increment the counter at its position. In the end, if the two lists of counters are identical, the strings must be anagrams. See the following code:
def anagramSolution4(s1, s2):
    c1 = [0] * 26
    c2 = [0] * 26

    for i in range(len(s1)):
        pos = ord(s1[i]) - ord('a')
        c1[pos] = c1[pos] + 1

    for i in range(len(s2)):
        pos = ord(s2[i]) - ord('a')
        c2[pos] = c2[pos] + 1

    j = 0
    stillOK = True
    while j < 26 and stillOK:
        if c1[j] == c2[j]:
            j = j + 1
        else:
            stillOK = False
    return stillOK

print(anagramSolution4('apple', 'pleap'))
Again, this solution contains several loops. However, unlike the first solution, none of them are nested. The first two loops, which count the characters, are both based on n. The third loop compares the two lists of counts and always takes 26 steps, because there are only 26 possible characters. Adding it all up gives T(n) = 2n + 26 steps, which is O(n). We have found a linear-time solution to this problem.
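The counting approach can also be sketched with collections.Counter from the standard library, which builds a dictionary-like table of character counts in one pass over each string. The function name below is hypothetical. Unlike the 26-element lists above, a Counter only stores the characters that actually occur, so this version is not limited to lowercase letters.

```python
from collections import Counter

def is_anagram_counter(s1, s2):
    """Compare per-character counts built in one pass over each string."""
    # Two Counters are equal when every character has the same count in both.
    return Counter(s1) == Counter(s2)

print(is_anagram_counter('apple', 'pleap'))  # True
print(is_anagram_counter('apple', 'apply'))  # False
```

The work is still one pass over s1, one pass over s2, and one comparison of the two count tables, so the running time stays linear in the length of the strings.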
Before leaving this example, we need to say something about space requirements. Although the last solution runs in linear time, it can do so only by using additional storage to keep the two lists of character counts. In other words, this algorithm trades space for time.
This is a common situation. In many cases, you need to make a trade-off between time and space. Here the amount of extra space is not significant. However, if the underlying alphabet had millions of characters, the space overhead would be a bigger concern. As a computer scientist, when given a choice of algorithms, it is up to you to decide how best to use computing resources for a specific problem.