Python Data Structures and Algorithms: Algorithm Analysis
An interesting question comes up often: given two programs that look different, which one is better? To answer it, we must remember that a program is not the same thing as the algorithm it represents. An algorithm is a generic, step-by-step list of instructions for solving a problem: given any instance of the problem with the specified input, the algorithm produces the intended result. A program, on the other hand, is an algorithm encoded in a programming language. Many programs can implement the same algorithm, depending on the programmer and the programming language used.

To explore this difference further, consider the following function. It solves a simple problem: computing the sum of the first n natural numbers. The solution iterates over the n integers and adds each one to an accumulator.

    def sumOfN(n):
        theSum = 0
        for i in range(1, n + 1):
            theSum = theSum + i
        return theSum

    print(sumOfN(10))

Now look at the following code. It looks strange at first glance, but on closer inspection you will find that it does the same work as the function above. This is not obvious because the code is poorly written: the variable names are not meaningful, which hurts readability, and there is an extra assignment that is not needed.

    def foo(tom):
        fred = 0
        for bill in range(1, tom + 1):
            barney = bill
            fred = fred + barney
        return fred

    print(foo(10))

Which function is better? The answer depends on your criteria. If you care only about readability, sumOfN is certainly better than foo. In fact, you have probably seen many example programs that are both readable and easy to understand. Here, however, we are also interested in characterizing the algorithm itself. As an alternative to comparing space requirements, we analyze and compare algorithms based on the time they take to execute. This measure is sometimes called the algorithm's "execution time" or "running time".
One way to measure the execution time of the sumOfN function is to run a benchmark. In Python, we can record the start and end time of a computation on the system we are using. The time module has a function called time that returns the current system clock time in seconds. By calling this function twice, once at the start and once at the end, and taking the difference, we get the elapsed execution time in seconds.

Listing 1

    import time

    def sumOfN2(n):
        start = time.time()

        theSum = 0
        for i in range(1, n + 1):
            theSum = theSum + i

        end = time.time()

        # Return both the sum and the elapsed time in seconds.
        return theSum, end - start

Listing 1 shows the original sumOfN function with timing calls added before and after the summation. The function returns a tuple holding the result and the time required for the calculation. Running it five times with n = 10,000 gives:

    >>> for i in range(5):
    ...     print("Sum is %d required %10.7f seconds" % sumOfN2(10000))
    ...
    Sum is 50005000 required  0.0018950 seconds
    Sum is 50005000 required  0.0018620 seconds
    Sum is 50005000 required  0.0019171 seconds
    Sum is 50005000 required  0.0019162 seconds
    Sum is 50005000 required  0.0019360 seconds

What if we increase n to 100,000?

    >>> for i in range(5):
    ...     print("Sum is %d required %10.7f seconds" % sumOfN2(100000))
    ...
    Sum is 5000050000 required  0.0199420 seconds
    Sum is 5000050000 required  0.0180972 seconds
    Sum is 5000050000 required  0.0194821 seconds
    Sum is 5000050000 required  0.0178988 seconds
    Sum is 5000050000 required  0.0188949 seconds

Each run takes longer, but the times are again consistent, averaging about 10 times the previous figure. Increasing n to 1,000,000 gives:

    >>> for i in range(5):
    ...     print("Sum is %d required %10.7f seconds" % sumOfN2(1000000))
    ...
    Sum is 500000500000 required  0.1948988 seconds
    Sum is 500000500000 required  0.1850290 seconds
    Sum is 500000500000 required  0.1809771 seconds
    Sum is 500000500000 required  0.1729250 seconds
    Sum is 500000500000 required  0.1646299 seconds

Once again, the average execution time is close to 10 times the previous one. Now consider Listing 2, which proposes a different way to solve the summation problem.
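Before turning to Listing 2, the timing pattern above can also be sketched with time.perf_counter, a standard-library clock intended for measuring short durations. This is only an illustrative variant, not the article's listing: the names sum_of_n and time_call, the repeat count, and the choice to report the best of several runs are all assumptions made for the example.

```python
import time


def sum_of_n(n):
    """Sum the first n natural numbers with a loop (same idea as sumOfN)."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def time_call(func, arg, repeats=5):
    """Hypothetical helper: return (result, best elapsed seconds) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(arg)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the fastest run to reduce noise
    return result, best


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        result, seconds = time_call(sum_of_n, n)
        print("n=%d sum=%d best time %.7f seconds" % (n, result, seconds))
```

The absolute numbers printed will differ from machine to machine; only the roughly tenfold growth from one value of n to the next is expected to carry over.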
Listing 2 defines sumOfN3, which uses the closed-form equation $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$ to compute the sum of the first n natural numbers without a loop.

Listing 2

    def sumOfN3(n):
        # Integer division keeps the result an int in Python 3.
        return (n * (n + 1)) // 2

    print(sumOfN3(10))

If we run the same benchmark on sumOfN3 with five different values of n (10,000, 100,000, 1,000,000, 10,000,000, and 100,000,000), we get the following results:

    Sum is 50005000 required 0.00000095 seconds
    Sum is 5000050000 required 0.00000191 seconds
    Sum is 500000500000 required 0.00000095 seconds
    Sum is 50000005000000 required 0.00000095 seconds
    Sum is 5000000050000000 required 0.00000119 seconds

There are two things to notice about this output. First, the times recorded are much shorter than in any of the previous runs. Second, they are consistent no matter what the value of n is.

But what does this benchmark really tell us? Intuitively, we can see that the iterative solution is doing more work because some program steps are repeated. That is why it takes longer, and why its execution time grows as we increase n. However, there is a problem. If we ran the same function on a different computer, or wrote it in a different programming language, we would probably get different results. On an older computer, even sumOfN3 could take longer.

We need a better way to describe the execution time of these algorithms. The benchmark technique measures actual execution time. It does not give us a truly useful measurement, because the result depends on the specific machine, the time of day, the compiler, and the programming language. Instead, we want a characterization that is independent of the program or computer being used. Such a measure would judge the algorithm alone and could be used to compare algorithms across implementations.

An Anagram Detection Example

A good example problem for comparing algorithms is the classic anagram detection problem for strings. One string is an anagram of another if the second is simply a rearrangement of the letters of the first.
For example, 'heart' and 'earth' are anagrams. The strings 'python' and 'typhon' are anagrams as well. To simplify the discussion, we assume that the two strings have the same length and consist only of the 26 lowercase English letters. Our goal is to write a boolean function that takes two strings and reports whether they are anagrams.

Method 1: Checking Off

Our first solution checks whether each character of the first string occurs in the second. If every character can be "checked off", the two strings must be anagrams. A character is checked off by replacing it with the special Python value None. However, because strings are immutable in Python, the first step is to convert the second string to a list. See the following code:

    def anagramSolution1(s1, s2):
        alist = list(s2)  # mutable copy of s2 so characters can be checked off

        pos1 = 0
        stillOK = True

        while pos1 < len(s1) and stillOK:
            pos2 = 0
            found = False
            while pos2 < len(alist) and not found:
                if s1[pos1] == alist[pos2]:
                    found = True
                else:
                    pos2 = pos2 + 1

            if found:
                alist[pos2] = None  # check off the matched character
            else:
                stillOK = False

            pos1 = pos1 + 1

        return stillOK

    print(anagramSolution1('abcd', 'dcba'))

Method 2: Sort and Compare

Another solution rests on the following idea: even though the two strings s1 and s2 are different, they are anagrams only if they consist of exactly the same characters. So if we sort the characters of each string alphabetically, two anagrams will produce identical results. In Python, we can convert each string to a list and then use the built-in list method sort.
See the following code:

    def anagramSolution2(s1, s2):
        alist1 = list(s1)
        alist2 = list(s2)

        alist1.sort()
        alist2.sort()

        pos = 0
        matches = True

        # Compare the sorted characters position by position.
        while pos < len(s1) and matches:
            if alist1[pos] == alist2[pos]:
                pos = pos + 1
            else:
                matches = False

        return matches

    print(anagramSolution2('abcde', 'edcba'))

At first glance, you may think that this algorithm is O(n), because there is only one simple loop comparing n characters. However, the two calls to the Python sort method are not free. As we will see later, sorting typically takes O(n^2) or O(n log n) time, so the sorting operations dominate the loop, and the algorithm as a whole has the same order of magnitude as the sort.

Method 3: Brute Force

A brute-force technique tries to exhaust all possibilities. For anagram detection, we can generate every possible string from the characters of s1 and then check whether s2 is among them. However, this approach has a difficulty. When generating all possible strings from s1, there are n possible first characters, n-1 possible characters for the second position, n-2 for the third, and so on. The total number of candidate strings is n * (n-1) * (n-2) * ... * 3 * 2 * 1 = n!. It turns out that n! grows very quickly; for large n, it grows even faster than 2^n. For a string of just 20 characters there would be 20! = 2,432,902,008,176,640,000 candidates, so this is not a practical solution.

Method 4: Count and Compare

Our final solution takes advantage of the fact that any two anagrams have the same number of a's, the same number of b's, the same number of c's, and so on. To decide whether two strings are anagrams, we first count the number of times each character occurs. Because there are only 26 possible characters, we can use a list of 26 counters, one for each possible character. Each time we see a particular character, we increment the counter at its position. In the end, if the two lists of counters are identical, the two strings are anagrams.
The following code implements this approach:

    def anagramSolution4(s1, s2):
        c1 = [0] * 26
        c2 = [0] * 26

        # Count the occurrences of each lowercase letter in s1.
        for i in range(len(s1)):
            pos = ord(s1[i]) - ord('a')
            c1[pos] = c1[pos] + 1

        # Count the occurrences of each lowercase letter in s2.
        for i in range(len(s2)):
            pos = ord(s2[i]) - ord('a')
            c2[pos] = c2[pos] + 1

        # Compare the two lists of counts.
        j = 0
        stillOK = True
        while j < 26 and stillOK:
            if c1[j] == c2[j]:
                j = j + 1
            else:
                stillOK = False

        return stillOK

    print(anagramSolution4('apple', 'pleap'))

Again, this solution contains several loops. However, unlike the first solution, none of them are nested. The first two loops, which count the characters, each depend on n. The third loop, which compares the two lists of counts, always takes 26 steps because there are only 26 possible characters. Adding it all up gives T(n) = 2n + 26 steps, which is O(n). We have found a linear-time solution to this problem.

Before leaving this example, we need to say something about space requirements. Although the final solution runs in linear time, it does so only by using additional storage for the two lists of character counts. In other words, this algorithm trades space for time.

This is a common situation. In many cases you will need to make a trade-off between time and space. Here the amount of extra space is not significant. However, if the underlying alphabet had millions of characters, the space overhead would deserve more attention. As a computer scientist, when choosing among algorithms, it is up to you to decide how best to use computing resources for a particular problem.
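The count-and-compare idea can also be sketched with collections.Counter from the standard library, which stores a count only for the characters that actually occur instead of a fixed 26-slot list. This is an illustrative variant, not the article's method; the function name anagram_counter is hypothetical, and the explicit length check is an assumption added so that unequal-length inputs are rejected early.

```python
from collections import Counter


def anagram_counter(s1, s2):
    """Return True if s1 and s2 contain the same characters with the same counts."""
    if len(s1) != len(s2):
        return False  # strings of different length cannot be anagrams
    # Each Counter is built in one pass over its string; equal Counters
    # mean every character occurs the same number of times in both.
    return Counter(s1) == Counter(s2)


if __name__ == "__main__":
    print(anagram_counter("apple", "pleap"))    # True
    print(anagram_counter("python", "typhon"))  # True
    print(anagram_counter("abcd", "abce"))      # False
```

With this variant the extra storage grows with the number of distinct characters in the inputs rather than with the size of the alphabet, which is one possible way to handle the large-alphabet case mentioned above.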