This article is a summary of Python sample code for various sorting algorithms.
In everyday Python work we constantly run into sorting problems, such as ranking search results (without sorting there would be no Google or any other search engine). Examples like this are countless, and data structures textbooks devote a lot of space to sorting. Some time ago I needed to review sorting algorithms, and I implemented each of them in Python; they are collected here for reference.
The three simplest sorts are insertion sort, selection sort, and bubble sort. All three are straightforward, with an average time complexity of O(n^2). I won't describe the principles here, just paste the source code.
Insertion sort:
def insertion_sort(sort_list):
    iter_len = len(sort_list)
    if iter_len < 2:
        return sort_list
    for i in range(1, iter_len):
        key = sort_list[i]
        j = i - 1
        while j >= 0 and sort_list[j] > key:
            sort_list[j+1] = sort_list[j]
            j -= 1
        sort_list[j+1] = key
    return sort_list
Bubble sort:
def bubble_sort(sort_list):
    iter_len = len(sort_list)
    if iter_len < 2:
        return sort_list
    for i in range(iter_len-1):
        for j in range(iter_len-i-1):
            if sort_list[j] > sort_list[j+1]:
                sort_list[j], sort_list[j+1] = sort_list[j+1], sort_list[j]
    return sort_list
Selection sort:
def selection_sort(sort_list):
    iter_len = len(sort_list)
    if iter_len < 2:
        return sort_list
    for i in range(iter_len-1):
        smallest = sort_list[i]
        location = i
        for j in range(i, iter_len):
            if sort_list[j] < smallest:
                smallest = sort_list[j]
                location = j
        if i != location:
            sort_list[i], sort_list[location] = sort_list[location], sort_list[i]
    return sort_list
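A quick sanity check of the three functions (an illustrative snippet, not from the original article; all three sort in place and also return the list):

sample = [5, 2, 4, 6, 1, 3]
print(insertion_sort(sample[:]))  # [1, 2, 3, 4, 5, 6]
print(bubble_sort(sample[:]))     # [1, 2, 3, 4, 5, 6]
print(selection_sort(sample[:]))  # [1, 2, 3, 4, 5, 6]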
Note this line:
sort_list[i], sort_list[location] = sort_list[location], sort_list[i]
If you don't know Python, this may look strange. It is the idiomatic way to swap two values. In other languages, swapping a and b usually requires an intermediate variable temp: assign a to temp, then b to a, then temp to b. In Python you can simply write a, b = b, a, because both sides of the assignment are really tuples (it should be emphasized that in Python, tuples are defined by commas, not by parentheses).
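A minimal illustration of this tuple-swap idiom:

a, b = 1, 2
a, b = b, a   # the right-hand side packs the tuple (2, 1), which then unpacks into a, b
print(a, b)   # 2 1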
Algorithms with an average time complexity of O(n log n) include merge sort, heap sort, and quicksort.
Merge sort works as follows: split the list to be sorted at its midpoint into left and right halves, sort each half by calling the procedure recursively, then merge the two sorted halves by repeatedly comparing their first elements and popping off the smaller one. The source code is as follows; the callable class is written as a utility function:
class merge_sort(object):
    def _merge(self, alist, p, q, r):
        # merge the sorted runs alist[p:q+1] and alist[q+1:r+1] back into alist
        left = alist[p:q+1]
        right = alist[q+1:r+1]
        for i in range(p, r+1):
            if len(left) > 0 and len(right) > 0:
                if left[0] <= right[0]:
                    alist[i] = left.pop(0)
                else:
                    alist[i] = right.pop(0)
            elif len(right) == 0:
                alist[i] = left.pop(0)
            elif len(left) == 0:
                alist[i] = right.pop(0)
    def _merge_sort(self, alist, p, r):
        if p < r:
            q = (p + r) // 2
            self._merge_sort(alist, p, q)
            self._merge_sort(alist, q+1, r)
            self._merge(alist, p, q, r)
    def __call__(self, sort_list):
        self._merge_sort(sort_list, 0, len(sort_list)-1)
        return sort_list
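Since merge_sort is written as a callable class, usage looks like this:

msort = merge_sort()              # instantiate once, then call it like a function
print(msort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]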
Heap sort is based on the heap data structure. I won't cover the basic concepts of heaps or their storage layout here. We use a list to store the heap (array-style storage): for the element at index i, index 2i+1 is its left child and 2i+2 is its right child; the parent index can be derived in a similar way.
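As a quick illustration of this index arithmetic (0-based, my own example):

i = 2
left, right = 2*i + 1, 2*i + 2   # the children of index 2 live at indexes 5 and 6
parent = (i - 1) // 2            # the parent of index 2 is index 0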
First we write a function that, for a subtree, compares the root with its children and swaps values whenever the root is smaller, recursing into the affected subtree. Next, we call this function bottom-up on every non-leaf node of the heap, which yields a tree in which every non-leaf node is larger than its children (this is the process of building a max heap). Once that is done, we swap the first and last elements of the list, so the largest number sits at the end; then we rebuild the max heap on the first n-1 elements of the list, and so on. The result is a heap-sorted list. The source code is as follows:
class heap_sort(object):
    def _left(self, i):
        return 2*i + 1
    def _right(self, i):
        return 2*i + 2
    def _parent(self, i):
        return (i - 1) // 2
    def _max_heapify(self, alist, i, heap_size=None):
        # sink alist[i] until the subtree rooted at i satisfies the max-heap property
        if heap_size is None:
            heap_size = len(alist)
        l = self._left(i)
        r = self._right(i)
        if l < heap_size and alist[l] > alist[i]:
            largest = l
        else:
            largest = i
        if r < heap_size and alist[r] > alist[largest]:
            largest = r
        if largest != i:
            alist[i], alist[largest] = alist[largest], alist[i]
            self._max_heapify(alist, largest, heap_size)
    def _build_max_heap(self, alist):
        # heapify every non-leaf node, bottom up
        loop_end = len(alist) // 2
        for i in range(loop_end)[::-1]:
            self._max_heapify(alist, i)
    def __call__(self, sort_list):
        self._build_max_heap(sort_list)
        heap_size = len(sort_list)
        for i in range(1, len(sort_list))[::-1]:
            # move the current maximum to the end, shrink the heap, restore it
            sort_list[0], sort_list[i] = sort_list[i], sort_list[0]
            heap_size -= 1
            self._max_heapify(sort_list, 0, heap_size)
        return sort_list
The last comparison sort (all of the algorithms above are comparison sorts, because they all work by comparing and exchanging elements) is naturally the classic quicksort.
First the principle. The key tool is the partition function: for the given list (array), pick a pivot element (here I pick the last element) and, through comparisons, move elements around (in place) so that when the function finishes, every number to the left of the pivot is smaller than it and every number to the right is larger. Partitioning returns the pivot's final position, which splits the list into two sublists (ideally of similar size); we then recurse on each sublist until its length is at most 1.
The quicksort source code is as follows:
class quick_sort(object):
    def _partition(self, alist, p, r):
        # alist[r] is the pivot; move every smaller element to its left
        i = p - 1
        x = alist[r]
        for j in range(p, r):
            if alist[j] <= x:
                i += 1
                alist[i], alist[j] = alist[j], alist[i]
        alist[i+1], alist[r] = alist[r], alist[i+1]
        return i + 1
    def _quicksort(self, alist, p, r):
        if p < r:
            q = self._partition(alist, p, r)
            self._quicksort(alist, p, q-1)
            self._quicksort(alist, q+1, r)
    def __call__(self, sort_list):
        self._quicksort(sort_list, 0, len(sort_list)-1)
        return sort_list
Careful readers may spot a problem here: if the input is already sorted, the recursion depth grows to the length of the sequence. In practice, once the list is longer than about 1000 elements the program aborts with a "maximum recursion depth exceeded" error. By default, Python's maximum recursion depth is 1000 (a nominal value; in practice it is only about 995, and it differs between systems). There are two solutions to this problem. 1) Raise the maximum recursion depth, like this:
import sys
sys.setrecursionlimit(99999)
2) Use another version of the partition function: the randomized partition. Since we previously always picked the last element of the subsequence as the pivot, robustness in degenerate cases was poor. Picking the pivot at random from the subsequence makes such worst cases far less likely. The new randomized partition function is as follows:
def _randomized_partition(self, alist, p, r):
    i = random.randint(p, r)
    alist[i], alist[r] = alist[r], alist[i]
    return self._partition(alist, p, r)
The complete randomized_quick_sort code is as follows (here I inherit from the quick_sort class above):
import random

class randomized_quick_sort(quick_sort):
    def _randomized_partition(self, alist, p, r):
        i = random.randint(p, r)
        alist[i], alist[r] = alist[r], alist[i]
        return self._partition(alist, p, r)
    def _quicksort(self, alist, p, r):
        if p < r:
            q = self._randomized_partition(alist, p, r)
            self._quicksort(alist, p, q-1)
            self._quicksort(alist, q+1, r)
The discussion of quicksort isn't over yet. We all know Python is a very elegant language, and Python code tends to be concise and highly readable. Here is another way to write quicksort that takes only three lines of logic, yet loses none of its readability (provided, of course, that you understand Python). The code is as follows:
def quick_sort_2(sort_list):
    if len(sort_list) <= 1:
        return sort_list
    return quick_sort_2([lt for lt in sort_list[1:] if lt < sort_list[0]]) + \
           sort_list[0:1] + \
           quick_sort_2([ge for ge in sort_list[1:] if ge >= sort_list[0]])
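Note that unlike the in-place versions above, this one returns a new list; a quick check (my own example):

data = [3, 6, 1, 5, 2, 4]
print(quick_sort_2(data))  # [1, 2, 3, 4, 5, 6] -- a new list
print(data)                # the original list is left unchanged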
The code comes from the Python Cookbook, 2nd edition, and shows the expressive power of list comprehensions.
Can a comparison sort do better than O(n log n)? We can model all possible runs of such an algorithm as a binary decision tree. For a list of length n, let h be the height of the decision tree; the leaves are the n! possible orderings of the list. A binary tree of height h has at most 2^h leaves, so 2^h >= n!. Taking logarithms gives h >= log(n!), which is on the order of n log n. In other words, the best a comparison-based sorting algorithm can achieve is O(n log n).
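A quick numerical sanity check (not in the original article) that log2(n!) and n*log2(n) grow at the same rate:

import math

for n in (10, 100, 1000):
    print(n, round(math.log2(math.factorial(n)), 1), round(n * math.log2(n), 1))
# the two columns converge toward the same order of magnitude as n grows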
Is there a linear-time sorting algorithm, that is, one with O(n) time complexity? The answer is yes, but only under extra assumptions about the input; here we discuss linear sorting for those special cases only. The main linear-time sorts are counting sort, bucket sort, and radix sort. Here we will briefly look at counting sort.
Counting sort rests on the assumption that the elements to be sorted are all non-negative integers below some bound k. First declare a counting list of length k; traverse the input, and for each number of value i increment the count at index i, i.e. record how many times each value occurs, then turn the counts into running totals. Next allocate an output list as long as the input (which shows that counting sort is not an in-place algorithm), and traverse the input in reverse order (this keeps the sort stable: two equal values keep their relative order). For each value i encountered, write i into the output at position count[i]-1 and decrement count[i] (one occurrence of that value has been placed, so its slot counter must go down). The counting sort source code is as follows.
class counting_sort(object):
    def _counting_sort(self, alist, k):
        # alist3 holds the counts; after the prefix-sum pass, alist3[i] is
        # the number of elements <= i; alist2 is the output list
        alist3 = [0 for i in range(k)]
        alist2 = [0 for i in range(len(alist))]
        for j in alist:
            alist3[j] += 1
        for i in range(1, k):
            alist3[i] = alist3[i-1] + alist3[i]
        # walk the input backwards so equal keys keep their order (stability)
        for l in alist[::-1]:
            alist2[alist3[l]-1] = l
            alist3[l] -= 1
        return alist2
    def __call__(self, sort_list, k=None):
        if k is None:
            k = max(sort_list) + 1
        return self._counting_sort(sort_list, k)
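Usage, again via the callable class (k is inferred from the data when omitted):

csort = counting_sort()
print(csort([2, 5, 3, 0, 2, 3, 0, 3]))  # [0, 0, 2, 2, 3, 3, 3, 5]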
Having introduced the various sorting algorithms (all the code above has passed the unit tests I wrote), let's come back to Python itself. Python's built-in sorting algorithm has in fact been replaced several times since the earliest versions: it started with the qsort routine from the C library (which had many problems), then moved to Python's own implementations, including, before version 2.3, a hybrid of samplesort and binary insertion sort, and finally the current adaptive algorithm, Timsort, adopted in 2.3, whose C implementation runs to somewhere between 800 and 1200 lines. The takeaway is that in a real production environment, hand-rolling the classic sorting algorithms is impractical; they are for learning and research. In practice, the recommended approach follows these two rules (illustrated by the example just after them):
When you need to sort, use the built-in sort method of Python lists whenever possible.
When you need to look things up, use the built-in dictionary whenever possible.
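For example (a minimal sketch of my own; both sorted() and list.sort() accept an optional key function):

words = ['banana', 'Apple', 'cherry']
words.sort(key=str.lower)               # in-place, case-insensitive sort
print(words)                            # ['Apple', 'banana', 'cherry']
print(sorted([3, 1, 2], reverse=True))  # [3, 2, 1]

# dictionary lookups are average O(1)
ages = {'alice': 30, 'bob': 25}
print(ages.get('bob'))                  # 25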
To compare the built-in sort method against the implementations above, I wrote a test function: the test sequence length is 5000, and each function's time is averaged over three runs. The results show that Python's built-in sort wins by a wide margin, so in real applications we should use the built-in sort method whenever possible.
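The original timing table isn't reproduced here, but a sketch of such a benchmark might look like this (the function names refer to the implementations above; absolute numbers will vary by machine):

import random
import timeit

data = list(range(5000))
random.shuffle(data)

# compare a hand-written sort against the built-in sorted()
for func in (quick_sort_2, sorted):
    t = timeit.timeit(lambda: func(list(data)), number=3) / 3
    print('%s: %.4f s' % (func.__name__, t))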
This raises another problem: how do we determine whether a sequence contains duplicate elements? Return True if it does, False otherwise. Some may say this is very simple: just write two nested loops and compare every pair. Written down, the code looks like this:
def normal_find_same(alist):
    length = len(alist)
    for i in range(length):
        for j in range(i+1, length):
            if alist[i] == alist[j]:
                return True
    return False
The cost of this method is very high: the average time complexity is O(n^2), and the worst case occurs precisely when the list contains no duplicates at all. Given what we just learned about how fast the built-in sort method is, we can do better: first sort the list, then make a single pass looking for adjacent equal elements. The complete test code is as follows:
import time
import random

def record_time(func, alist):
    start = time.time()
    func(alist)
    end = time.time()
    return end - start

def quick_find_same(alist):
    alist.sort()
    length = len(alist)
    for i in range(length-1):
        if alist[i] == alist[i+1]:
            return True
    return False

if __name__ == "__main__":
    methods = (normal_find_same, quick_find_same)
    alist = list(range(5000))
    random.shuffle(alist)
    for m in methods:
        print('The method %s spends %s' % (m.__name__, record_time(m, alist)))
After running it, my numbers are: for a 5000-element list with no duplicate elements, the naive method takes about 1.205 seconds, while the sort-based method takes only about 0.003 seconds. This is one example of sorting in a practical application.
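Not covered in the article, but worth noting: following the dictionary advice above, the built-in set type gives an average O(n) solution in one line (a sketch):

def set_find_same(alist):
    # a set drops duplicates, so a length mismatch means there were repeats
    return len(set(alist)) != len(alist)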