Internally, a Python list is implemented as an array, that is, a linear table. To find an element in a list you can use the list.index() method, which has a time complexity of O(n). For large amounts of data, binary search can be used as an optimization. Binary search requires the sequence to be sorted, and works on the following basic principles:
1. Start from the middle element of the array; if the middle element is exactly the element being searched for, the search ends.
2. If the target is greater than or less than the middle element, continue in the half of the array that is greater than or less than the middle element, again starting the comparison from the middle element of that half.
3. If the array is empty at some step, the element cannot be found.
Binary search is also known as half-interval search: each comparison halves the search range, so its time complexity is O(log n).
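To make the halving concrete, here is a minimal sketch (written for Python 3; the helper name count_steps is made up for this illustration) that counts how many iterations a binary search needs in one worst case, namely when the value is larger than every element. The count should grow roughly like log2(n) rather than like n.

```python
import math

def count_steps(n):
    """Count loop iterations of a binary search over n sorted elements
    when the target is larger than all of them (the search never hits)."""
    low, high = 0, n - 1
    steps = 0
    while low <= high:
        mid = (low + high) // 2
        steps += 1
        low = mid + 1  # target is always greater, so keep only the right half
    return steps

for n in (10, 1000, 1000000):
    # Third column is floor(log2(n)) + 1, printed for comparison.
    print(n, count_steps(n), math.floor(math.log2(n)) + 1)
```

A list of a million elements needs about 20 comparisons here, where a linear scan such as list.index() may need up to a million.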
We implement binary search with recursion and with a loop, respectively:

    def binary_search_recursion(lst, value, low, high):
        if high < low:
            return None
        mid = (low + high) // 2  # integer division
        if lst[mid] > value:
            return binary_search_recursion(lst, value, low, mid - 1)
        elif lst[mid] < value:
            return binary_search_recursion(lst, value, mid + 1, high)
        else:
            return mid

    def binary_search_loop(lst, value):
        low, high = 0, len(lst) - 1
        while low <= high:
            mid = (low + high) // 2  # integer division
            if lst[mid] < value:
                low = mid + 1
            elif lst[mid] > value:
                high = mid - 1
            else:
                return mid
        return None
Next, perform a performance test on both of these implementations:
    if __name__ == "__main__":
        import random
        lst = [random.randint(0, 10000) for _ in xrange(100000)]
        lst.sort()

        def test_recursion():
            binary_search_recursion(lst, 999, 0, len(lst) - 1)

        def test_loop():
            binary_search_loop(lst, 999)

        import timeit
        t1 = timeit.Timer("test_recursion()", setup="from __main__ import test_recursion")
        t2 = timeit.Timer("test_loop()", setup="from __main__ import test_loop")

        print "recursion:", t1.timeit()
        print "loop:", t2.timeit()
The results are as follows:

    recursion: 3.12596702576
    loop: 2.08254289627

As you can see, the loop version is more efficient than the recursive one.
Python has a bisect module for maintaining a sorted list. It implements an algorithm for inserting elements into a list that is already sorted. In some cases this is more efficient than sorting the list repeatedly, or than building a large list and sorting it afterwards. "Bisect" means bisection: the module uses bisection to find the right position for a new element in a sorted list, so the list stays sorted without calling sort after every insertion.
Here is a simple usage example:

    import bisect
    import random

    random.seed(1)

    print 'New Pos Contents'
    print '--- --- --------'

    l = []
    for i in range(1, 15):
        r = random.randint(1, 100)
        position = bisect.bisect(l, r)
        bisect.insort(l, r)
        print '%3d %3d' % (r, position), l
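Output:

    New Pos Contents
    --- --- --------
     14   0 [14]
     85   1 [14, 85]
     77   1 [14, 77, 85]
     26   1 [14, 26, 77, 85]
     50   2 [14, 26, 50, 77, 85]
     45   2 [14, 26, 45, 50, 77, 85]
     66   4 [14, 26, 45, 50, 66, 77, 85]
     79   6 [14, 26, 45, 50, 66, 77, 79, 85]
     10   0 [10, 14, 26, 45, 50, 66, 77, 79, 85]
      3   0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85]
     84   9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85]
     44   4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85]
     77   9 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
      1   0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]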
The functions provided by the bisect module are:
bisect.bisect_left(a, x, lo=0, hi=len(a)):
Finds the index at which x should be inserted into the sorted list a. lo and hi specify the range of the list to search; by default the entire list is used. If x is already present, the insertion point is to the left of the existing entries. The return value is the index.
bisect.bisect_right(a, x, lo=0, hi=len(a))
bisect.bisect(a, x, lo=0, hi=len(a)):
These two functions are similar to bisect_left, but if x is already present, the insertion point is to the right of the existing entries.
bisect.insort_left(a, x, lo=0, hi=len(a)):
Inserts x into the sorted list a. This has the same effect as a.insert(bisect.bisect_left(a, x, lo, hi), x).
bisect.insort_right(a, x, lo=0, hi=len(a))
bisect.insort(a, x, lo=0, hi=len(a)):
Similar to insort_left, but if x is already present, it is inserted to the right of the existing entries.
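The left and right variants only differ when x is already in the list. The following is a short sketch of the functions listed above (written for Python 3; the list a and its values are made up for illustration):

```python
import bisect

a = [1, 3, 3, 3, 7]  # illustrative sorted list containing duplicates

# Insertion point to the left / right of the existing 3s.
left = bisect.bisect_left(a, 3)      # 1
right = bisect.bisect_right(a, 3)    # 4
assert bisect.bisect(a, 3) == right  # bisect behaves like bisect_right

# lo and hi restrict the search to the slice a[lo:hi].
restricted = bisect.bisect_left(a, 3, 2, 4)  # 2

# insort_left has the same effect as insert + bisect_left.
b, c = list(a), list(a)
bisect.insort_left(b, 5)
c.insert(bisect.bisect_left(c, 5), 5)

print(left, right, restricted, b == c, b)
```

Note that none of these functions check whether the list is actually sorted; keeping it sorted is up to the caller.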
The functions provided by the bisect module fall into two categories: the bisect* functions only find the index and do not insert anything, while the insort* functions perform the actual insertion. A typical application of this module is converting scores to letter grades:

    import bisect

    def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
        # The number of breakpoints <= score selects the grade letter.
        i = bisect.bisect(breakpoints, score)
        return grades[i]

    print [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
Result:

    ['F', 'A', 'C', 'C', 'B', 'A', 'A']
Similarly, we can use the bisect module to implement binary search:

    def binary_search_bisect(lst, x):
        from bisect import bisect_left
        i = bisect_left(lst, x)
        if i != len(lst) and lst[i] == x:
            return i
        return None
Let's test its performance again, together with the recursive and loop implementations of binary search:

    recursion: 4.00940990448
    loop: 2.6583480835
    bisect: 1.74922895432

The bisect version is faster than the loop implementation and takes less than half the time of the recursive one.
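The test code for this three-way comparison is not shown. A sketch along the lines of the earlier harness might look like the following; it is written for Python 3, and the seed, the list size and the repeat count of 10000 are assumptions chosen only to keep the run short. Absolute numbers will differ by machine and interpreter.

```python
import random
import timeit
from bisect import bisect_left

def binary_search_recursion(lst, value, low, high):
    if high < low:
        return None
    mid = (low + high) // 2
    if lst[mid] > value:
        return binary_search_recursion(lst, value, low, mid - 1)
    elif lst[mid] < value:
        return binary_search_recursion(lst, value, mid + 1, high)
    return mid

def binary_search_loop(lst, value):
    low, high = 0, len(lst) - 1
    while low <= high:
        mid = (low + high) // 2
        if lst[mid] < value:
            low = mid + 1
        elif lst[mid] > value:
            high = mid - 1
        else:
            return mid
    return None

def binary_search_bisect(lst, x):
    i = bisect_left(lst, x)
    if i != len(lst) and lst[i] == x:
        return i
    return None

random.seed(0)  # arbitrary seed, only so the data is repeatable
lst = sorted(random.randint(0, 10000) for _ in range(100000))

# One zero-argument callable per implementation, all searching for 999.
candidates = {
    "recursion": lambda: binary_search_recursion(lst, 999, 0, len(lst) - 1),
    "loop": lambda: binary_search_loop(lst, 999),
    "bisect": lambda: binary_search_bisect(lst, 999),
}
timings = {name: timeit.timeit(fn, number=10000) for name, fn in candidates.items()}
for name, seconds in timings.items():
    print("%s: %.4f" % (name, seconds))
```

When the value occurs more than once, the three functions may return different indices: bisect_left always yields the leftmost match, while the hand-written versions return whichever match they hit first.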
NumPy, Python's well-known data processing library, also has a function for binary search, numpy.searchsorted. It is used in basically the same way as bisect, except that to get the right-hand insertion point you need to set the parameter side='right', for example:

    >>> import numpy as np
    >>> from bisect import bisect_left, bisect_right
    >>> data = [2, 4, 7, 9]
    >>> bisect_left(data, 4)
    1
    >>> np.searchsorted(data, 4)
    1
    >>> bisect_right(data, 4)
    2
    >>> np.searchsorted(data, 4, side='right')
    2
Now let's compare their performance:

    In [ ]: %timeit bisect_left(data, 99999)
    best of 3: 670 ns per loop

    In [ ]: %timeit np.searchsorted(data, 99999)
    best of 3: 56.9 ms per loop

    In [ ]: %timeit bisect_left(data, 8888)
    best of 3: 961 ns per loop

    In [ ]: %timeit np.searchsorted(data, 8888)
    best of 3: 57.6 ms per loop

    In [ ]: %timeit bisect_left(data, 777777)
    best of 3: 670 ns per loop

    In [ ]: %timeit np.searchsorted(data, 777777)
    best of 3: 58.4 ms per loop
As you can see, numpy.searchsorted is very slow here: at tens of milliseconds per call against under a microsecond, it is several orders of magnitude slower than bisect, most likely because the list has to be converted to an array on every call. So searchsorted is not suitable for searching ordinary Python lists, but it is quite fast when searching a numpy.ndarray:

    In [ ]: data_ndarray = np.arange(0, 1000000)

    In [ ]: %timeit np.searchsorted(data_ndarray, 99999)
    The slowest run took 16 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 996 ns per loop

    In [ ]: %timeit np.searchsorted(data_ndarray, 8888)
    The slowest run took 18.22 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 994 ns per loop

    In [ ]: %timeit np.searchsorted(data_ndarray, 777777)
    The slowest run took 31.32 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 990 ns per loop
numpy.searchsorted can also search for multiple values at once:

    >>> np.searchsorted([1,2,3,4,5], 3)
    2
    >>> np.searchsorted([1,2,3,4,5], 3, side='right')
    3
    >>> np.searchsorted([1,2,3,4,5], [-10, 10, 2, 3])
    array([0, 5, 1, 2])
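For a plain Python list, a rough counterpart of the multi-value call is a list comprehension over bisect_left or bisect_right. The sketch below is written for Python 3, and the helper name searchsorted_list is hypothetical; it is not part of NumPy or of the bisect module.

```python
from bisect import bisect_left, bisect_right

def searchsorted_list(a, values, side="left"):
    """Return the insertion index in the sorted list `a` for each item of `values`."""
    find = bisect_left if side == "left" else bisect_right
    return [find(a, v) for v in values]

print(searchsorted_list([1, 2, 3, 4, 5], [-10, 10, 2, 3]))    # [0, 5, 1, 2]
print(searchsorted_list([1, 2, 3, 4, 5], [3], side="right"))  # [3]
```

This stays on ordinary lists throughout, so it avoids handing a list to numpy.searchsorted for every lookup.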