Python binary search and bisect module

Source: Internet
Author: User
In Python, a list is implemented as an array, that is, a linear table. You can use the list.index() method to search for an element in a list, but its time complexity is O(n). For large data volumes, binary search can be used to speed this up. Binary search requires that the sequence be sorted. The basic principle is as follows:

1. Start from the middle element of the array. If the middle element is exactly the element being searched for, the search ends.

2. If the target element is greater than (or less than) the middle element, continue the search in the half of the array whose elements are greater than (or less than) the middle element, again starting by comparing against that half's middle element.

3. If the range to search becomes empty at some step, the element is not in the array.

Binary search is also called half-interval search. Each comparison halves the search range, so the time complexity is O(log n).
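To put the O(log n) bound in numbers, the sketch below counts how many middle-element comparisons a loop-based binary search needs in the worst case (the helper name worst_case_probes is illustrative, not from the original). After one comparison on n candidates, at most n // 2 remain, so the count is the number of halvings until nothing is left, which equals n.bit_length(), i.e. floor(log2 n) + 1.

```python
def worst_case_probes(n):
    """Worst-case number of middle-element comparisons binary search
    needs on a sorted list of n items (illustrative helper)."""
    probes = 0
    while n > 0:
        n //= 2      # after one comparison at most n // 2 candidates remain
        probes += 1
    return probes


for n in (1000, 100000, 1000000):
    # list.index() may have to look at all n items
    print(f"n={n}: linear scan up to {n} comparisons, "
          f"binary search at most {worst_case_probes(n)}")
```

For the 100,000-element list used in the benchmark, that is at most 17 comparisons instead of up to 100,000.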

Below, binary search is implemented with recursion and with a loop, respectively:

def binary_search_recursion(lst, value, low, high):
    # Empty range: value is not in lst
    if high < low:
        return None
    mid = (low + high) // 2  # integer division keeps mid an int (plain / gives a float in Python 3)
    if lst[mid] > value:
        return binary_search_recursion(lst, value, low, mid - 1)
    elif lst[mid] < value:
        return binary_search_recursion(lst, value, mid + 1, high)
    else:
        return mid


def binary_search_loop(lst, value):
    low, high = 0, len(lst) - 1
    while low <= high:
        mid = (low + high) // 2
        if lst[mid] < value:
            low = mid + 1
        elif lst[mid] > value:
            high = mid - 1
        else:
            return mid
    return None

The two implementations were benchmarked as follows:

if __name__ == "__main__":
    import random
    import timeit

    # 100,000 random integers, sorted so that binary search applies
    lst = [random.randint(0, 10000) for _ in range(100000)]
    lst.sort()

    def test_recursion():
        binary_search_recursion(lst, 999, 0, len(lst) - 1)

    def test_loop():
        binary_search_loop(lst, 999)

    t1 = timeit.Timer("test_recursion()", setup="from __main__ import test_recursion")
    t2 = timeit.Timer("test_loop()", setup="from __main__ import test_loop")

    # Timer.timeit() runs each statement 1,000,000 times by default
    print("Recursion:", t1.timeit())
    print("Loop:", t2.timeit())

The execution result (total seconds for timeit's default of 1,000,000 calls) is as follows:

Recursion: 3.12596702576
Loop: 2.08254289627

The loop version is clearly faster than the recursive one, largely because every recursive step pays the overhead of a Python function call.

Python's standard library has a bisect module for maintaining a list in sorted order. It implements an algorithm for inserting elements into a sorted list, which in some cases is more efficient than repeatedly sorting the list, or than building a large list and sorting it afterwards. "Bisect" refers to the bisection method: the module uses binary search to find the position at which an element belongs in a sorted list and inserts it there, so the list stays sorted without calling sort() after every insertion. Note that only the search is O(log n); inserting into a Python list still shifts the elements after the insertion point, which is O(n).
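As a sketch of that difference (the helper names below are hypothetical), both functions keep a list sorted after every insertion and end up with the same result; one re-sorts the whole list after each append, the other uses bisect.insort to put each value straight into its slot.

```python
import bisect
import random


def build_by_resorting(values):
    result = []
    for v in values:
        result.append(v)
        result.sort()             # re-sort the whole list after every append
    return result


def build_by_insort(values):
    result = []
    for v in values:
        bisect.insort(result, v)  # binary search for the slot, then insert there
    return result


values = [random.randint(0, 1000) for _ in range(500)]
print(build_by_insort(values) == build_by_resorting(values) == sorted(values))  # True
```

insort needs only O(log n) comparisons per value, while each sort() call has to inspect the entire list again.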

The following is a simple example:

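import bisect
import random

# A constant seed makes every run produce the same pseudorandom numbers
random.seed(1)

print('New  Pos Contents')
print('---  --- --------')

l = []
for i in range(1, 15):
    r = random.randint(1, 100)
    position = bisect.bisect(l, r)  # where r will be inserted
    bisect.insort(l, r)             # insert r, keeping l sorted
    print('%3d  %3d' % (r, position), l)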

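Output (from the original run; Python 3 generates a different random sequence for the same seed than Python 2 did, so your numbers may differ):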

New  Pos Contents
---  --- --------
 14    0 [14]
 85    1 [14, 85]
 77    1 [14, 77, 85]
 26    1 [14, 26, 77, 85]
 50    2 [14, 26, 50, 77, 85]
 45    2 [14, 26, 45, 50, 77, 85]
 66    4 [14, 26, 45, 50, 66, 77, 85]
 79    6 [14, 26, 45, 50, 66, 77, 79, 85]
 10    0 [10, 14, 26, 45, 50, 66, 77, 79, 85]
  3    0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85]
 84    9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85]
 44    4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85]
 77    9 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
  1    0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]

The bisect module provides the following functions:

bisect.bisect_left(a, x, lo=0, hi=len(a)):

Returns the index at which x should be inserted into the sorted list a to keep it sorted. lo and hi restrict the search to the slice a[lo:hi]; by default the whole list is used. If x is already present, the returned position is to the left of (before) any existing entries equal to x.

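bisect.bisect_right(a, x, lo=0, hi=len(a))

bisect.bisect(a, x, lo=0, hi=len(a)):

These two functions are equivalent (bisect is an alias of bisect_right). They work like bisect_left, but if x is already present, the returned position is to the right of (after) any existing entries equal to x.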

bisect.insort_left(a, x, lo=0, hi=len(a)):

Inserts x into the sorted list a, keeping it sorted. This is equivalent to a.insert(bisect.bisect_left(a, x, lo, hi), x).

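bisect.insort_right(a, x, lo=0, hi=len(a))

bisect.insort(a, x, lo=0, hi=len(a)):

These two functions are equivalent (insort is an alias of insort_right). They work like insort_left, but if x is already present, it is inserted to the right of any existing entries equal to x.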
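The left/right distinction only matters when x equals an existing entry. In the sketch below (the sample lists are made up for illustration), the insort calls insert the float 1.0 into a list that already contains the int 1; since 1 == 1.0 but the two print differently, the output shows which side the new item lands on.

```python
from bisect import bisect, bisect_left, bisect_right
from bisect import insort, insort_left, insort_right

a = [1, 2, 4, 4, 4, 7]
print(bisect_left(a, 4))        # 2: before the existing 4s
print(bisect_right(a, 4))       # 5: after the existing 4s
print(bisect(a, 4))             # 5: bisect is an alias of bisect_right
print(bisect_left(a, 5), bisect_right(a, 5))  # 5 5: no equal entry, so both agree
print(bisect_left(a, 4, 0, 3))  # 2: search restricted to a[0:3] == [1, 2, 4]

left = [0, 1, 2]
insort_left(left, 1.0)
print(left)                     # [0, 1.0, 1, 2]

right = [0, 1, 2]
insort_right(right, 1.0)
print(right)                    # [0, 1, 1.0, 2]

alias = [0, 1, 2]
insort(alias, 1.0)              # insort is an alias of insort_right
print(alias)                    # [0, 1, 1.0, 2]

# insort_left(a, x) does the same as a.insert(bisect_left(a, x), x)
manual = [0, 1, 2]
manual.insert(bisect_left(manual, 1.0), 1.0)
print(manual)                   # [0, 1.0, 1, 2]
```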

The functions in the bisect module fall into two groups: the bisect* functions only compute the insertion index without modifying the list, while the insort* functions actually insert the element. A typical use of the module is mapping numeric scores to letter grades:

import bisect


def grade(score, breakpoints=(60, 70, 80, 90), grades='FDCBA'):
    # The number of breakpoints <= score selects the grade letter
    i = bisect.bisect(breakpoints, score)
    return grades[i]


print([grade(score) for score in [33, 99, 77, 70, 89, 90, 100]])

Execution result:

['F', 'A', 'C', 'C', 'B', 'A', 'A']
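bisect.bisect is bisect_right, so a score that lands exactly on a breakpoint (such as 70 above) is promoted to the higher grade. The sketch below (the names grade_right and grade_left are hypothetical) contrasts that with bisect_left, which would keep boundary scores in the lower grade.

```python
import bisect

BREAKPOINTS = [60, 70, 80, 90]
GRADES = 'FDCBA'


def grade_right(score):
    # bisect_right: a score equal to a breakpoint moves up to the next grade
    return GRADES[bisect.bisect_right(BREAKPOINTS, score)]


def grade_left(score):
    # bisect_left: a score equal to a breakpoint stays in the lower grade
    return GRADES[bisect.bisect_left(BREAKPOINTS, score)]


for score in (59, 60, 70, 89, 90):
    print(score, grade_right(score), grade_left(score))
# 59 F F
# 60 D F
# 70 C D
# 89 B B
# 90 A B
```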

Likewise, we can use the bisect module to implement binary search:

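def binary_search_bisect(lst, x):
    from bisect import bisect_left
    # Leftmost position where x could be inserted
    i = bisect_left(lst, x)
    # x is present only if that position holds an equal element
    if i != len(lst) and lst[i] == x:
        return i
    return None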

Benchmarking binary_search_bisect against the recursive and loop implementations gives:

Recursion: 4.00940990448
Loop: 2.6583480835
Bisect: 1.74922895432

The bisect version takes about two-thirds of the loop version's time and less than half of the recursive version's time; in CPython, bisect_left is implemented in C.
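Because bisect_left returns the leftmost insertion point, binary_search_bisect returns the index of the first occurrence when x appears several times, whereas the loop and recursive versions stop at whichever equal element they probe first (for [1, 3, 3, 3, 8] and x = 3 the loop version returns 2, while bisect_left gives 1). Pairing bisect_left with bisect_right yields the whole run of equal elements; the helper name equal_range below is hypothetical, borrowed from the C++ standard library function of the same name.

```python
from bisect import bisect_left, bisect_right


def equal_range(lst, x):
    """Return (start, stop) such that lst[start:stop] holds every item equal to x.

    lst must be sorted. start == stop means x is absent; start is then the
    position where x would be inserted."""
    return bisect_left(lst, x), bisect_right(lst, x)


data = [1, 3, 3, 3, 8]
start, stop = equal_range(data, 3)
print(start, stop, data[start:stop])   # 1 4 [3, 3, 3]
print(equal_range(data, 5))            # (4, 4): 5 is absent and would go at index 4
```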

The well-known numerical library NumPy also provides a binary search function, numpy.searchsorted, which is used in essentially the same way as bisect. By default it behaves like bisect_left; to get the right-side insertion point (like bisect_right), you must pass side='right'. For example:

>>> import numpy as np
>>> from bisect import bisect_left, bisect_right
>>> data = [2, 4, 7, 9]
>>> bisect_left(data, 4)
1
>>> np.searchsorted(data, 4)
1
>>> bisect_right(data, 4)
2
>>> np.searchsorted(data, 4, side='right')
2

Then, let's compare the performance:

In [20]: %timeit -n 100 bisect_left(data, 99999)
100 loops, best of 3: 670 ns per loop

In [21]: %timeit -n 100 np.searchsorted(data, 99999)
100 loops, best of 3: 56.9 ms per loop

In [22]: %timeit -n 100 bisect_left(data, 8888)
100 loops, best of 3: 961 ns per loop

In [23]: %timeit -n 100 np.searchsorted(data, 8888)
100 loops, best of 3: 57.6 ms per loop

In [24]: %timeit -n 100 bisect_left(data, 777777)
100 loops, best of 3: 670 ns per loop

In [25]: %timeit -n 100 np.searchsorted(data, 777777)
100 loops, best of 3: 58.4 ms per loop

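numpy.searchsorted is far slower here: milliseconds per call versus nanoseconds for bisect, several orders of magnitude apart. When it is given a plain Python list, NumPy has to convert the list to an array first, which likely accounts for most of that time. Therefore, searchsorted is not suitable for searching ordinary lists, but it is quite fast at searching a numpy.ndarray: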

In [30]: data_ndarray = np.arange(0, 1000000)

In [31]: %timeit np.searchsorted(data_ndarray, 99999)
The slowest run took 16.04 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 996 ns per loop

In [32]: %timeit np.searchsorted(data_ndarray, 8888)
The slowest run took 18.22 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 994 ns per loop

In [33]: %timeit np.searchsorted(data_ndarray, 777777)
The slowest run took 31.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 990 ns per loop
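When the data starts out as a Python list and is searched many times, converting it to an ndarray once and reusing the array avoids paying for the conversion on every call. The sketch below assumes that conversion is where the list timings go; the variable names are illustrative and the timings will vary by machine.

```python
import timeit

import numpy as np

data_list = list(range(1000000))
data_array = np.asarray(data_list)   # pay the list -> ndarray conversion once

# Passing the list: NumPy has to build an array from it on every call
t_list = timeit.timeit(lambda: np.searchsorted(data_list, 99999), number=10) / 10
# Passing the ndarray: the sorted data is searched directly
t_array = timeit.timeit(lambda: np.searchsorted(data_array, 99999), number=10) / 10

print(f"list:    {t_list:.2e} s per call")
print(f"ndarray: {t_array:.2e} s per call")
# Both calls find the same index, 99999
print(np.searchsorted(data_list, 99999), np.searchsorted(data_array, 99999))
```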

numpy.searchsorted can also search for multiple values in a single call:

>>> np.searchsorted([1,2,3,4,5], 3)
2
>>> np.searchsorted([1,2,3,4,5], 3, side='right')
3
>>> np.searchsorted([1,2,3,4,5], [-10, 10, 2, 3])
array([0, 5, 1, 2])
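Passing several values at once gives the same positions as calling bisect_left once per value, and with side='right' the same as bisect_right. The sketch below checks this using the sample values from the session above.

```python
from bisect import bisect_left, bisect_right

import numpy as np

haystack = [1, 2, 3, 4, 5]
needles = [-10, 10, 2, 3]

# One vectorized call for all needles
left_all = np.searchsorted(haystack, needles)
right_all = np.searchsorted(haystack, needles, side='right')

# The same positions, one bisect call per needle
left_loop = [bisect_left(haystack, v) for v in needles]
right_loop = [bisect_right(haystack, v) for v in needles]

print(left_all.tolist(), left_loop)    # [0, 5, 1, 2] [0, 5, 1, 2]
print(right_all.tolist(), right_loop)  # [0, 5, 2, 3] [0, 5, 2, 3]
```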

