[Math] Beating the binary search algorithm: interpolation search, galloping search

Source: Internet
Author: User

From: http://blog.jobbole.com/73517/

 

Binary search is one of the simplest yet most effective algorithms for searching sorted arrays. The question is whether more complex algorithms can do better. Let's take a look at some alternatives.

In some cases it is not feasible to hash the entire dataset, or a query needs the element's position as well as the data itself. In those cases the O(1) running time of a hash table is not available. On sorted arrays, however, divide-and-conquer approaches can usually achieve a worst-case running time of O(log(n)).

Before drawing conclusions, it is worth noting that an algorithm can be "beaten" in several ways: the space it requires, its running time, or the number of accesses it makes to the underlying data structure. The experiments below compare running time and comparison counts. They use random arrays ranging from 10,000 to 81,920,000 elements, where every element is a 4-byte integer.

 

Binary Search

Each step of binary search halves the search space, so its running time is guaranteed: finding a specific element takes O(log(n)) time in the worst case, and less if the target happens to sit exactly at a midpoint that is probed early. For example, locating an element in an array of 81,920,000 elements takes at most 27 iterations.
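As a minimal sketch of the halving step and of the iteration bound quoted above, the following Java snippet may help. The class and method names are illustrative and not taken from the article's source code; the bound floor(log2(n)) + 1 is the standard worst case for this loop shape.

```java
final class BinarySearchSketch {
    /** Worst-case number of loop iterations for n elements: floor(log2(n)) + 1. */
    static int maxIterations(int n) {
        return n <= 0 ? 0 : 32 - Integer.numberOfLeadingZeros(n);
    }

    /** Returns the index of key in the sorted array, or -1 if it is absent. */
    static int search(int[] sorted, int key) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;   // unsigned shift avoids int overflow
            if (sorted[mid] < key) {
                lo = mid + 1;            // discard the lower half
            } else if (sorted[mid] > key) {
                hi = mid - 1;            // discard the upper half
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {2, 3, 5, 7, 11, 13};
        System.out.println(search(data, 7));           // prints 3
        System.out.println(maxIterations(81_920_000)); // prints 27
    }
}
```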

Because binary search jumps around the array, it is not cache-friendly. Some fine-tuned implementations therefore switch to linear search once the search space drops below a threshold (64 elements or fewer). The best threshold depends heavily on the architecture, however, so most frameworks skip this optimization.
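The linear-search cutoff could look roughly like the sketch below. The constant LINEAR_CUTOFF and the class name are assumptions made for illustration; the text only gives "64 or less" and stresses that the right value depends on the architecture.

```java
final class HybridBinarySearch {
    // Assumed cutoff; tune per architecture.
    static final int LINEAR_CUTOFF = 64;

    /** Returns the index of key in the sorted array, or -1 if it is absent. */
    static int search(int[] sorted, int key) {
        int lo = 0, hi = sorted.length - 1;
        // Halve the range while it is still larger than the cutoff.
        while (hi - lo >= LINEAR_CUTOFF) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;   // keep mid in range: it may hold the key
            }
        }
        // Finish with a cache-friendly sequential scan over the small range.
        for (int i = lo; i <= hi; i++) {
            if (sorted[i] == key) return i;
            if (sorted[i] > key) break;
        }
        return -1;
    }
}
```

The scan touches at most 64 adjacent elements, which typically span only a few cache lines.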

 

Galloping search; galloping search with binary search fallback

If the length of the array is unknown for some reason, galloping search can identify the initial search range. The algorithm starts at the first element and keeps doubling the upper bound of the search range until the element at the upper bound is greater than the key being searched for.

Then, depending on the implementation, it either

  • falls back to a standard binary search on that range, which guarantees an O(log(n)) running time, or
  • starts another round of galloping search, which brings the running time closer to O(n).

Galloping search is very effective when the element we are looking for is close to the beginning of the array.
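The doubling strategy described above, followed by the binary search fallback, might be sketched as follows. This is an illustrative version, not the article's code: it assumes the array length is known so that the doubling can be capped at the end of the array, and all names are hypothetical.

```java
final class GallopingSearch {
    /** Returns the index of key in the sorted array, or -1 if it is absent. */
    static int search(int[] sorted, int key) {
        int n = sorted.length;
        if (n == 0) return -1;

        // Gallop: double the upper bound until it reaches the key or the array end.
        int bound = 1;
        while (bound < n && sorted[bound] < key) {
            bound = (bound > n / 2) ? n : bound * 2;   // cap to avoid overflow
        }

        // The key, if present, lies between the previous bound and this one.
        int lo = bound / 2;
        int hi = Math.min(bound, n - 1);

        // Fallback: standard binary search on [lo, hi].
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else if (sorted[mid] > key) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {1, 3, 5, 7, 9, 11, 13, 15, 17};
        System.out.println(search(data, 3));   // prints 1
        System.out.println(search(data, 17));  // prints 8
    }
}
```

For a key near the front of the array, the galloping phase stops after one or two probes, which is why the approach pays off in that case.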

 

Sampling search

Sampling search is somewhat like binary search, but it takes several samples from the array before deciding on the main search region. Once the range is small enough, it falls back to standard binary search to find the exact position of the element. The idea is interesting in theory, but it does not perform well in practice.
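The text does not spell out how the samples are chosen, so the following is only one plausible reading: probe evenly spaced positions, keep the bracket between two neighbouring probes that must contain the key, and repeat. The constants 20 and 2,000 are borrowed from the test setup described later in the article; the class name and the probing scheme are assumptions.

```java
import java.util.Arrays;

final class SamplingSearch {
    static final int SAMPLES = 20;          // samples per round (from the test setup)
    static final int BINARY_CUTOFF = 2000;  // below this, use plain binary search

    /** Returns the index of key in the sorted array, or -1 if it is absent. */
    static int search(int[] sorted, int key) {
        int lo = 0, hi = sorted.length - 1;
        while (hi - lo >= BINARY_CUTOFF) {
            int step = (hi - lo) / SAMPLES;   // at least 100 here
            int newLo = lo, newHi = hi;
            // Probe evenly spaced samples and stop at the first one >= key.
            for (int p = lo + step; p < hi; p += step) {
                if (sorted[p] < key) {
                    newLo = p + 1;            // key is to the right of this sample
                } else {
                    newHi = p;                // key is at or to the left of it
                    break;
                }
            }
            lo = newLo;
            hi = newHi;
        }
        // Range is small enough: finish with the standard binary search.
        int idx = Arrays.binarySearch(sorted, lo, hi + 1, key);
        return idx >= 0 ? idx : -1;
    }
}
```

Each round costs up to 20 comparisons, which is consistent with the article's observation that this approach needs more comparisons than binary search.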

 

Interpolation search; interpolation search with sequential search fallback

Among the algorithms tested, interpolation search is arguably the "smartest". It resembles the way a person looks up a name in a phone book: it tries to guess the position of an element by assuming that the elements are evenly distributed in the array.

It first samples the values at the start and end of the search space and uses them to guess the position of the element. The algorithm repeats this step until it finds the element.

  • If the guesses are accurate, the number of comparisons is around O(log(log(n))) and the running time is around O(log(n));
  • If the guesses are poor, however, the running time degrades to O(n).
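The guessing step can be sketched as a linear interpolation between the values at the two ends of the current range. This is a minimal illustration with hypothetical names, not the article's implementation; long arithmetic is used so that the intermediate product cannot overflow.

```java
final class InterpolationSearch {
    /** Returns the index of key in the sorted array, or -1 if it is absent. */
    static int search(int[] sorted, int key) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi && key >= sorted[lo] && key <= sorted[hi]) {
            if (sorted[lo] == sorted[hi]) {
                return lo;   // every value in the range equals the key
            }
            // Guess the position assuming values are evenly spread over [lo, hi].
            long offset = (long) (hi - lo) * ((long) key - sorted[lo])
                    / ((long) sorted[hi] - sorted[lo]);
            int pos = lo + (int) offset;
            if (sorted[pos] < key) {
                lo = pos + 1;
            } else if (sorted[pos] > key) {
                hi = pos - 1;
            } else {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] uniform = {0, 10, 20, 30, 40, 50};
        System.out.println(search(uniform, 40));   // prints 4, found on the first guess

        // A skewed array: each guess advances by only one slot, the O(n) case.
        int[] skewed = {1, 2, 3, 4, 1_000_000};
        System.out.println(search(skewed, 4));     // prints 3
    }
}
```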

A primitive version of interpolation search switches to sequential search as soon as the guessed position is presumed to be close to the final position. Each iteration of interpolation search costs more than an iteration of binary search, so finishing with a sequential search avoids the expensive guess computation and makes it easy to find the final position within a small region (about 10 elements).
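How "close to the final location" is detected is not stated in the text. The sketch below assumes one simple rule: if two successive guesses land within 10 positions of each other, stop interpolating and scan sequentially toward the key. The rule, the constant CLOSE_ENOUGH, and the names are all assumptions for illustration.

```java
final class InterpolationSeqSearch {
    static final int CLOSE_ENOUGH = 10;   // "about 10 elements" in the text

    /** Returns the index of key in the sorted array, or -1 if it is absent. */
    static int search(int[] sorted, int key) {
        int lo = 0, hi = sorted.length - 1;
        int prev = -1;   // previous guess, -1 before the first iteration
        while (lo <= hi && key >= sorted[lo] && key <= sorted[hi]) {
            if (sorted[lo] == sorted[hi]) return lo;
            int pos = lo + (int) ((long) (hi - lo) * ((long) key - sorted[lo])
                    / ((long) sorted[hi] - sorted[lo]));
            if (sorted[pos] == key) return pos;
            // Guesses have stopped moving much: finish with a cheap scan.
            if (prev >= 0 && Math.abs(pos - prev) <= CLOSE_ENOUGH) {
                return scan(sorted, key, pos, lo, hi);
            }
            if (sorted[pos] < key) lo = pos + 1; else hi = pos - 1;
            prev = pos;
        }
        return -1;
    }

    /** Walks from pos toward the key, staying inside [lo, hi]. */
    private static int scan(int[] sorted, int key, int pos, int lo, int hi) {
        if (sorted[pos] < key) {
            for (int i = pos + 1; i <= hi && sorted[i] <= key; i++) {
                if (sorted[i] == key) return i;
            }
        } else {
            for (int i = pos - 1; i >= lo && sorted[i] >= key; i--) {
                if (sorted[i] == key) return i;
            }
        }
        return -1;
    }
}
```

The scan replaces the multiplication and division of a further guess with plain comparisons on adjacent elements, which is the trade-off the paragraph above describes.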

A major question about interpolation search is whether O(log(log(n))) comparisons translate into an O(log(log(n))) running time. They do not, because there is a trade-off between storage access time and the CPU time spent computing the next guess. If the dataset is large and storage access is expensive, for example on a physical hard disk, interpolation search can easily beat binary search. However, experiments show that when access time is short, as with RAM, interpolation search may bring no benefit.

 

Test Results

The source code for the experiments is written in Java. Each experiment was run 10 times on the same array. The arrays contain random integers and are held in memory.

In the interpolation search experiments, sampling takes 20 samples from the search space to determine the next search range. If that range is estimated to contain 10 or fewer elements, linear search is used. In addition, if the search range has fewer than 2,000 elements, the search falls back to standard binary search.

For reference, Java's built-in Arrays.binarySearch is also included in the experiments so that its running time can be compared with the custom implementations.
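A timing loop in the spirit of this setup might look like the sketch below. It is not the article's harness: the Searcher interface, the fixed seeds, and the way keys are drawn are assumptions, and a real measurement would also need JVM warm-up runs.

```java
import java.util.Arrays;
import java.util.Random;

final class SearchBenchmark {
    /** Hypothetical common signature for the algorithms under test. */
    interface Searcher {
        int search(int[] sorted, int key);
    }

    /** Builds a sorted array of random 4-byte integers (fixed seed for repeatability). */
    static int[] randomSortedArray(int size, long seed) {
        int[] a = new Random(seed).ints(size).toArray();
        Arrays.sort(a);
        return a;
    }

    /** Average wall-clock time per search in milliseconds, over the given number of runs. */
    static double averageMillis(Searcher s, int[] sorted, int[] keys, int runs) {
        long sink = 0;   // consume results so the JIT cannot drop the calls
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            for (int key : keys) {
                sink += s.search(sorted, key);
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) System.out.println(sink);
        return elapsed / 1e6 / ((double) runs * keys.length);
    }

    public static void main(String[] args) {
        int[] data = randomSortedArray(10_000, 42L);
        // Draw keys that are known to be present, from random positions.
        int[] keys = new Random(7L).ints(1_000, 0, data.length).map(i -> data[i]).toArray();
        double ms = averageMillis(Arrays::binarySearch, data, keys, 10);
        System.out.printf("Arrays.binarySearch: %.2e ms per search%n", ms);
    }
}
```

Any custom search with the same signature can be passed in place of Arrays::binarySearch to compare timings on the same array.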

 

Figure: average search time per element, by array size

 

 

Figure: average comparisons per search, by array size

Despite our high expectations for interpolation search, its actual running time did not beat Java's default binary search. If storage access time is long, combining some kind of hash tree with a B+ tree may be a better choice. It is worth noting, however, that on uniformly distributed arrays the combination of interpolation search and sequential search consistently needs fewer comparisons than binary search. Still, the platform's binary search is very efficient, so in many cases there is no need to replace it with a more complex algorithm.

Raw data: average running time per search (ms)

Size        Arrays.binarySearch  Interpolation + Seq  Interpolation  Sampling  Binary    Gallop    Gallop + Binary
10,000      1.50e-04             1.60e-04             2.50e-04       3.20e-04  5.00e-05  1.50e-04  1.00e-04
20,000      5.00e-05             5.50e-05             1.05e-04       2.35e-04  7.00e-05  1.15e-04  6.50e-05
40,000      4.75e-05             5.00e-05             9.00e-05       1.30e-04  5.25e-05  1.33e-04  8.75e-05
80,000      4.88e-05             5.88e-05             9.88e-05       1.95e-04  6.38e-05  1.53e-04  9.00e-05
160,000     5.25e-05             5.94e-05             1.01e-04       2.53e-04  6.56e-05  1.81e-04  9.38e-05
320,000     5.16e-05             6.13e-05             1.22e-04       2.19e-04  6.31e-05  2.45e-04  1.04e-04
640,000     5.30e-05             6.06e-05             9.61e-05       2.12e-04  7.27e-05  2.31e-04  1.16e-04
1,280,000   5.39e-05             6.06e-05             9.72e-05       2.59e-04  7.52e-05  2.72e-04  1.18e-04
2,560,000   5.53e-05             6.40e-05             1.11e-04       2.57e-04  7.37e-05  2.75e-04  1.05e-04
5,120,000   5.53e-05             6.30e-05             1.26e-04       2.69e-04  7.66e-05  3.32e-04  1.18e-04
10,240,000  5.66e-05             6.59e-05             1.22e-04       2.92e-04  8.07e-05  4.27e-04  1.42e-04
20,480,000  5.95e-05             6.54e-05             1.18e-04       3.50e-04  8.31e-05  4.88e-04  1.49e-04
40,960,000  5.87e-05             6.58e-05             1.15e-04       3.76e-04  8.59e-05  5.72e-04  1.75e-04
81,920,000  6.75e-05             6.83e-05             1.04e-04       3.86e-04  8.66e-05  6.89e-04  2.15e-04

Raw data: average number of comparisons per search

Size        Arrays.binarySearch  Interpolation + Seq  Interpolation  Sampling  Binary    Gallop    Gallop + Binary
10,000      ?                    10.6                 17.6           19.0      12.2      58.2      13.2
20,000      ?                    11.3                 20.7           19.0      13.2      66.3      14.2
40,000      ?                    11.0                 16.9           20.9      14.2      74.9      15.2
80,000      ?                    12.1                 19.9           38.0      15.2      84.0      16.2
160,000     ?                    11.7                 18.3           38.0      16.2      93.6      17.2
320,000     ?                    12.4                 25.3           38.2      17.2      103.8     18.2
640,000     ?                    12.4                 19.0           41.6      18.2      114.4     19.2
1,280,000   ?                    12.5                 20.2           57.0      19.2      125.5     20.2
2,560,000   ?                    12.8                 22.7           57.0      20.2      137.1     21.2
5,120,000   ?                    12.7                 26.5           57.5      21.2      149.2     22.2
10,240,000  ?                    13.2                 25.2           62.1      22.2      161.8     23.2
20,480,000  ?                    13.4                 23.4           76.0      23.2      175.0     24.2
40,960,000  ?                    13.4                 21.9           76.1      24.2      188.6     25.2
81,920,000  ?                    14.0                 19.7           77.0      25.2      202.7     26.2

Source code

The complete source code of the search algorithms was linked from the original article. Note that the code is not production quality; for example, some of it may contain too many or too few range checks.
