Comparison of actual lookup performance for three different search algorithms

Source: Internet
Author: User

First, an introduction to the search problem

A lookup (search) problem is to find a given value, which we call the search key, in a given set (or multiset, which allows multiple elements to have the same value). There are many search algorithms to choose from: straightforward sequential search, the highly efficient but more restricted binary search, and algorithms that represent the original collection in another form to make lookups easier. The last class of algorithms is of particular importance in real-world applications because it is essential for accessing information in large databases.

No single search algorithm is optimal in every case. Some algorithms are faster than others but require more storage space; some are very fast but work only on sorted arrays; and so on. Unlike sorting algorithms, search algorithms have no stability issue, but they raise other problems. In particular, if the data in an application changes frequently relative to the number of lookups, searching must be considered together with two other operations: inserting elements into and deleting elements from the collection. In that case, the data structures and algorithms must be chosen to balance the requirements of the various operations. Furthermore, organizing very large data sets for efficient lookup is an unusually difficult challenge, and one of great practical significance.

This article considers three approaches to the search problem: direct lookup in a linear list, lookup in a binary sort tree, and lookup in a balanced binary sort tree.

1) Direct lookup in a linear list (LL): the data collection is stored in an array, and each search scans from the first element of the array toward the end until either the target element is found or the end of the array is reached. An optimized version of this method is also tested: each element in the array is given a frequency field, the elements that are searched for most often are placed at the front of the array, and the elements are kept sorted by frequency.
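A minimal C sketch of both variants follows. The `Elem` type, the function names, and the policy of moving a found element forward past lower-frequency elements are assumptions made for illustration; the article does not show its own code and may have reordered elements differently.

```c
/* Hypothetical element type: a key plus the access-frequency field
   used by the optimized variant. */
typedef struct {
    int key;
    int freq;
} Elem;

/* Plain sequential search: scan from the first element toward the end.
   Returns the index of key, or -1 if absent. *len receives the number
   of comparisons made (the lookup length). */
int seq_search(const Elem *a, int n, int key, int *len)
{
    int i;
    for (i = 0; i < n; i++) {
        if (a[i].key == key) {
            *len = i + 1;
            return i;
        }
    }
    *len = n;                 /* unsuccessful: every element compared */
    return -1;
}

/* Frequency variant: on a hit, increment the element's frequency and
   move it toward the front past elements with a lower frequency, so the
   array stays sorted by frequency in descending order. Returns the
   element's new index, or -1 if absent. */
int freq_search(Elem *a, int n, int key, int *len)
{
    int i = seq_search(a, n, key, len);
    if (i < 0)
        return -1;
    a[i].freq++;
    while (i > 0 && a[i - 1].freq < a[i].freq) {
        Elem tmp = a[i - 1];
        a[i - 1] = a[i];
        a[i] = tmp;
        i--;
    }
    return i;
}
```

Moving the element forward one position at a time keeps the array ordered incrementally, so there is no need to re-sort the whole array after every lookup.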

2) Binary sort tree (BST) lookup. A BST, also called a binary search tree, is either an empty tree or a binary tree with the following properties: ① if the left subtree is not empty, the values of all nodes in the left subtree are less than the value of the root; ② if the right subtree is not empty, the values of all nodes in the right subtree are greater than the value of the root; ③ the left and right subtrees are themselves binary sort trees; ④ no two nodes have equal keys. It follows from the definition that searching for an element in a BST starts at the root: at each step the target is compared with the current node, and the search either succeeds or continues in the left or right subtree, until the target is found or the search runs past a leaf.
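The search procedure just described can be sketched in C as follows. The node layout and the names `BSTNode`, `bst_search`, `bst_insert` and `bst_free` are hypothetical; the `len` output counts the nodes compared, which is one common way to define lookup length.

```c
#include <stdlib.h>

/* Hypothetical node type for a binary sort tree. */
typedef struct BSTNode {
    int key;
    struct BSTNode *left, *right;
} BSTNode;

/* Start at the root; compare with the current node and go left or right
   until the key is found or we step past a leaf (NULL).
   *len receives the number of nodes compared. */
BSTNode *bst_search(BSTNode *t, int key, int *len)
{
    *len = 0;
    while (t != NULL) {
        (*len)++;
        if (key == t->key)
            return t;
        t = (key < t->key) ? t->left : t->right;
    }
    return NULL;
}

/* Insert key unless it is already present (property 4: no equal keys).
   Returns the root of the updated subtree. */
BSTNode *bst_insert(BSTNode *t, int key)
{
    if (t == NULL) {
        BSTNode *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;      /* out of memory: leave this position empty */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < t->key)
        t->left = bst_insert(t->left, key);
    else if (key > t->key)
        t->right = bst_insert(t->right, key);
    return t;
}

void bst_free(BSTNode *t)
{
    if (t != NULL) {
        bst_free(t->left);
        bst_free(t->right);
        free(t);
    }
}
```

Inserting the keys 50, 30, 70, 20, 40 in that order and then searching for 40 compares 50, 30 and 40, giving a lookup length of 3.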

3) Balanced binary search tree (AVL tree, named after Adelson-Velskii and Landis). An AVL tree is a binary search tree in which the balance factor of every node, defined as the height of its left subtree minus the height of its right subtree, is 0, +1 or -1 (the height of an empty tree is defined as -1). An AVL tree is therefore a special binary search tree whose left and right branches have nearly equal depths; this keeps the tree short and reduces the number of comparisons per lookup. Finding an element in an AVL tree works the same way as finding one in a binary sort tree.
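Since searching an AVL tree is the same as searching a binary sort tree, the interesting part is how insertion keeps every balance factor in {-1, 0, +1}. The sketch below shows insertion with single and double rotations in C, assuming integer keys and the article's convention that an empty tree has height -1. `AVLNode`, `avl_insert` and the helper names are hypothetical, not taken from the article.

```c
#include <stdlib.h>

/* Hypothetical AVL node: a BST node plus a cached height.
   Following the article, an empty tree has height -1. */
typedef struct AVLNode {
    int key;
    int height;
    struct AVLNode *left, *right;
} AVLNode;

static int height(const AVLNode *t) { return t ? t->height : -1; }

static int max_int(int a, int b) { return a > b ? a : b; }

static void update(AVLNode *t)
{
    t->height = 1 + max_int(height(t->left), height(t->right));
}

/* Balance factor = height(left) - height(right). */
static int balance(const AVLNode *t)
{
    return height(t->left) - height(t->right);
}

static AVLNode *rotate_right(AVLNode *y)
{
    AVLNode *x = y->left;
    y->left = x->right;
    x->right = y;
    update(y);                /* y is now below x, so update it first */
    update(x);
    return x;
}

static AVLNode *rotate_left(AVLNode *x)
{
    AVLNode *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

/* Ordinary BST insertion, then rebalancing on the way back up. */
AVLNode *avl_insert(AVLNode *t, int key)
{
    if (t == NULL) {
        AVLNode *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;      /* out of memory: leave this position empty */
        n->key = key;
        n->height = 0;        /* a single node has height 0 */
        n->left = n->right = NULL;
        return n;
    }
    if (key < t->key)
        t->left = avl_insert(t->left, key);
    else if (key > t->key)
        t->right = avl_insert(t->right, key);
    else
        return t;             /* equal keys are not stored */

    update(t);
    if (balance(t) > 1) {                 /* left subtree too tall */
        if (balance(t->left) < 0)         /* left-right case */
            t->left = rotate_left(t->left);
        return rotate_right(t);
    }
    if (balance(t) < -1) {                /* right subtree too tall */
        if (balance(t->right) > 0)        /* right-left case */
            t->right = rotate_right(t->right);
        return rotate_left(t);
    }
    return t;
}

void avl_free(AVLNode *t)
{
    if (t != NULL) {
        avl_free(t->left);
        avl_free(t->right);
        free(t);
    }
}
```

Inserting 1 through 7 in ascending order, which would turn a plain binary sort tree into a single-branch chain, yields a perfectly balanced AVL tree with root 4 and height 2.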

Second, theoretical analysis of the search problem

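Let n be the number of elements in the set. When the set is stored directly in a linear list, finding an element takes O(1) time in the best case and O(n) time in the worst case. If every element is equally likely to be searched for, a successful search for the i-th element needs i comparisons, so the average lookup length is

$$\mathrm{ASL} = \frac{1}{n}\sum_{i=1}^{n} i = \frac{n+1}{2}$$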


When a frequency field is added to direct linear-list lookup and the array is kept sorted by frequency from high to low, the best case is still O(1) and the worst case O(n). The average lookup length depends on the sequence of keys being searched for, but it is generally shorter than before the optimization, especially when some keys are requested much more often than others.
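The claim can be made precise under an added assumption: key i is requested with a fixed probability p_i (summing to 1), and the array is in the ideal order that the frequency field approximates, p_1 ≥ p_2 ≥ … ≥ p_n. Placing higher-probability keys at smaller positions minimizes the sum of position times probability over all orderings (the rearrangement inequality), so it is at most the average over all n! orderings, in which each position averages (n+1)/2:

$$\mathrm{ASL}_{\text{freq}} = \sum_{i=1}^{n} i\,p_i \;\le\; \frac{1}{n!}\sum_{\pi \in S_n}\sum_{i=1}^{n} \pi(i)\,p_i = \sum_{i=1}^{n} \frac{n+1}{2}\,p_i = \frac{n+1}{2}$$

Equality holds when all p_i are equal, so the frequency ordering cannot do worse than the unoptimized average in this idealized setting and gains the most when access is heavily skewed.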


To analyze the lookup performance of a binary sort tree theoretically, we first need to consider the shape of the tree: trees of different shapes have different depths, and therefore different lookup performance. In the extreme case, a binary sort tree degenerates into a single-branch tree, and its lookup performance is then the same as that of a linear list. For a randomly generated binary sort tree, however, the average lookup performance is better than that of direct linear-list lookup. To establish upper and lower bounds on BST lookup performance, we first analyze a perfectly balanced binary sort tree, which is both a balanced binary tree and a full binary tree. Assuming every node is searched for with the same frequency, a full tree with h levels has 2^(j-1) nodes on level j and n = 2^h - 1 nodes in total, so its average lookup length is

$$\mathrm{ASL}_{\text{bal}} = \frac{1}{n}\sum_{j=1}^{h} j\,2^{j-1} = \frac{n+1}{n}\log_2(n+1) - 1$$
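The formula for a perfectly balanced tree can be checked numerically. This sketch assumes the keys are 1..n and that the balanced tree is formed by recursively taking the middle key of each sorted range as the root, a standard construction that may differ from whatever the article used; `balanced_total` and `balanced_asl_formula` are hypothetical names.

```c
/* Total lookup length (sum of node depths, root at depth 1) of the
   perfectly balanced BST obtained by always taking the middle key of
   the sorted range [lo, hi] as the root. No nodes are allocated; the
   recursion only sums depths. */
long long balanced_total(int lo, int hi, int depth)
{
    int mid;
    if (lo > hi)
        return 0;
    mid = lo + (hi - lo) / 2;
    return depth + balanced_total(lo, mid - 1, depth + 1)
                 + balanced_total(mid + 1, hi, depth + 1);
}

/* Closed form for a full tree of h levels, n = 2^h - 1 keys:
   ((n + 1) / n) * log2(n + 1) - 1, where log2(n + 1) = h. */
double balanced_asl_formula(int h)
{
    double n = (double)((1L << h) - 1);
    return (n + 1.0) / n * h - 1.0;
}
```

For n = 7 (three levels) the depths are 1, 2, 2, 3, 3, 3, 3, so `balanced_total(1, 7, 1)` is 17, and 17/7 matches `balanced_asl_formula(3)`.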


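For a randomly generated binary search tree, where all insertion orders of the n keys are equally likely and every key is searched for equally often, the average lookup length is

$$\mathrm{ASL}_{\text{rand}} = 2\left(1+\frac{1}{n}\right)H_n - 3 \;\sim\; 2\ln n \approx 1.39\log_2 n, \qquad H_n = \sum_{k=1}^{n}\frac{1}{k}$$

so it lies between the value for a perfectly balanced tree and the (n+1)/2 of a degenerate single-branch tree.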
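The closed form can be checked against the recurrence it comes from. If the first inserted key is the root, the remaining n - 1 keys split into k on the left and n - 1 - k on the right, each split equally likely, and every node gains one level below the root, giving T(n) = n + (2/n)(T(0) + … + T(n-1)) for the expected total lookup length. The sketch assumes random insertion order and uniform searches, as above; the function names are hypothetical.

```c
/* Average lookup length of a random BST computed from the recurrence
   T(n) = n + (2/n) * (T(0) + ... + T(n-1)), T(0) = 0, where T(n) is the
   expected sum of node depths (root at depth 1). Returns T(n) / n. */
double random_bst_asl(int n)
{
    double t = 0.0;        /* T(k), starting from T(0) = 0 */
    double prefix = 0.0;   /* T(0) + ... + T(k-1) */
    int k;
    if (n <= 0)
        return 0.0;
    for (k = 1; k <= n; k++) {
        prefix += t;                  /* add T(k-1) */
        t = k + 2.0 * prefix / k;     /* now t holds T(k) */
    }
    return t / n;
}

/* Closed form 2(1 + 1/n)H_n - 3, with H_n the n-th harmonic number. */
double random_bst_asl_closed(int n)
{
    double h = 0.0;
    int k;
    for (k = 1; k <= n; k++)
        h += 1.0 / k;
    return 2.0 * (1.0 + 1.0 / n) * h - 3.0;
}
```

For n = 3 both functions give 17/9: of the six insertion orders, two produce the balanced shape (total depth 5) and four produce a chain (total depth 6).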


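In terms of time complexity, all three methods analyzed above (direct linear-list lookup, linear-list lookup with frequency, and binary sort tree lookup) take O(n) time to find an element in the worst case, and each needs O(n) storage space. The AVL tree avoids this O(n) worst case: because its height is kept at O(log n), every lookup takes O(log n) time.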


Third, experimental results

The experiments were run on the VC6.0 platform using a program written in C. The test data from each run was saved in the text file Record.txt and then imported into MATLAB for visual comparison.

Figures 3.1 and 3.2 show the program's live output while it measures the time spent by the different lookup algorithms: Figure 3.1 for a small number of samples and Figure 3.2 for a large number of samples. Each run prints four groups of data in succession, one per algorithm, and each group contains two values: the first is the total time spent searching, and the second is the total lookup length accumulated over all lookups.
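A sketch of how one such (time, total lookup length) pair might be measured, assuming the standard `clock()` timer from `<time.h>` and using sequential search as the algorithm under test. The article does not say which timer it used; `seq_len` and `measure_seq` are hypothetical names. The length counter is a `long long`, which is far less prone to overflow than an `int` at large sample sizes.

```c
#include <time.h>

/* Sequential search that returns only the lookup length. */
static int seq_len(const int *a, int n, int key)
{
    int i;
    for (i = 0; i < n; i++)
        if (a[i] == key)
            return i + 1;
    return n;              /* unsuccessful search compares every element */
}

/* Run q lookups and report the two values printed per algorithm:
   elapsed processor time in seconds and total lookup length. */
void measure_seq(const int *a, int n, const int *queries, int q,
                 double *seconds, long long *total_len)
{
    clock_t start = clock();
    long long total = 0;
    int j;
    for (j = 0; j < q; j++)
        total += seq_len(a, n, queries[j]);
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    *total_len = total;
}
```

`clock()` measures processor time rather than wall-clock time and has coarse resolution on many systems, so very short runs may report 0 seconds; timing many lookups together, as the article does with its total time, avoids that.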


The program also records the run data in a file. Figures 3.3 and 3.4 show the recorded performance values of each algorithm for different numbers of test samples. Each line records one sample size and contains nine numbers: the first is the number of test samples, and the remaining eight form four pairs. In each pair, the first number is the time spent searching and the second is the total lookup length. The four pairs correspond, in order, to direct linked-list lookup, linked-list lookup with frequency, binary search tree lookup, and AVL tree lookup. In Figure 3.4 some of the numbers are negative because the total lookup length exceeds the largest value that C's int type can represent, causing an overflow.
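The nine-number row layout can be produced with a sketch like the following, assuming a C99 compiler. `Measure` and `format_record` are hypothetical names, and the three-decimal time format is an arbitrary choice. Storing each total lookup length in a 64-bit `long long` instead of an `int` keeps large totals from wrapping around to negative values; older compilers such as VC6.0 may not accept `long long` or `%lld`, in which case Microsoft's `__int64` type with the `%I64d` format would be the usual substitute.

```c
#include <stdio.h>

/* One measurement pair: total search time and total lookup length. */
typedef struct {
    double seconds;
    long long total_len;   /* 64-bit, so large totals do not go negative */
} Measure;

/* Format one row: the sample count followed by four (time, length)
   pairs in the order direct list, list with frequency, binary search
   tree, AVL tree. Returns snprintf's result (characters needed). */
int format_record(char *buf, size_t size, int samples, const Measure m[4])
{
    return snprintf(buf, size, "%d %.3f %lld %.3f %lld %.3f %lld %.3f %lld",
                    samples,
                    m[0].seconds, m[0].total_len,
                    m[1].seconds, m[1].total_len,
                    m[2].seconds, m[2].total_len,
                    m[3].seconds, m[3].total_len);
}
```

Writing the buffer to the record file with `fputs` and a trailing newline, or calling `fprintf` on the file with the same format string, gives one line per sample size.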


Figure 3.5 compares the time consumed by the four algorithms; it was generated by importing the experimental data above into MATLAB. It shows clearly that, of the four algorithms, direct linear-list lookup takes the longest and the linear list with frequency comes second. By comparison, binary search tree and AVL tree lookups perform much better, taking significantly less time than the linear lists. As the number of samples increases, the time taken by linear-list lookup grows roughly in proportion to it, while the time taken by the binary sort tree and the AVL tree grows slowly.


Figures 3.6 and 3.7 are zoomed-in views of parts of Figure 3.5. They show more clearly how the two linear-list algorithms and the two tree-based algorithms compare.


Next, we compare the total lookup lengths of the algorithms. Figure 3.8 shows that the total lookup lengths of the binary search tree and the AVL tree are much lower than those of the direct linear list and the linear list with frequency. At close to 900,000 samples, the plotted lookup length becomes negative because the int variable that stores the lookup length overflows past its maximum positive value.


Figure 3.9 is a zoomed-in view of part of Figure 3.8 that shows the comparison more clearly. Figure 3.8 shows that adding a frequency field to direct linear-list lookup improves search performance and reduces the total lookup length. Figure 3.9 shows that, in the average case, the AVL tree performs better than the binary search tree: its total lookup length is about 3/4 of the binary search tree's.
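A rough back-of-envelope check of the 3/4 figure, assuming random insertion order for the binary search tree (average lookup length growing like 2 ln n) and assuming, as an approximation rather than a proven bound, that the AVL tree's average depth stays close to the optimum log2 n:

$$\frac{\mathrm{ASL}_{\text{AVL}}}{\mathrm{ASL}_{\text{BST}}} \approx \frac{\log_2 n}{2\ln n} = \frac{\log_2 n}{2\ln 2\,\log_2 n} = \frac{1}{2\ln 2} \approx 0.72$$

This is in the same range as the measured ratio of about 3/4; lower-order terms make the exact ratio vary with n at practical sample sizes.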




