15. Ordered table search and linear index search
When reprinting, please cite the source: http://blog.csdn.net/u012637501
1. Searching an ordered table
1. Binary search (half-interval search)
(1) Basic idea: In an ordered table with sequential storage, take the middle record a[mid] as the comparison object. If the given value equals the key of the middle record, the search succeeds. If the given value is less than that key, continue the search in the left half of the table; if it is greater, continue in the right half. Repeat this process until the search succeeds or the search area is empty, in which case the search fails.
(2) Conditions of use: The records in the linear table must be ordered by key (usually in ascending order), and the table must use sequential storage. In other words, it must be an ordered table stored in an array.
(3) Algorithm implementation
/* Binary search.
 * a is the array holding the linear table (a[0] is unused), n is the total
 * number of data elements, and key is the value to search for.
 * Returns the subscript of key, or 0 if the search fails. */
int Binary_Search(int *a, int n, int key)
{
    int low, high, mid;
    low = 1;                      // lowest subscript: the first record
    high = n;                     // highest subscript: the last record
    while (low <= high) {
        mid = (low + high) / 2;   // halve, rounding down
        if (key < a[mid])         // key is smaller than the middle value
            high = mid - 1;       // move the highest subscript to just below mid
        else if (key > a[mid])    // key is larger than the middle value
            low = mid + 1;        // move the lowest subscript to just above mid
        else
            return mid;           // equal: mid is the position found
    }
    return 0;
}
(4) Example
Suppose there is an ordered table a with n = 10 and key = 62. That is, ten numbers are stored at subscripts 1 to 10 (subscript 0 is not used), and we want to check whether 62 is among them.
A. Search steps:
Step 1: low = 1, high = 10, mid = (1 + 10)/2 = 5. a[mid] = a[5] = 47 < key, so low = mid + 1 = 6. Now mid = (6 + 10)/2 = 8.
Step 2: a[mid] = a[8] = 73 > key, so high = mid - 1 = 7. Now mid = (6 + 7)/2 = 6.
Step 3: a[mid] = a[6] = 59 < key, so low = mid + 1 = 7. Now mid = (low + high)/2 = (7 + 7)/2 = 7.
Step 4: a[mid] = a[7] = 62 = key. The search succeeds.
B. Time complexity analysis: The search process over this array can be drawn as a binary tree. By property 4 of binary trees, "the depth of a complete binary tree with n nodes is ⌊log2 n⌋ + 1", so in the worst case it takes ⌊log2 n⌋ + 1 comparisons to find the key or to conclude that it is absent; in the best case it takes 1. Compared with the O(n) time complexity of sequential search, binary search runs in O(log n), which is far more efficient for large n.
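The probe count in the example can be checked with a small sketch. The function name `binary_search_probes` and its `probes` out-parameter are hypothetical additions for illustration. The table holds the four values quoted in the steps (a[5] = 47, a[6] = 59, a[7] = 62, a[8] = 73); the remaining entries are assumed values chosen only to keep the table sorted.

```c
/* Binary search over a[1..n] (a[0] is unused, as in the text).
 * Returns the subscript of key, or 0 if it is absent.
 * *probes receives the number of records compared with key. */
int binary_search_probes(const int *a, int n, int key, int *probes)
{
    int low = 1, high = n;
    *probes = 0;
    while (low <= high) {
        /* Same value as (low + high) / 2, but cannot overflow. */
        int mid = low + (high - low) / 2;
        ++*probes;
        if (key < a[mid])
            high = mid - 1;
        else if (key > a[mid])
            low = mid + 1;
        else
            return mid;
    }
    return 0;
}

/* Assumed sample table: only a[5]..a[8] come from the example above. */
static const int sample[11] = {0, 1, 16, 24, 35, 47, 59, 62, 73, 88, 99};
```

With this table, searching for 62 returns subscript 7 after 4 probes, which matches ⌊log2 10⌋ + 1 = 4. A failed search for 60 also stops after 4 probes.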
2. Interpolation search
Although binary search is far more efficient than sequential search on ordered tables, it still has a limitation: it always probes the middle of the range. For example, when searching for 5 in an array of 100000 elements whose values start at 0 and are evenly distributed in ascending order, we would naturally start near the low end of the array rather than in the middle.
(1) Method: Interpolation search is an improvement on binary search and is another search algorithm for ordered tables. It chooses the probe position by comparing the key being searched for with the keys of the smallest and largest records in the current search range. Its core is the interpolation ratio (key - a[low])/(a[high] - a[low]), which replaces the fixed 1/2 of binary search, that is, mid = low + (high - low) * (key - a[low])/(a[high] - a[low]).
(2) Applicability: ordered tables that are long and whose keys are evenly distributed.
(3) Time complexity: O(log n), the same order as binary search. On evenly distributed keys the average number of probes is lower (O(log log n)), while on badly skewed keys it can degrade to O(n).
(4) Example: In the table a of the preceding example, suppose we search for the key 16. Binary search (mid = (low + high)/2) can take up to four probes on a table of this size, while interpolation search computes mid = low + (high - low) * (key - a[low])/(a[high] - a[low]) ≈ 2 on the first step and needs only two probes.
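A minimal sketch of interpolation search, assuming the same conventions as the binary search code above (records in a[1..n], 0 returned on failure). The function name `interpolation_search`, the range check in the loop condition, and the guard against equal end keys are additions for illustration, not part of the original text.

```c
/* Interpolation search over a[1..n] (a[0] unused), sorted ascending.
 * Returns the subscript of key, or 0 if it is absent. */
int interpolation_search(const int *a, int n, int key)
{
    int low = 1, high = n;
    /* Stop as soon as key falls outside [a[low], a[high]]; this also
     * keeps the computed mid inside the current range. */
    while (low <= high && key >= a[low] && key <= a[high]) {
        int mid;
        if (a[high] == a[low])
            mid = low;   /* all remaining keys are equal: avoid dividing by zero */
        else
            mid = low + (int)((long long)(high - low) * (key - a[low])
                              / (a[high] - a[low]));
        if (key < a[mid])
            high = mid - 1;
        else if (key > a[mid])
            low = mid + 1;
        else
            return mid;
    }
    return 0;
}
```

On an assumed table such as {0, 1, 16, 24, 35, 47, 59, 62, 73, 88, 99} with n = 10, the first probe for key 16 is mid = 1 + 9 * 15 / 98 = 2, and a[2] = 16, so the function returns 2 immediately.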
3. Fibonacci search
(1) The Fibonacci sequence: The Fibonacci sequence originally describes a rabbit-breeding problem. Its defining characteristic is that each term is the sum of the two terms before it. The mathematical model is:
$$F(n) = \begin{cases} 0, & n = 0 \\ 1, & n = 1 \\ F(n-1) + F(n-2), & n > 1 \end{cases}$$
where n is the number of months that have passed and F(n) is the number of rabbits in the nth month.
(2) Algorithm Implementation
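extern int F[];   // precomputed Fibonacci numbers, defined elsewhere: F[0] = 0, F[1] = 1, ...

/* Fibonacci search over a[1..n]; returns the subscript of key, or 0 if absent.
 * a needs spare capacity beyond n: the table is padded up to subscript F[k] - 2. */
int Fibonacci_Search(int *a, int n, int key)
{
    int low, high, mid, i, k;
    low = 1;                         // lowest subscript: the first record
    high = n;                        // highest subscript: the last record
    k = 0;
    while (n > F[k] - 1)             // find where n falls in the Fibonacci sequence
        k++;
    for (i = n; i < F[k] - 1; i++)   // pad the unfilled positions with the largest value
        a[i] = a[n];
    while (low <= high) {
        mid = low + F[k - 1] - 1;    // subscript of the current split point
        if (key < a[mid]) {          // key is smaller than the split record
            high = mid - 1;          // highest subscript moves to mid - 1
            k = k - 1;               // Fibonacci subscript decreases by one
        } else if (key > a[mid]) {   // key is larger than the split record
            low = mid + 1;           // lowest subscript moves to mid + 1
            k = k - 2;               // Fibonacci subscript decreases by two
        } else {
            if (mid <= n)
                return mid;          // equal: mid is the position found
            else
                return n;            // mid > n is a padded slot, so the match is at n
        }
    }
    return 0;
}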
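Fibonacci_Search reads a global table F that the listing does not show. The following is one possible way to fill it, offered as a sketch: `FIB_MAX`, `fib_init`, and `fib_index_for` are hypothetical names, and the table size assumes a 32-bit int.

```c
/* F[46] = 1836311903 is the largest Fibonacci number that fits in a
 * 32-bit signed int (assumption: int is 32 bits wide). */
#define FIB_MAX 46

int F[FIB_MAX + 1];

/* Fill F[0..FIB_MAX] with F[0] = 0, F[1] = 1, F[k] = F[k-1] + F[k-2]. */
void fib_init(void)
{
    F[0] = 0;
    F[1] = 1;
    for (int k = 2; k <= FIB_MAX; k++)
        F[k] = F[k - 1] + F[k - 2];
}

/* Smallest k with n <= F[k] - 1, i.e. the k chosen by the first loop
 * of Fibonacci_Search. Requires fib_init() first and n < F[FIB_MAX]. */
int fib_index_for(int n)
{
    int k = 0;
    while (n > F[k] - 1)
        k++;
    return k;
}
```

For n = 10 this gives k = 7 with F[7] = 13, so an array of 13 ints (subscripts 0 to 12) leaves enough room for the padding loop, which writes subscripts 10 and 11.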
2. Linear index search
The ultimate purpose of a data structure is to speed up data processing, and an index is a data structure designed to speed up searching. Indexing is the process of associating a key with its corresponding record. An index consists of a number of index items, and each index item contains at least a key and the location in storage of the corresponding record. Indexing is an important technique for organizing large databases and disk files. By structure, indexes can be divided into linear indexes, tree indexes, and multi-level indexes. A linear index organizes the set of index items as a linear structure, also called an index table.
1. Dense index
A dense index is a linear index in which every record in the data set has its own index item. In a dense index table, the index items must be sorted by key.
Note: Because the index table is ordered, we can search it with ordered-table algorithms such as binary search, interpolation search, and Fibonacci search, which greatly improves efficiency. For example, to find the record with key 18, searching the unordered data table directly requires a sequential search (6 comparisons in this example), whereas searching the index table with binary search yields the pointer for 18 in two probes. In other words, a linear index turns a search over unordered data into a search over ordered keys, so that an ordered search algorithm can be used to improve efficiency.
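The idea can be sketched with the standard library's qsort and bsearch. The `Record` and `IndexItem` layouts and the function names `dense_index_build` and `dense_index_find` are hypothetical, chosen only to illustrate one index item per record, sorted by key.

```c
#include <stdlib.h>

/* Hypothetical record layout; the data table itself stays unordered. */
typedef struct {
    int key;
    const char *payload;
} Record;

/* One index item per record: the key plus the record's location. */
typedef struct {
    int key;
    const Record *rec;
} IndexItem;

static int cmp_item(const void *x, const void *y)
{
    int a = ((const IndexItem *)x)->key;
    int b = ((const IndexItem *)y)->key;
    return (a > b) - (a < b);
}

/* Build the dense index and sort it by key.
 * The caller frees the result; NULL means allocation failed. */
IndexItem *dense_index_build(const Record *data, size_t n)
{
    IndexItem *idx = malloc(n * sizeof *idx);
    if (idx == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        idx[i].key = data[i].key;
        idx[i].rec = &data[i];
    }
    qsort(idx, n, sizeof *idx, cmp_item);
    return idx;
}

/* Binary-search the sorted index; returns the record, or NULL if absent. */
const Record *dense_index_find(const IndexItem *idx, size_t n, int key)
{
    IndexItem probe = { key, NULL };
    const IndexItem *hit = bsearch(&probe, idx, n, sizeof *idx, cmp_item);
    return hit ? hit->rec : NULL;
}
```

The data array is never reordered; only the index is sorted, which is what lets an ordered-table search run over unordered records.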
2. Block index
Because a dense index has as many index items as the data set has records, it takes up a lot of space. To reduce the number of index items, we can divide the data set into blocks that are ordered block by block, and then create one index item per block.
(1) Block ordering: The records of the data set are divided into several blocks, and these blocks must meet two conditions:
★ Unordered within a block: the records inside each block need not be ordered;
★ Ordered between blocks: for example, the keys of all records in the second block must be greater than the keys of all records in the first block, and so on.
(2) Block index: For a block-ordered data set, each block corresponds to one index item. This indexing method is called a block index. Each index item has three fields:
★ Maximum key: the largest key in the corresponding block, so that the smallest key in the next block is guaranteed to be larger than it;
★ Block length: the number of records in the block, used as the loop bound when scanning the block;
★ Block head pointer: points to the first data element of the block, so that the block's records can be traversed.
(3) Search process with a block index table
Step 1: Find, in the block index table, the block in which the key being searched for may lie.
Step 2: Follow that block's head pointer to the block and search it sequentially for the key (the records inside a block are generally unordered).
(4) Average search length: Suppose a data set with n records is divided evenly into m blocks of t records each, so that n = m * t, or m = n/t. Assuming every record is equally likely to be searched for, the average search length in the index table (searched sequentially) is $L_b = (m + 1)/2$, and the average search length within a block is $L_w = (t + 1)/2$. The average search length of a block index search is therefore:
$$ASL_w = L_b + L_w = \frac{m + 1}{2} + \frac{t + 1}{2} = \frac{m + t}{2} + 1 = \frac{1}{2}\left(\frac{n}{t} + t\right) + 1$$
So the average length depends not only on the total number of records n but also on the number of records t per block. Since $n/t + t \ge 2\sqrt{n}$, with equality when $t = \sqrt{n}$ (that is, when the number of blocks m equals the block size t), the best case is $ASL_w = \sqrt{n} + 1$. Block index search is therefore much faster than sequential search at O(n), but still slower than binary search at O(log n): O(log n) < O(√n) < O(n).
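The two-step search can be sketched as follows. The `BlockIndex` struct mirrors the three fields described above, but its name, the field names, and the use of 0-based subscripts with -1 as the failure value are assumptions made for this example. Step 1 here uses binary search over the index table, which is possible because the blocks are ordered; the average search length derived above assumes a sequential scan of the index instead.

```c
/* Hypothetical index item with the three fields of a block index. */
typedef struct {
    int max_key;   /* largest key in the block */
    int length;    /* number of records in the block */
    int start;     /* subscript of the block's first record */
} BlockIndex;

/* Block search over data (0-based) using an index of m blocks.
 * Returns the subscript of key in data, or -1 if it is absent. */
int block_search(const int *data, const BlockIndex *idx, int m, int key)
{
    /* Step 1: find the first block whose max_key >= key. */
    int low = 0, high = m - 1, block = -1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (idx[mid].max_key >= key) {
            block = mid;        /* candidate; look for an earlier block */
            high = mid - 1;
        } else {
            low = mid + 1;
        }
    }
    if (block < 0)
        return -1;              /* key exceeds every block's maximum */

    /* Step 2: scan that block sequentially (it is unordered inside). */
    for (int i = 0; i < idx[block].length; i++) {
        int pos = idx[block].start + i;
        if (data[pos] == key)
            return pos;
    }
    return -1;
}
```

For example, with data {27, 13, 8, 40, 55, 31, 96, 62, 77} split into three blocks of three records (maximum keys 27, 55, 96), a search for 62 selects the third block and finds the key at subscript 7.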
3. Inverted index
The inverted index arises from practical applications in which records need to be found by the value of an attribute (a field, or secondary key).
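As a rough illustration of looking up records by attribute value, the sketch below maps each attribute value to the list of record numbers that carry it. Everything in it is an assumption made for the example: the type and function names, the use of strings as attribute values, and the fixed table sizes.

```c
#include <string.h>

#define MAX_VALUES  16   /* distinct attribute values (illustrative limit) */
#define MAX_RECORDS 16   /* record numbers per value (illustrative limit) */

/* One index item: an attribute value and the records that have it. */
typedef struct {
    const char *value;
    int records[MAX_RECORDS];
    int count;
} InvertedItem;

typedef struct {
    InvertedItem items[MAX_VALUES];
    int count;
} InvertedIndex;

/* Note that record number rec has the given attribute value.
 * Returns 0 on success, -1 if one of the fixed-size tables is full. */
int inverted_add(InvertedIndex *ix, const char *value, int rec)
{
    int i;
    for (i = 0; i < ix->count; i++)
        if (strcmp(ix->items[i].value, value) == 0)
            break;
    if (i == ix->count) {               /* first time this value is seen */
        if (ix->count == MAX_VALUES)
            return -1;
        ix->items[i].value = value;
        ix->items[i].count = 0;
        ix->count++;
    }
    if (ix->items[i].count == MAX_RECORDS)
        return -1;
    ix->items[i].records[ix->items[i].count++] = rec;
    return 0;
}

/* Returns the index item for value, or NULL if no record has it. */
const InvertedItem *inverted_find(const InvertedIndex *ix, const char *value)
{
    for (int i = 0; i < ix->count; i++)
        if (strcmp(ix->items[i].value, value) == 0)
            return &ix->items[i];
    return NULL;
}
```

The lookup goes from attribute value to record numbers, the reverse of the usual direction from record to attribute. The linear scan over values keeps the sketch short; a sorted table would allow the ordered searches described earlier.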