Basic Algorithms - Find: Linear Index Lookup

Source: Internet
Author: User

Several of the algorithms described earlier rely on the data being ordered. In real applications, however, data sets can be unexpectedly large, and keeping every record sorted by one of its keywords at all times is very expensive. Massive data is therefore usually stored unsorted, in the order it arrives.

So how can you quickly find the data you need? The answer is: an index.

An index associates a keyword with its corresponding record. An index consists of index entries, each of which contains, at a minimum, the keyword and the location of the corresponding record in storage.

By structure, indexes can be divided into linear indexes, tree indexes, and multilevel indexes. A linear index organizes the collection of index entries into a linear structure, also known as an index table.

Dense index

A dense index is a linear index in which every record in the data set has its own index entry, and the index entries are sorted by key.

Because the index entries are ordered, keyword lookups can use ordered search algorithms such as binary search, interpolation search, or Fibonacci search.
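A minimal sketch of the idea in C: a sorted dense index maps each key to the position of its record in an unsorted primary table, and binary search runs over the index. The structure and function names here are illustrative, not from the original article.

```c
/* One dense-index entry per record: the index is kept sorted by key,
 * while the primary table may remain in arrival order. */
struct DenseItem {
    int key;   /* search key; entries are sorted ascending by key */
    int pos;   /* position of the record in the primary table     */
};

/* Binary search over the sorted index. Returns the record's position
 * in the primary table, or -1 if the key is not present. */
int dense_search(const struct DenseItem *idx, int n, int key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        if (idx[mid].key == key)
            return idx[mid].pos;
        if (idx[mid].key < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

Note that only the index entries need to be sorted; the records themselves stay where they were written.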

The benefit of a dense index is that it condenses a large data set: even when the data set itself cannot fit in memory, its index can be loaded into memory all at once, the keys can be sorted in memory, and each index entry points to the original data record it represents on disk.

The ability to use fast lookup algorithms is clearly the advantage of a dense index. But if the data set is very large, the index table itself becomes very large; on a computer with limited memory, the index table has to be placed on disk as well, which greatly reduces efficiency.

Block Index

Because a dense index has as many index entries as there are records in the data set, its space cost is significant. To reduce the number of index entries, the data set can be divided into blocks that are ordered between one another, with one index entry created per block.

Block ordering divides the data set into several blocks, which must satisfy the following condition:

Records within each block are unordered, but the blocks are ordered with respect to one another: every key in a block is smaller than every key in the block that follows it.

The index entry structure is defined with three data items:

(1) The maximum key, which stores the largest keyword in each block; this guarantees that the smallest keyword in the next block is larger than every keyword in this block.

(2) The record count, which stores the number of records in the block, for use when looping through it.

(3) A pointer to the first data element of the block, so that traversal of the block's records can begin there.

This works out well: the bulky data blocks are stored on disk, while the index table stays in memory. This model does not require sorting the original data set, because the blocks need not be stored contiguously. Determine the number of blocks before the data is generated, along with where each block is stored (blocks are not contiguous with one another, but storage within a block is contiguous); then the range of keys stored in each block is fixed, and when new data arrives it is easy to determine which block it belongs in.

As an example:

Suppose I want to design a block index to find data, with roughly 3,600 records estimated. Following the optimal split (take this on faith for the moment; see the analysis below), I use 60 blocks of 60 records each. The 60 blocks correspond to 60 folder directories on disk that store the data, and the blocks are not contiguous with one another. Assuming the keyword range for these 3,600 records is 1-3000, the first block stores records with keywords 1-50. When a new record arrives whose keyword is between 1 and 50, it is appended directly to the first block. Also, if the record's keyword is greater than the block's maximum key in the index table, that maximum key entry is updated.
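The bookkeeping in this example can be sketched as follows. The constants and names (`KEY_SPAN`, `block_of`, `index_insert`) are illustrative assumptions based on the example's numbers, not part of the original article.

```c
#define NUM_BLOCKS 60   /* 60 blocks, as in the example              */
#define KEY_SPAN   50   /* each block covers 50 keys: 1-50, 51-100…  */

/* Index entry for one block, following the three items above. */
struct BlockItem {
    int maxkey;  /* largest key currently stored in the block */
    int count;   /* number of records in the block             */
    int start;   /* position of the block's first record       */
};

/* Because the key range is split evenly across the blocks, the
 * block a key belongs to can be computed directly. */
int block_of(int key)
{
    return (key - 1) / KEY_SPAN;  /* keys 1-50 -> block 0, 51-100 -> block 1, ... */
}

/* Register a new record in the index: bump the block's record count
 * and, if the key exceeds the block's current maximum, update maxkey. */
void index_insert(struct BlockItem *idx, int key)
{
    struct BlockItem *b = &idx[block_of(key)];
    b->count++;
    if (key > b->maxkey)
        b->maxkey = key;
}
```

Appending the record's payload to the block's directory on disk would happen alongside this index update.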

Analysis of the average lookup length for a block index table

Suppose there are n records divided into m blocks, each holding t records, so n = m×t. Let Lb and Lw be the average lookup lengths in the index table and within a block, respectively. With sequential search in both places, Lb = (m+1)/2 and Lw = (t+1)/2, so the average search length is ASL = Lb + Lw = (m+t)/2 + 1. Since m×t = n is fixed, m+t is minimized when m = t = √n, giving a best average of ASL = √n + 1. For the 3,600-record example above, √3600 = 60, which is why 60 blocks of 60 records is the optimal split, with an average of 61 comparisons.

The analysis above assumes sequential search on the index table as well. Since the blocks are ordered with respect to one another, a faster algorithm such as binary search can be used on the index table to improve efficiency.
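A sketch of that improvement, assuming the same three-field block entry as above: a binary lower-bound search on the index table's maximum keys picks the block, then a sequential scan runs inside it (records within a block are unordered). The names here are illustrative.

```c
/* Block index entry, matching the three items described earlier. */
struct BlockItem {
    int maxkey;  /* largest key in the block              */
    int count;   /* number of records in the block        */
    int start;   /* position of the block's first record  */
};

/* Two-level block search: binary search on the ordered index table
 * (lower bound on maxkey), then a sequential scan inside the block.
 * Returns the element's subscript in the primary table, or -1. */
int block_search(const int *table, const struct BlockItem *idx,
                 int m, int elem)
{
    int lo = 0, hi = m;                 /* lower_bound over idx[].maxkey */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (idx[mid].maxkey < elem)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == m)
        return -1;                      /* elem exceeds every block's max key */

    for (int j = idx[lo].start; j < idx[lo].start + idx[lo].count; j++)
        if (table[j] == elem)           /* records are unordered within the block */
            return j;
    return -1;
}
```

This brings the index-table term of the ASL down from (m+1)/2 to about log2(m), while the in-block term stays (t+1)/2.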


#define ILMSIZE 60    /* predefined integer constant, >= number of index entries m        */
#define MAXSIZE 3600  /* predefined integer constant, >= number of records in the table n */

/* index entry structure */
struct IndexItem {
    int index;   /* maximum key of the block              */
    int start;   /* subscript of the block's first record */
    int length;  /* number of records in the block        */
};

/* index table */
typedef struct IndexItem IndexList[ILMSIZE];

/* primary table holding the original data */
typedef int MainList[MAXSIZE];

/*
 * Input : primary table a, index table b, number of index entries m,
 *         element to search for elem
 * Output: subscript of the found element, or -1 on failure
 */
int BlockSearch(MainList a, IndexList b, int m, int elem)
{
    int i;
    for (i = 0; i < m; i++)          /* sequential search of the index table */
        if (b[i].index >= elem)
            break;
    if (i == m)
        return -1;                   /* lookup failed: elem exceeds every block's max key */

    int end = b[i].start + b[i].length;
    for (int j = b[i].start; j < end; j++)   /* sequential scan within the block */
        if (a[j] == elem)
            return j;
    return -1;                       /* lookup failed: elem is not in the block */
}
