Retrieving (hashing)

Source: Internet
Author: User
Search algorithm categories:
1. Sequential table and linear table methods
2. Direct access based on the key value (hash tables)
3. Tree indexing methods

Sequential Retrieval

Sequential search is the most basic search algorithm, but for repeated lookups in a large collection of records it becomes unbearably slow. When the records are sorted by key value, the commonly used alternative is binary search.

Self-organizing linear table

In most cases a linear table is ordered by key value. In some situations this slows searching down: a key that is present but sits at the end of the table costs a scan of the whole table every time it is accessed. The self-organizing linear table was introduced to deal with this.

A self-organizing linear table does not have to be kept sorted by key, so the cost of inserting a new record is low. When records are inserted frequently, this compensates for the higher cost of searching. A self-organizing linear table is also easier to implement than a search tree.
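To give a sense of how little code a self-organizing linear table needs, here is a minimal sketch of three common reordering heuristics (frequency count, move to front, and transpose). The class name `SelfOrganizingList` and its methods are hypothetical names chosen for this illustration, and inserting a missing key during a lookup is left out for brevity.

```python
class SelfOrganizingList:
    """Unsorted list searched sequentially; a heuristic reorders it on each hit."""

    def __init__(self, heuristic="move_to_front"):
        if heuristic not in ("count", "move_to_front", "transpose"):
            raise ValueError(f"unknown heuristic: {heuristic}")
        self.heuristic = heuristic
        self.items = []    # keys, in current list order
        self.counts = {}   # access counts, used only by the "count" heuristic

    def insert(self, key):
        # Appending is cheap because the list is never kept sorted by key.
        self.items.append(key)
        self.counts[key] = 0

    def find(self, key):
        """Sequential search; returns the number of comparisons, or None if absent."""
        for i, k in enumerate(self.items):
            if k == key:
                self._reorder(i)
                return i + 1
        return None

    def _reorder(self, i):
        key = self.items[i]
        if self.heuristic == "count":
            self.counts[key] += 1
            # Bubble the record forward past records with fewer accesses.
            while i > 0 and self.counts[self.items[i - 1]] < self.counts[key]:
                self.items[i - 1], self.items[i] = self.items[i], self.items[i - 1]
                i -= 1
        elif self.heuristic == "move_to_front":
            self.items.insert(0, self.items.pop(i))
        elif i > 0:
            # Transpose: swap with the record just before it.
            self.items[i - 1], self.items[i] = self.items[i], self.items[i - 1]


table = SelfOrganizingList("transpose")
for key in "abcyx":
    table.insert(key)
for key in "xyxyxy":      # alternate between the two records at the end
    table.find(key)
print(table.items)        # ['a', 'b', 'c', 'y', 'x']: neither x nor y advanced
```

The small demo at the end reproduces the weak spot of the transpose rule: alternating accesses to the last two records only swap them back and forth.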

Three common heuristic rules for managing self-organizing linear tables:
1. Frequency count (similar to LFU): use extra space to store the number of times each record has been accessed, and keep the table ordered by access count. The drawback is that a record accessed heavily during one period stays near the front for a long time afterward, even if it is never accessed again.
2. Move to front (similar to LRU): each time a record is accessed, if it is in the table it is moved to the front and the records that were ahead of it move back one position; if it is not in the table, it is inserted at the front and the other records move back. This solves the retention problem of the frequency count, although it has disadvantages of its own.
3. Transpose: swap the record that was found with the record immediately before it in the table. Records that were once accessed frequently but are no longer used slowly fall toward the back. However, alternating accesses to two adjacent records defeat this rule: if records y and x are adjacent at the end of the table and the access order is x, y, x, y, ..., the two records keep exchanging positions and neither can move forward.

Collection Retrieval
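One way to picture collection retrieval over documents is to keep, for every keyword, a bit vector with one bit per document. The sketch below is a minimal illustration that uses Python integers as bit vectors; the function names and the three sample documents are made up for the example.

```python
def build_bit_vectors(documents):
    """Map each keyword to an int whose bit d is 1 if document d contains it."""
    vectors = {}
    for doc_id, text in enumerate(documents):
        for word in set(text.lower().split()):
            vectors[word] = vectors.get(word, 0) | (1 << doc_id)
    return vectors


def documents_containing_all(vectors, keywords, num_docs):
    """AND the keywords' bit vectors; return ids of documents whose bit is 1."""
    result = (1 << num_docs) - 1          # start with every document selected
    for word in keywords:
        result &= vectors.get(word, 0)    # an unknown keyword matches no document
    return [d for d in range(num_docs) if (result >> d) & 1]


docs = [
    "hash table with linear probing",
    "binary search on a sorted table",
    "hash function and bucket table",
]
vectors = build_bit_vectors(docs)
print(documents_containing_all(vectors, ["hash", "table"], len(docs)))  # [0, 2]
```

Each query costs one AND per keyword, however many documents there are, as long as the vectors fit comfortably in memory.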

Collection retrieval can be used for document retrieval. For example, for each keyword the document retrieval system stores a bit vector with one bit per document. If a user wants to know which documents contain three given keywords, the three corresponding bit vectors are combined with a bitwise AND, and every bit with value 1 in the result corresponds to a matching document.

Hashing Method

Hashing is usually not suitable for applications in which multiple records share the same key value.

When the key value is a number, one suitable hash function is the mid-square method: square the key value and take the middle r bits of the result, which gives a value in the range 0 to 2^r - 1.
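A small sketch of the mid-square method follows. It assumes that keys are non-negative integers of a fixed width (`key_bits`, a parameter introduced here for the example), so that "the middle r bits" of the square is well defined.

```python
def mid_square_hash(key, r, key_bits=16):
    """Square the key and return the middle r bits of the 2*key_bits-bit result."""
    if not 0 <= key < (1 << key_bits):
        raise ValueError("key does not fit in key_bits bits")
    if not 0 < r <= 2 * key_bits:
        raise ValueError("r must be between 1 and 2 * key_bits")
    squared = key * key                  # fits in 2 * key_bits bits
    shift = (2 * key_bits - r) // 2      # number of low-order bits to discard
    return (squared >> shift) & ((1 << r) - 1)


print(mid_square_hash(4567, r=8))  # 228, a slot index in the range 0..255
```

The middle bits are used because they depend on every digit of the key, whereas the lowest and highest bits of the square are dominated by the low and high digits of the key respectively.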

Process:
1. Compute the table position h(K).
2. Starting at slot h(K), find the record containing key value K, using a collision resolution policy if required.

Open hashing (collision records are stored outside the table):
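As a rough illustration of the two steps when collision records are kept outside the table, the sketch below hangs a Python list (a chain) off every slot. The class name `ChainedHashTable`, the default of 8 slots, and the use of Python's built-in `hash()` as h are assumptions made for the example.

```python
class ChainedHashTable:
    def __init__(self, num_slots=8):
        self.slots = [[] for _ in range(num_slots)]  # one chain per slot

    def _h(self, key):
        # Step 1: compute the table position h(K). Python's built-in hash()
        # stands in for h here; any hash function could be substituted.
        return hash(key) % len(self.slots)

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # key already present: overwrite
                return
        chain.append((key, value))        # empty slot or collision: extend chain

    def search(self, key):
        # Step 2: starting at slot h(K), walk the chain looking for K.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None


t = ChainedHashTable(num_slots=4)
for key in (1, 5, 9):                     # all three keys map to slot 1
    t.insert(key, f"record-{key}")
print(t.search(9))                        # record-9
```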

This approach (open hashing, with each slot heading a linked list of records) is generally used in main memory rather than on disk. The elements of one linked list may be stored in different disk blocks, so retrieving a single key can require several disk accesses, which reduces the benefit of hashing.

Closed hashing (collision records are stored in another slot of the table):

Bucket hashing divides the M slots of the hash table into B buckets, each containing M/B slots. When a record is looked up or inserted and the corresponding bucket is full, the operation continues in the overflow bucket. Note that all buckets share one overflow bucket. Bucket hashing is useful for implementing a disk-based hash table, because the bucket size can be set to the size of a disk block and the entire bucket is read into memory whenever a record is retrieved or inserted.
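The bucket scheme can be sketched in a few lines. This is only an illustration under simplifying assumptions: the class name `BucketHashTable` is hypothetical, Python's `hash()` stands in for the hash function, the shared overflow bucket is an unbounded list, and deletion is not supported.

```python
class BucketHashTable:
    def __init__(self, num_slots=12, num_buckets=4):
        if num_slots % num_buckets != 0:
            raise ValueError("num_slots must be a multiple of num_buckets")
        self.bucket_size = num_slots // num_buckets    # M / B slots per bucket
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = []                             # shared by all buckets

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def contains(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            return True
        # Without deletion, only a full bucket can have spilled into overflow.
        return len(bucket) == self.bucket_size and key in self.overflow

    def insert(self, key):
        if self.contains(key):
            return
        bucket = self._bucket(key)
        if len(bucket) < self.bucket_size:
            bucket.append(key)
        else:
            self.overflow.append(key)                  # home bucket is full


t = BucketHashTable(num_slots=4, num_buckets=2)
for key in (0, 2, 4, 6, 1):
    t.insert(key)
print(t.buckets, t.overflow)   # [[0, 2], [1]] [4, 6]
```

In a disk-based table each inner list would correspond to one disk block, so a lookup that does not reach the overflow bucket touches a single block.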

Linear probing

Two concepts first:
Base slot: the slot to which the hash function h(x) maps a key is called the base slot of that key.
Basic aggregation (usually called primary clustering): during linear probing, segments of the probe sequences of different keys overlap, for example probe sequence 1 {3, 5, 7, 9} and probe sequence 2 {5, 7, 9, 11}.

Unlike bucket hashing, this is the most common hashing method, and its collision resolution policy can use any slot in the hash table. The lookup works as follows: when the base slot given by the hash function h(x) is already occupied by another record, the next slot to examine is chosen by the probe function p(K, i), and this continues until the desired key is found or an empty slot shows that it is absent. Here i is the position in the probe sequence, and p(K, i) returns the offset from the base slot. It is therefore important to choose a suitable probe function; a poor choice leads to basic aggregation.
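The lookup just described can be sketched as follows, using the simplest probe function, p(K, i) = i (linear probing). The function names, the `None` marker for an empty slot, and the example hash function `k % 10` are assumptions for illustration only.

```python
def linear_probe(key, i):
    """p(K, i) = i: the i-th probe is i slots past the base slot."""
    return i


def insert(table, key, h, p=linear_probe):
    """Place key in the first free slot of its probe sequence; return the slot."""
    m = len(table)
    home = h(key)
    for i in range(m):
        pos = (home + p(key, i)) % m
        if table[pos] is None or table[pos] == key:
            table[pos] = key
            return pos
    raise RuntimeError("no free slot found in the probe sequence")


def search(table, key, h, p=linear_probe):
    """Follow the same probe sequence; return the slot holding key, or None."""
    m = len(table)
    home = h(key)
    for i in range(m):
        pos = (home + p(key, i)) % m
        if table[pos] is None:
            return None          # an empty slot ends the probe sequence
        if table[pos] == key:
            return pos
    return None


table = [None] * 10
h = lambda k: k % 10
print([insert(table, k, h) for k in (13, 23, 33, 14)])  # [3, 4, 5, 6]
```

Keys 13, 23 and 33 all have base slot 3, so they fill slots 3 to 5, and key 14, whose base slot is 4, is pushed out to slot 6 because its probe sequence overlaps theirs. A different probe function can be passed as `p` without changing `insert` or `search`.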
There are several common ways to reduce basic aggregation:
1. Pseudo-random probing: store a sequence of random numbers in an array; the i-th probe uses the i-th element of the array as its offset.
2. Quadratic probing: the probe function is a quadratic function of i, for example p(K, i) = i^2.
3. Double hashing: when several keys hash to the same base slot, pseudo-random probing and quadratic probing still leave them clustered, because all of those keys follow the same probe sequence. This can be avoided if the probe function also takes the key value into account, for example p(K, i) = i * h2(K), where h2 is another hash function.

Deletion of hash elements
When a hash element is deleted, its slot cannot simply be marked as empty, because that would cut off the records that come after it in the probe sequence. The concept of a "tombstone" is introduced for this reason.

Tombstone: a mark indicating that a record once occupied this slot but has since been deleted. If a tombstone is encountered during a search, the search continues along the probe sequence. If a tombstone is encountered during an insertion, probing also continues, so that inserting a key that is already present does not create a duplicate key in the hash table. Only when a truly empty slot is reached without finding the key is the new record inserted, and it goes into the slot of the tombstone encountered earlier.
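A minimal sketch of tombstone handling follows, assuming linear probing and small integer keys. `ProbingHashTable`, `EMPTY`, and `TOMBSTONE` are hypothetical names introduced for this example, and Python's `hash()` stands in for the hash function.

```python
EMPTY = object()       # slot that has never held a record
TOMBSTONE = object()   # slot whose record has been deleted


class ProbingHashTable:
    def __init__(self, size=11):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield the slots of key's probe sequence (linear probing)."""
        home = hash(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (home + i) % len(self.slots)

    def search(self, key):
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is EMPTY:
                return None              # truly empty: the key is absent
            if slot is not TOMBSTONE and slot == key:
                return pos
            # Tombstone or a different key: keep following the sequence.
        return None

    def delete(self, key):
        pos = self.search(key)
        if pos is None:
            return False
        self.slots[pos] = TOMBSTONE      # later records stay reachable
        return True

    def insert(self, key):
        first_tombstone = None
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is EMPTY:
                # Key is not present: reuse the earliest tombstone if any.
                target = pos if first_tombstone is None else first_tombstone
                self.slots[target] = key
                return target
            if slot is TOMBSTONE:
                if first_tombstone is None:
                    first_tombstone = pos
            elif slot == key:
                return pos               # already stored: no duplicate
        if first_tombstone is not None:
            self.slots[first_tombstone] = key
            return first_tombstone
        raise RuntimeError("hash table is full")


t = ProbingHashTable(11)
for key in (1, 12, 23):                  # all three have base slot 1
    t.insert(key)
t.delete(12)                             # slot 2 becomes a tombstone
print(t.search(23), t.insert(34))        # 3 2
```

After the deletion, key 23 is still found in slot 3 because the search walks past the tombstone, and the new key 34 lands in the tombstone's slot only after the probe reaches a truly empty slot.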
