Chapter 4: Search Algorithms

Last Update:2014-10-01 Source: Internet

Author: User

Basic concepts:

Three search methods: linear search, tree search, and hash table search

Dynamic search table: if the table may be modified while it is being searched (for example by insertions or deletions), it is called a dynamic search table.

Static search table: the opposite of a dynamic search table; the table is only searched and never modified.

The average number of key comparisons performed during a search (also known as the average comparison length or average search length, ASL) is used as the criterion for measuring the quality of a search algorithm.

Average comparison length:

$$\mathrm{ASL} = \sum_{i=1}^{n} P_i C_i$$

Where: n is the number of nodes and P_i is the probability of searching for the i-th node. Unless stated otherwise, every node is assumed to be equally likely, so P_i = 1/n. C_i is the number of comparisons required to find the i-th node.
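
As a small illustration of the definition, the sketch below computes the weighted sum of comparison counts. The function name average_search_length and the sample numbers are made up for this example; they do not come from the original article.

```python
def average_search_length(comparisons, probabilities=None):
    """Return the sum of P_i * C_i over all nodes.

    comparisons[i] is C_i, the number of comparisons needed to find node i.
    If probabilities is omitted, every node is assumed equally likely (1/n).
    """
    n = len(comparisons)
    if probabilities is None:
        probabilities = [1 / n] * n
    return sum(p * c for p, c in zip(probabilities, comparisons))


# Sequential search over 5 nodes: finding the i-th node costs i comparisons.
asl_equal = average_search_length([1, 2, 3, 4, 5])

# Unequal probabilities: the first node is searched for half of the time.
asl_skewed = average_search_length([1, 2, 3], [0.5, 0.25, 0.25])
```

With equal probabilities the first call gives (1 + 2 + 3 + 4 + 5)/5 = 3 comparisons on average; the second call shows how placing a frequently searched node first lowers the average.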

Linear search:

Basic idea: starting from one end of the table, scan the linear table sequentially and compare the key of each scanned node with the given value K. If the key of the current node equals K, the search succeeds. If the scan completes without finding a node whose key equals K, the search fails.

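The average search length of a successful search, assuming equal search probabilities, is (n + 1)/2, so the cost of the algorithm grows linearly with n.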
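
A minimal sketch of sequential search, assuming the table is a Python list and that a failed search is reported as -1 (both choices are assumptions made for this example):

```python
def linear_search(records, key):
    """Scan records from the first element; return the index of key, or -1."""
    for index, value in enumerate(records):
        if value == key:
            return index  # search succeeds at this position
    return -1  # the whole table was scanned without a match


position = linear_search([7, 3, 9, 1], 9)
missing = linear_search([7, 3, 9, 1], 5)
```

The table does not need to be ordered, which is why this method still works in the two cases listed next.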

In either of the following cases, only sequential search can be used:

1) If the sequential table is unordered, it can only be searched sequentially.

2) A linear table stored as a linked structure can only be searched sequentially.

Binary Search

Binary search, also known as half-interval search, is an efficient search method. It requires the linear table to be an ordered table, that is, the nodes are sorted by key, and the table must use sequential storage (an array) so that any position can be accessed directly.

Basic idea: use three variables low, high, and mid, which point to the lower bound, upper bound, and middle position of the range currently being searched. Initially, low = 0 and high = n - 1. Let key be the key being searched for.

1) Set mid = (low + high)/2, rounded down.

2) Compare key with the key of R[mid]:

If the key of R[mid] equals key, the search succeeds.
If the key of R[mid] is smaller than key, the record being searched for can only lie to the right of R[mid]. Narrow the search range by setting the lower bound low = mid + 1; the upper bound high remains unchanged.
If the key of R[mid] is greater than key, the record being searched for can only lie to the left of R[mid]. Narrow the search range by setting the upper bound high = mid - 1; the lower bound low remains unchanged.

3) Compare low and high. If low ≤ high, repeat steps 1) and 2). If low > high, the search range is empty and the search fails, because no record with this key exists in the table.

Average search length: ASL ≈ log2(n + 1) - 1 (for large n)
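
The binary search steps can be sketched as follows. This is an illustrative version that assumes the ordered table is a Python list of keys sorted in ascending order and that a failed search returns -1:

```python
def binary_search(records, key):
    """Return the index of key in the ascending list records, or -1."""
    low, high = 0, len(records) - 1
    while low <= high:
        mid = (low + high) // 2  # middle position, rounded down
        if records[mid] == key:
            return mid  # search succeeds
        if records[mid] < key:
            low = mid + 1  # key can only be to the right of mid
        else:
            high = mid - 1  # key can only be to the left of mid
    return -1  # low > high: the range is empty, search fails


ordered = [1, 3, 5, 7, 9, 11]
found_at = binary_search(ordered, 7)
not_found = binary_search(ordered, 4)
```

Each pass through the loop halves the remaining range, which is where the logarithmic average search length comes from.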

Block search:

Block search is also called index sequential search. Its performance lies between that of sequential search and binary search.

Basic idea: block search divides the sequential table into several blocks. The keys within each block may be stored in any order, but the table must be "ordered by block": the maximum key in each block is smaller than every key in the next block. That is, the blocks are ordered relative to each other, while the nodes within a block are in arbitrary order. In addition, an index table must be built. Each entry in the index table corresponds to one block of the table and consists of a key field and a link field: the key field stores the maximum key in the block, and the link field stores the position of the first node of the block. The entries in the index table are stored in ascending order of key.

The largest key in each block and the block's starting position are extracted to form the index table. Because the index table is ordered by key, it is an increasing ordered table.

Searching an indexed table for a record whose key equals key takes two steps.

1) First, search the index table to determine the block that may contain the record. Because the index table is ordered, either binary search or sequential search can be used for this step.

2) Then perform a sequential search within the identified block. Because the records in a block are in arbitrary order, only sequential search can be used.

Average search length: the ASL of block search is the sum of the average search length of the index table lookup and the average search length of the sequential search within a block. Block search is a method between sequential search and binary search. It is faster than sequential search, at the cost of extra storage for the index table and of keeping the table ordered block by block. It is slower than binary search, but has the advantage that the records do not all need to be sorted.
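
The two-step procedure can be sketched as below. The helper names build_index and block_search, the fixed block size, and the sample data are assumptions made for this example; the index is searched with binary search through Python's bisect module, although sequential search over the index would also work.

```python
from bisect import bisect_left


def build_index(records, block_size):
    """Return one (max_key, start_position) entry per block."""
    index = []
    for start in range(0, len(records), block_size):
        block = records[start:start + block_size]
        index.append((max(block), start))
    return index


def block_search(records, index, block_size, key):
    """Return the position of key in records, or -1 if it is absent."""
    max_keys = [max_key for max_key, _ in index]
    # Step 1: find the first block whose maximum key is >= key.
    entry = bisect_left(max_keys, key)
    if entry == len(index):
        return -1  # key is larger than every key in the table
    start = index[entry][1]
    # Step 2: sequential search inside that block only.
    for position in range(start, min(start + block_size, len(records))):
        if records[position] == key:
            return position
    return -1


# Three blocks of six records; each block's maximum is below the next block's keys.
table = [22, 12, 13, 8, 9, 20, 33, 42, 44, 38, 24, 48, 60, 58, 74, 49, 86, 53]
index_table = build_index(table, 6)
hit = block_search(table, index_table, 6, 38)
miss = block_search(table, index_table, 6, 50)
```

Only the index and one block are inspected, so a search touches far fewer records than a full sequential scan of the table.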

Hash Search

Basic principle: convert a given key value into an offset address and use it to retrieve the record.

The key-to-address conversion is performed by a relation (a formula) called the hash function. The hash function operates on the key to produce a hash value, which indicates the location where the record can be found.

The basic idea of hashing: set up a table T of length m and use a function to convert the keys of the n records in the data set into values in the range 0 to m - 1.

Hash conflicts: when two different keys produce the same hash value, they are mapped to the same location in the table. This is called a conflict or collision, and the two conflicting keys are called synonyms with respect to the hash function.

Constructing the hash function and establishing a method for resolving conflicts are the two main tasks in building a hash table.

There are two criteria for a good hash function:

1) It is simple and fast to compute.

2) It distributes the keys uniformly over the address space.

Several common methods for constructing hash functions:

Mid-square method: square the key to amplify the differences between similar keys, then take several digits from the middle of the square, as many as the table length requires, as the hash value. Because the middle digits of a product depend on every digit of the multiplier, the resulting hash addresses are fairly uniform.
Division method: divide the key by the table length m and take the remainder as the hash address.
Folding method: based on the hash table length, split the key into segments with as equal a number of digits as possible, add the segments, and discard the carry out of the highest digit. The result is the hash address. There are two ways to add: in shift folding, the segments are aligned and added as they are; in boundary folding, the key is folded back and forth along the segment boundaries like a piece of paper, and the resulting segments are then added.
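
The three constructions can be sketched for non-negative integer keys as follows. The function names, the use of decimal digits, and the convention of reversing every second segment for boundary folding are assumptions made for this illustration; textbooks differ on such details.

```python
def division_hash(key, m):
    """Division method: remainder of key divided by the table length m."""
    return key % m


def mid_square_hash(key, digits):
    """Mid-square method: middle `digits` decimal digits of key squared."""
    squared = str(key * key)
    start = max((len(squared) - digits) // 2, 0)
    return int(squared[start:start + digits])


def shift_fold_hash(key, digits):
    """Shift folding: add the segments as they are, drop the carry."""
    text = str(key)
    segments = [int(text[i:i + digits]) for i in range(0, len(text), digits)]
    return sum(segments) % (10 ** digits)


def boundary_fold_hash(key, digits):
    """Boundary folding: reverse every second segment before adding."""
    text = str(key)
    segments = [text[i:i + digits] for i in range(0, len(text), digits)]
    folded = [int(s[::-1]) if i % 2 else int(s) for i, s in enumerate(segments)]
    return sum(folded) % (10 ** digits)


h_div = division_hash(47, 13)               # 47 = 3 * 13 + 8
h_mid = mid_square_hash(1234, 3)            # 1234 * 1234 = 1522756
h_shift = shift_fold_hash(123456789, 3)     # 123 + 456 + 789 = 1368
h_bound = boundary_fold_hash(123456789, 3)  # 123 + 654 + 789 = 1566
```

For the sample keys the results are 8, 227, 368, and 566 respectively; the two folding variants differ only in how the middle segment is read.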

Resolving hash conflicts

Open addressing: when a conflict occurs, probe other cells in the table by some method until an empty location is found. Several probing methods follow.

1) Linear probing: treat the hash table T[0..m-1] as a circular vector. If the initial probe address is d (that is, H(key) = d), the longest probe sequence is:

d, d + 1, d + 2, ..., m - 1, 0, 1, ..., d - 1

That is, probing starts at address d: T[d] is probed first, then T[d + 1], and so on up to T[m - 1]; the probe then wraps around to T[0], T[1], ..., up to T[d - 1].

The probe process ends in one of three cases:

If the cell currently probed is empty, the search fails (for an insertion, the key is written into this cell).
If the cell currently probed contains key, the search succeeds (for an insertion, this means the insertion fails).
If T[d - 1] has been probed and neither an empty cell nor key has been found, both search and insertion fail (the table is full).
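
A minimal sketch of linear probing with the three termination cases. The class name, the choice of the division method as H(key), and the use of None for an empty cell are assumptions made for this example; deletion is not handled.

```python
class LinearProbingTable:
    """Open-addressing hash table for integer keys (illustrative only)."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # None marks an empty cell

    def _probe(self, key):
        """Return (index, found); index is -1 when the table is full."""
        d = key % self.m  # assumed hash function: division method
        for step in range(self.m):
            index = (d + step) % self.m  # wraps from m - 1 back to 0
            if self.slots[index] is None:
                return index, False  # case 1: empty cell reached
            if self.slots[index] == key:
                return index, True  # case 2: key found
        return -1, False  # case 3: every cell probed, table is full

    def search(self, key):
        index, found = self._probe(key)
        return index if found else -1

    def insert(self, key):
        index, found = self._probe(key)
        if found or index == -1:
            return False  # duplicate key or full table
        self.slots[index] = key
        return True


table = LinearProbingTable(7)
for k in (10, 17, 24):  # all three keys hash to address 3
    table.insert(k)
```

After these insertions the keys occupy cells 3, 4, and 5: each synonym is pushed one cell further along, which is how clusters form under linear probing.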

2) Quadratic probing: the probe sequence is:

h_i = (H(key) + i * i) % m, 0 ≤ i ≤ m - 1

That is, the probe sequence is d = H(key), d + 1^2, d + 2^2, ...

The drawback of this method is that it may not probe the whole hash space.
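
The limited coverage of quadratic probing can be checked directly by listing the addresses the formula visits. The function name and the table sizes below are chosen only for illustration.

```python
def quadratic_probe_sequence(d, m):
    """Addresses (d + i*i) % m for i = 0 .. m - 1, in probe order."""
    return [(d + i * i) % m for i in range(m)]


seq_m8 = quadratic_probe_sequence(0, 8)
seq_m7 = quadratic_probe_sequence(3, 7)
visited_m8 = sorted(set(seq_m8))
visited_m7 = sorted(set(seq_m7))
```

For m = 8 and d = 0 the sequence only ever reaches addresses 0, 1, and 4, and for m = 7 and d = 3 it reaches four of the seven cells, so an insertion can fail even though empty cells remain.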

3) Double hashing

This is one of the best open addressing methods. When a conflict occurs, a second hash function is applied to obtain an alternative location. Keys that conflict on the first probe are likely to get different values from the second hash function.
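
The article does not give a formula for double hashing. The sketch below assumes one common form, in which the second hash value is used as the step size: h_i = (h1(key) + i * h2(key)) % m, with h1(key) = key % m, h2(key) = 1 + key % (m - 1), and m prime. These particular functions are assumptions made for this example.

```python
def double_hash_sequence(key, m):
    """Probe order (h1 + i * h2) % m for a prime table length m."""
    h1 = key % m
    h2 = 1 + key % (m - 1)  # step size; never zero
    return [(h1 + i * h2) % m for i in range(m)]


# 10 and 17 are synonyms under h1 when m = 7: both start at address 3.
seq_10 = double_hash_sequence(10, 7)
seq_17 = double_hash_sequence(17, 7)
```

Both keys probe address 3 first, but then follow different paths (step 5 for key 10, step 6 for key 17), and because m is prime each sequence visits every cell exactly once.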

Chaining (linked list method): chaining resolves conflicts by linking all synonym nodes into the same singly linked list. If the hash table length is m, the hash table can be defined as an array of m head pointers T[0..m-1]. All nodes with hash address i are inserted into the singly linked list whose head pointer is T[i]. Initially, every entry of T is a null pointer. With chaining, the load factor α can be greater than 1, but in general α ≤ 1.
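
A minimal sketch of chaining. For brevity each chain is a Python list standing in for the singly linked list described above, and the division method is assumed as the hash function; the class and method names are made up for this example.

```python
class ChainedHashTable:
    """Hash table for integer keys that resolves conflicts by chaining."""

    def __init__(self, m):
        self.m = m
        self.chains = [[] for _ in range(m)]  # T[0..m-1], all chains empty
        self.count = 0

    def insert(self, key):
        chain = self.chains[key % self.m]
        if key in chain:
            return False  # key is already stored
        chain.append(key)  # synonyms share the same chain
        self.count += 1
        return True

    def search(self, key):
        return key in self.chains[key % self.m]

    def load_factor(self):
        return self.count / self.m  # may exceed 1 with chaining


table = ChainedHashTable(3)
for k in range(5):
    table.insert(k)
```

Five keys are stored in a table of length 3, so the load factor is 5/3, which would be impossible with open addressing.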

This article is from the blog of "Tiger brother", please be sure to keep this source http://7613577.blog.51cto.com/7603577/1559983
