A detailed tutorial for implementing a binary lookup in Mysql _mysql

Source: Internet
Author: User

Given an array of natural numbers in ascending order, the array contains duplicate numbers, for example: [1,2,2,3,4,4,4,5,6,7,7]. Problem: Given any natural number, the array is binary lookup, return the correct position of the arrays, give the function realization. Note: The consecutive same number returns the first matching position or the last match position, which is determined by the function passing in the parameter.

Why do I have this problem?

Binary lookup is very important in database kernel implementation

In the kernel implementation of the database, the binary lookup is a very important logic, almost 99% of SQL statements (all the range scan/equivalence query/unique query, etc.) will use the binary lookup to locate the data.

Consider a database table T1 (a int primary key, B int), a B field on the table with a B + Tree index, and the record in the table for the value of the field, which is the [1,2,2,3,4,4,4,5,6,7,7] sequence in the title. At this point, given the following two query statements, you use a different binary lookup logic:

SQL1:

 SELECT * from t1 where B > 4;

SQL2:

SELECT * from t1 where b >= 4;

For SQL1, the binary lookup of an index needs to skip all 4, returning all records from the last 4, and for SQL2, the binary lookup needs to navigate to the first 4 and then read all the records sequentially.

In addition, for other query logic in the database, binary lookups need to be accompanied by additional features, such as:

SQL3:

SELECT * from t1 where B < 2;

SQL4:

 SELECT * from t1 where B <= 2;

Because the database index supports a reverse scan at the same time, SQL3, SQL4 statements can use an index reverse scan. In a reverse scan, SQL3 needs to navigate to the first 2 in the index, and SQL4, you need to navigate to the last 2 of the index, and then begin to reverse back the index record that satisfies the query criteria.
Two-point search is a very basic and error-prone function in program design.

The first truly correct binary lookup algorithm was published in 12 after the first binary search was implemented. Through Google, enter binary search or a binary search keyword, there are a number of related articles or blogs to discuss this topic.

The problems needing attention in the realization of binary search

This article is not prepared to describe in detail how a correct binary lookup should be implemented, after all, there are plenty of correct versions on the Web. Next, according to some problems found in the process of correcting papers, do some simple analysis, hope to realize an effective binary search algorithm, even a database can be used in binary search algorithm, helpful.
question one: whether to check the validity of the parameters

A large number of test papers, in the proposed solution to this problem, directly with the Low,high parameter began to calculate, but did not check the Low/high parameters. Is the Low/high the same, and is there a record in the array? Is the low/high of the structure valid? Code is not robust enough.

In the binary search implementation of database, it is generally a two-point search for an index page. There may not be a user's record in the index page (all the records in the index page are deleted and no merge with the sibling page), at this point, the Low/high are all 0, at this time, if according to the Low/high calculated mid to record reading, there is a logic error.
calculation of median value in 2:2-point lookup

This is a classic topic, how to calculate the median in a binary lookup? In the paper, we generally give two methods of calculation:

Algorithm One: Mid = (low + high)/2

Algorithm two: mid = low + (High–low)/2

At first glance, the algorithm is simple, the algorithm after the second extraction, with the algorithm is no difference. But in reality, the difference is there. The approach of algorithm one, in extreme cases, (low + high) there is the risk of overflow, and then get the wrong mid results, resulting in program errors. and algorithm two can ensure that the calculation of mid, must be greater than low, less than high, there is no overflow problem.

Back to the database binary lookup, a database index page (size is generally 8k or 16k), the ability to store index records is limited, so certainly does not appear (low + high) the risk of overflow. This is also why the median in InnoDB, is the implementation of algorithm one. However, as a rigorous program designer, it is recommended to use algorithm two, to kill any potential risk in the cradle.
problem Three: recursive implementation of binary search

More than half of the test papers, using a recursive call to achieve a binary lookup. It is not possible to say that recursive implementation is wrong, but rather to achieve efficiency problems. It is generally known that the recursive call has the cost of the stack/stack, and its efficiency is relatively low. And in a database such an extremely optimized code efficiency, provides the quick query response system, the efficiency is the first. Recursive implementation of binary lookups is not recommended, at least in the database kernel implementation. As far as I know, all open source database systems, such as: Innodb,postgresql do not use recursive method to achieve binary lookup.
question four: How to find the first/last equivalent

Return to the title, asking for the first/last equivalent to be returned according to the parameters passed in. In the background section of this article, I also explained that the database query for this problem (>,>= query requirements are different). In the test paper, more than 80% of the students answer is to do a binary search, after the same value is positioned, and then according to the incoming flag (user needs: flag = 1, return the first equivalent, flag = 0, return the last equivalent), and iterate sequentially until the item that satisfies the condition is positioned.

Again, it cannot be said that the implementation is wrong, but there are also performance problems. Performance performance is always one of the key considerations for the implementation of the database kernel (believed to be one metric for all applications). Database, in addition to the primary key index/unique index can guarantee only the key value, many of the two secondary indexes have the same key value, sometimes the same key value of the item will exceed the thousand items (consider a user's order, or a purchase record).

Suppose an index page holds 400 records, all of which are the same key values. At this time, the use of first-order search, after the sequential traversal algorithm, the binary search can only be used once, in order to traverse 199 times, and finally compared 200 times. The efficiency is very low. Of course, I am also delighted to see another small part of the practice of the students (I look forward to see the algorithm), with flag to correct the final results of each comparison. For example: The comparison is equal (equality is expressed in 0, is greater than 1, less than-1, but flag = 1, then return the corrected comparison result is 1, need to move the high to mid of the binary lookup, continue to two points (conversely, if flag = 0, then return the corrected result is-1, You need to move the low to mid for the two-point lookup, and continue with the second point. In this way, the equivalence can still be used for binary search, and the final comparison takes only 9 times and is much smaller than 200 times.

This issue further leads to the next question, how to implement a general, more complex binary lookup algorithm in the database?
question five: Examples of binary lookup implementations in databases

The binary lookup in the database is more complicated, and it needs to implement a universal binary lookup algorithm, which can be used in a variety of SQL query scenarios.

InnoDB for different SQL statements, summed up four different search Mode, respectively:

#define PAGE_CUR_G 1 > Enquiry

#define PAGE_CUR_GE 2 >=,= Query

#define PAGE_CUR_L 3 < Enquiry

#define PAGE_CUR_LE 4 <= Query

Then, according to the four different search Mode, adjust the binary lookup when it encounters the same key value. For example: If search mode is Page_cur_g or page_cur_le, move low to mid to continue the binary search, and if search mode is Page_cur_ge or page_cur_l, move high to mid. Continue with the binary lookup.

Our TNT engine uses a different approach than InnoDB, but it also implements the same functionality. TNT engine for the same key value adjustment summed up as the following figure, I do not explain this, you can try to do their own analysis.

/* operator Includekey forward Compare result:1 0-1 * *

=============================================================================

>= 1 1 | 1-1-1

= 1 1 | 1-1-1

> 0 1 | 1 1-1

< 0 0 | 1-1-1

<= 1 0 | 1 1-1

=============================================================================

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.