Binary Search and its implementation in the database kernel | go deep into the MySQL Kernel

Source: Internet
Author: User
Tags vcard

Binary Search and its implementation in the database kernel | go deep into the MySQL Kernel

Binary Search and its implementation in the database Kernel Posted on April 1, 2013 By Hedengcheng

Problem background

 

During this year's intern recruitment examination, I got a binary search question. The question is as follows:

 

Given an array of natural numbers in ascending order, the array contains repeated numbers, such as [, 4, 6, 7]. Problem: given any natural number, perform binary search on the array, return the correct position of the array, and provide function implementation. Note: If the number is the same consecutively, the first matching position or the last matching position is returned, which is determined by the input parameter of the function.

 

Why did I have this question?

 

  • Binary Search is very important in database kernel implementation

    In the database kernel implementation, binary search is a very important logic. More than 99% of SQL statements (range scans on all indexes, equivalent queries, and unique queries ), binary Search is used to locate data.


    Consider a database table T1 (A int primary key, B INT). The B field in the table has a B + tree index. The value of the B Field recorded in the table is[1, 2, 3, 4, 4, 5, 6, 7]Sequence. Given the following two query statements, different binary search logic is used:


    Sql1: Select * from T1 where B>4;

    Sql2: Select * from T1 where B> =4;


    For sql1, the binary search of the index requires skipping all 4 and returning all records from the last 4. For sql2, the binary search needs to locate the first 4, then read all records in sequence.


    In addition, for other query logics in the database, binary search also requires more features, such:


    Sql3: Select * from T1 where B<2;

    Sql4: Select * from T1 where B<=2;


    Because database indexes support reverse scanning at the same time, SQL 3 and SQL 4 statements can all use reverse scanning of indexes. During reverse scanning, sql3 needs to locate the first 2 in the index, while sql4 needs to locate the last 2 in the index, and then start to return the index records that meet the query conditions in reverse.

     


  • Binary Search inProgramDesigned as a basic and error-prone Function

     

    The first correct Binary SearchAlgorithmIs published 12 years after the first binary search is implemented. EnterBinary SearchOrBinary SearchKeyword, there are a lot of relatedArticleOr the blog will discuss this topic.

     

Problems to be aware of when implementing Binary Search

 

This article does not describe how to implement a correct binary search. After all, there are a large number of correct versions on the Internet. Next, we will make some simple analysis based on some problems found during the exam correction process. We hope to implement an effective binary search algorithm, or even a binary search algorithm available in the database, it is helpful.

 

Question 1: Check whether the parameter is valid

 

A large number of papers, when providing an algorithm to solve this problem, directly start the calculation with the low and high parameters, but do not check the low/High parameters. Are records in the array the same as low/High? Is the interval composed of low and high valid?CodeLack of robustness.

 

In the database's binary search implementation, binary search is generally performed on an index page. There may be no user records on the index page (when all the records on the index page are deleted and are not merged with the brother page). At this time, low/High is 0, at this time, if you read records based on the Mid calculated by low/high, there will be a logical error.

 

Problem 2: Calculation of Binary Search Median

 

This is a classic topic. How to calculate the median value in binary search? In the exam, we generally provide two calculation methods:

 

Algorithm 1: Mid = (low + high)/2

Algorithm 2: Mid = low + (high-low)/2

 

At first glance, the algorithm is concise. After algorithm 2 is extracted, it is no different from algorithm 1. But in fact, the difference exists. In extreme cases, there is a risk of overflow in algorithm 1 (low + high), which leads to incorrect mid results and program errors. Algorithm 2 ensures that the calculated mid must be greater than low, less than high, and there is no overflow problem.

 

Return to the database binary search. A database index page (8 K or 16 K in size) has limited index records that can be stored, so it will certainly not appear (low + high) the risk of overflow. This is also why the value in InnoDB is implemented by algorithm 1. However, as a rigorous program designer, algorithm 2 is recommended to eliminate any potential risks.

 

Problem 3: recursive Binary Search

 

For more than half of the exam, binary search is performed using recursive calling. It cannot be said that Recursive Implementation is wrong, but it is about implementation efficiency. In general, recursive calls have overhead of stack pressure/output, and their efficiency is relatively low. The efficiency is the first in the case of a database system that provides fast query response to optimize code efficiency. We do not recommend that you use recursive binary search, at least in database kernel implementation. As far as I know, all open-source database systems, such as InnoDB and PostgreSQL, do not implement binary search recursively.

 

Question 4: how to find the first/last equivalence

 

Return to the question. The first/last equivalent item must be returned based on different input parameters. In the Background section of this article, I also explained the corresponding database query (>,>= the query requirements are different ). The answer for more than 80% of students in the exam isPerform binary search first. After the bits to be determined to be the same value, perform the following operations based on the passed flag (User Requirements:Flag = 1, returns the first equivalent;Flag = 0, returns the last equivalent)..

 

Likewise, this implementation cannot be said to be wrong, but there are also performance problems.Performance and performance are always one of the key considerations for implementing the database kernel.(I believe it is also a metric for all applications ). In a database, except for primary key indexes and unique indexes, many secondary indexes have the same key value, sometimes there are more than items with the same key value (consider a user's order or purchase record ).

 

Assume that an index page stores 400 records, all of which are the same key value. In this case, the first binary search and then the sequential traversal algorithm are used. The second binary search can only be used once, and the order is traversed for 199 times. Finally, the comparison is made for 200 times. Very low efficiency. Of course, I am also delighted to see the practices of another small number of students (the algorithms I expect to see) and use flag to correct the final results of each comparison. For example, if the comparison is equal (equal to 0, greater than 1, less than-1), but flag = 1, the comparison result after correction is 1, you need to move the high to mid of the binary search to continue the binary search. (If the flag is 0, the corrected result is-1. You need to move the low to mid of the binary search, continue to the second part ). In this way, the equivalence can still be searched in binary mode, and the final comparison only takes 9 times, far less than 200 times.

 

This problem further raises the next question: how to implement a common and more complex Binary Search Algorithm in the database?

 

Question 5: Implementation example of binary search in the database

 

Binary Search in databases is more complex. A general binary search algorithm is required to be used in different SQL query scenarios.

 

InnoDB summarizes four different search modes for different SQL statements:

 

# DefinePage_cur_g 1> Query

# DefinePage_cur_ge 2 >=, = Query

# DefinePage_cur_l 3 <Query

# DefinePage_cur_le 4 <= Query

 

Then, based on the four different search modes, adjust the search mode when the binary search encounters the same key value. For example, if the search mode isPage_cur_gOrPage_cur_le, Move low to mid and continue the binary search. If the search mode isPage_cur_geOrPage_cur_l, Move high to mid and continue the binary search.

 

Our TNT engine adopts different solutions from InnoDB, but also implements the same functions. The adjustment of the TNT engine to the same key value is summarized as that I will not explain it here. You can try your own analysis.

 

/* Operator includekey forward compare result: 10-1 */

========================================================== ============================================

>= 1 1 | 1 -1 -1

= 1 1 | 1 -1 -1

0 1 | 1 1 -1

<0 0 | 1 -1 -1

<= 1 0 | 11-1

========================================================== ============================================

 

Summary

This article analyzes a safe, reliable, and efficient binary search based on a binary search question and the questions exposed by the students. This section briefly analyzes the binary search implementation in the database kernel implementation, hoping to help you design the Binary Search Algorithm in the future.

This entry is published in InnoDB, programming, database, database kernel sharing, database development classification directory, and binary search, database, and binary search tags. Add a fixed link to favorites.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.