Transfer from Http://www.jianshu.com/p/NsWbRv
Work needs to learn from MySQL for the specific implementation of select, in the online search for a long time, almost all of the introduction of the principle, for the implementation of details are not introduced, helpless under the source of their own gdb. Combined with the previous understanding of SQL parsing, the specific implementation of MySQL Select has a general understanding, summed up.
If you want GDB to step into debugging, you need to compile MySQL with the debug option, see this blog. After compiling, you can start with GDB. If you want the MySQL runtime to have log output, you can specify the path and log type of the output file: The --debug=d,info,error,query,enter,general,where:O,/tmp/mysqld.trace
log is useful for understanding the internal logic of MySQL.
At design time, MySQL adopted the idea of selecting one or several core algorithms with excellent performance as the engine for the main application scenarios, and then trying to use some non-major scenarios as a special case of the algorithm or to grow into the engine. In particular, MySQL's select query, the core function is the join query, so at design time, the core implementation of the Join function, for other functions, are converted to join to achieve.
For example select id, name from student;
, when MySQL is executed, it is also converted to a join operation.
Using GDB to step through the tracking can be seen that MySQL execution process is as follows:
- Assigning thread processing after receiving a request;
- SQL parsing, after MySQL parsing SQL, will generate a lot of item class. The item class is one of the most important classes in SQL parsing and execution, which can be found here.
- Executing SQL, you can see
JOIN::exec
that MySQL is converting any select to join to handle.
Take sql: select A.id, B.score from student A left join subject B on A.id=B.id where A.age > 10 and B.score > 60;
as an example to illustrate the process of step 3 above.
First, MySQL will optimize SQL before executing SQL, specifically in the JOIN::optimise
function. MySQL is doing a great job of optimizing for join, so it transforms all other operations into a very good join operation for performance implementations. For the above Sql,mysql when executed, the join key is also converted to a Where condition: A.id=B.id
to execute, then after processing, the above SQL has 3 where conditions:
A.age > 10
;
A.id = B.id
;
B.score > 60
;
When the preprocessing is finished, the JOIN::exec
function begins, calling the send_fields
function, returning the information of the final result, and then invoking it do_select
. MySQL join is using the nested loop join, which can be found in this blog. In a do_select
function, the Join function is implemented by invoking a sub_select
function.
In the above example, you need to complete 2 joins: Join Table A, then join table B (note here, it is not involved in several tables, you need to join several tables, MySQL join optimization is very strong, the specific explanation see). When MySQL is parsing SQL, it generates a list of tables that require joins, followed by a join operation on the list.
Continue gdb, in the sub_select
function, you can see such a line of code: (*join_tab->read_first_record)(join_tab)
This is to read the first row of table a result, you can see join_tab
the information inside the table a name. Next is a key function: evaluate_join_record
This function mainly does 2 things:
- The current information has been obtained in the Where condition calculation, to determine whether the need to continue to go down;
- Recursive join;
In the case of SQL above, the first join is executed first, and each row of table A is traversed, and each result is traversed, and the where condition is judged. It is important to note that the current where condition determines that only the columns that have been read are judged, since only the data of table A is read at this time, so now only the first where condition is A.age > 10
judged, and if satisfied, the join is called recursively: sql_select.cc: 11037 rc=(*join_tab->next_select)(join, join_tab+1, 0);
, the Next_select function here is that sub_select
MySQL is the way to implement recursive operations. If it is not satisfied, it will not recursively join, but continue to the next row of data, so as to achieve the purpose of pruning.
Continue to follow, at this time through the above next_select
recursive call to the sub_select
upper, the same will go the above logic, namely read_first_record
, first, then evaluate_join_record
, here because the table A and table B data have, so can the above 2 where conditions: A.id = B.id
and B.score > 60
to judge. In this case, all the where conditions have been judged, and if the current row satisfies 3 where conditions, the result can be output.
The above is the general process of the Select implementation, the main 2 points, one is the join IS recursive implementation, and the other is to read a table of data, will be the current where conditions to calculate, pruning. One more detail does not mention: How does MySQL make a where conditional judgment? Or, how does MySQL perform expression calculations?
The answer is the item class mentioned earlier. When MySQL parses, SQL resolves to a number of item, and the relationship between the individual item is established. For an expression, a tree of syntax is generated. For example, the expression: B.score > 60
, this will generate 3 item:, B.score
>
and 60
, where B.score
and 60
respectively is >
the left and right child, so that the value of the expression is to ask >
for val_int()
, Then we will call the left and right sub-tree recursively, and then val_int()
make a comparative judgment.
There is one more question: How to ask B.score
val_int()
? I have not specifically looked at the answer to this question, according to a previous colleague's SQL implementation, I have speculated that: B.score
is the real value in the data table, so its value must be obtained by going to the table. In the item class, there is a function: fix_field
It is used to tell the outside world where to get the value of this item, which is often called during the preprocessing phase of SQL execution. So at preprocessing, tell the item to go to a fixed buffer to read the results, and each time a row of data is read from the table, the data is saved in the buffer, so that the two can be associated. This part is purely personal speculation, interested students can see their own according to the source code.
Back to the point mentioned earlier, if we change SQL a little bit: that select A.id, B.score from student A left join subject B on A.id=B.id where B.score > 60;
is, what happens when we remove the first where condition?
The answer is that MySQL will do an optimization to convert SQL to select B.id, B.score from subject B where B.score > 60
, so there is no need for a with B join logic. In fact, the first time I used GDB in the SQL, the result is not see the recursive invocation sub_select
of the scene, but also thought the principle is not correct, later found to be MySQL optimization ramming chaos.
--SELECT statement: For the A-table filter in where, for the B-table filter on, the best effect
SELECT * from a
JOIN b on a.id=b.ref_id and b.age>10
WHERE a.sex= ' man '
1110MySQL Select Implementation principle