This article starts from the execution process of a select statement and traverses multiple subsystems of MySQL.
First, let's take a picture to start our adventure.
After the client connects to the MySQL server and sends a request, the server thread is blocked in the my_net_read function (that is, the read in the socket) in do_command (SQL/parse. cc ).
After the client sends an SQL statement (select * from zzz in this example) to the server, my_net_read returns and reads data from the tcpbuffer and writes it to the packet string.
- packet_length= my_net_read(net);
The first byte of packet is a flag, which determines whether the data packet is a query, a command, a success, or an error.
Next, enter the dispatch_command (SQL/parse. cc) function. The data type is no longer required.
- return_value= dispatch_command(command, thd, packet+1, (uint) (packet_length-1));
Go to dispatch_command. We can see that
- statistic_increment(thd->status_var.questions, &LOCK_status);
This is the value of show status questions.
Next to mysql_parse (SQL/SQL _parse.cc), this function is the total intersection of SQL statement parsing.
After entering this function, the first thing you encounter is query_cache_send_result_to_client. The original name of this function is to query whether there are the same statements in QCache. If yes, the results are immediately returned from QCache, so the entire SQL statement is over.
If the SQL statement does not exist in QCache, the system proceeds to parse_ SQL (SQL/SQL _parse.cc). This function mainly calls MYSQLparse. MYSQLparse is actually the yyparse (pai_^) in bison/yacc ),
- #define yyparse MYSQLparse
Yes, I started to parse the SQL statement. Let's briefly talk about lexical analysis and syntax matching.
For a statement like select * from zzz, first enter the lexical analysis, find two token (select, from), and then perform syntax Matching Based on the token. The rule is in SQL/yacc. yy, I paste several matched pattern and action.
- select:
- select_init
- {
- LEX *lex= Lex;
- lex->sql_command= SQLCOM_SELECT;
- }
- ;
- /* Need select_init2 for subselects. */
- select_init:
- SELECT_SYM select_init2
- | '(' select_paren ')' union_opt
- ;
- select_paren:
- SELECT_SYM select_part2
- {
- LEX *lex= Lex;
- SELECT_LEX * sel= lex->current_select;
- .....
- select_from:
- FROM join_table_list where_clause group_clause having_clause
- opt_order_clause opt_limit_clause procedure_clause
- {
- Select->context.table_list=
- Select->context.first_name_resolution_table=
- (TABLE_LIST *) Select->table_list.first;
- }
- ....
- select_item_list:
- select_item_list ',' select_item
- | select_item
- | '*'
- {
- THD *thd= YYTHD;
- Item *item= new (thd->mem_root)
- Item_field(&thd->lex->current_select->context,
- NULL, NULL, "*");
- if (item == NULL)
- MYSQL_YYABORT;
- if (add_item_to_list(thd, item))
- MYSQL_YYABORT;
- (thd->lex->current_select->with_wild)++;
- }
- ;
- select_item:
- remember_name select_item2 remember_end select_alias
- {
- THD *thd= YYTHD;
- DBUG_ASSERT($1 < $3);
- if (add_item_to_list(thd, $2))
- MYSQL_YYABORT;
- if ($4.str)
- ...
We can see that the most important thing in the action is the assignment of add_item_to_list and table_list.
After parsing, information about the tables (zzz) and fields (*) to be queried is written to the thd-> lex struct.
For example, thd-> lex-> query_tables indicates the table (zzz), thd-> lex-> current_select-> with_wild indicates whether the statement is used *, and so on.
- (gdb) p *thd->lex->query_tables
- $7 = {next_local = 0x0, next_global = 0x0, prev_global = 0x855a458, db = 0x85a16b8 "test", alias = 0x85a16e0 "zzz",
- table_name = 0x85a16c0 "zzz", schema_table_name = 0x0, option = 0x0, on_expr = 0x0, prep_on_expr = 0x0, cond_equal = 0x0,
After SQL Parsing is complete, what about optimization? Don't rush to look down.
Next, enter the mysql_execute_command function, which is the total entry of all SQL commands.
Because this SQL statement is a select statement, execute_sqlcom_select is the function to be executed, and then enters mysql_select (anger is so complicated ).
Mysql_select is the optimizer module. The code of this module is complicated. We can clearly see the three steps for creating, optimizing, and executing the optimizer. The optimization details are not listed.
- if (!(join= new JOIN(thd, fields, select_options, result)))
- ...
- if ((err= join->optimize()))
- ...
- join->exec();
After optimization, We need to execute join-> exec (). The function actually enters JOIN: exec () (SQL _select.cc ).
Exec () first sends the title function send_fields to the client, which has no data, but the field is also required.
Then enter do_select (), and jump to the specific implementation of the engine based on the table's storage engine (zzz is myisam table ).
Here, mi_scan is the function used by the myisam engine to scan files.
- (gdb) p info->filename
- ./test/zzz
Is this the physical file corresponding to the zzz table.
After accessing the disk with a series of mi functions, the system sends data to the client through send_data and returns the data from dispatch_command. Finally, the entire SQL statement is ended at net_end_statement.
The execution process behind a simple select statement is very complex, and the above steps only end at a point.
Ps: In SQL _yacc.yy, we can see that MySQL is mocking the commonly used dual tables in Oracle.