MySQL Feature Analysis · Internal Temporary Tables

http://mysql.taobao.org/monthly/2016/06/07/#rd

Two types of temporary tables in MySQL

External temporary tables

Temporary tables created with CREATE TEMPORARY TABLE are called external temporary tables. Such a table is visible only to the session that created it and is dropped automatically when that session ends. A temporary table may have the same name as a non-temporary table; in that case the non-temporary table is hidden from the current session until the temporary table is dropped.

Internal temporary tables

An internal temporary table is a special lightweight temporary table used for performance optimization. MySQL creates it automatically to store the intermediate results of certain operations, which may occur in either the optimization phase or the execution phase. Internal temporary tables are not visible to the user, but EXPLAIN or SHOW STATUS reveals whether MySQL used one for a given operation. They play an important role in optimizing SQL statements, and many operations in MySQL rely on them. However, creating the table and then writing and reading the intermediate data carries a cost, so users should avoid triggering internal temporary tables where possible when writing SQL.

There are two types of internal temporary tables. The first is the HEAP temporary table: all of its data lives in memory, so operating on it requires no I/O. The second is the OnDisk temporary table, which, as the name implies, stores its data on disk and is used for operations whose intermediate results are large. If a HEAP temporary table grows beyond max_heap_table_size (see the System Variables section of the MySQL manual for details), it is automatically converted to an OnDisk temporary table. In 5.7, the internal_tmp_disk_storage_engine system variable selects whether OnDisk temporary tables use the MyISAM or the InnoDB engine.
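The conversion rule above can be sketched as a small size check. This is only an illustration, not MySQL's implementation: the function names and the byte estimate are hypothetical. It also assumes, following the MySQL manual rather than this article, that the effective in-memory cap is the smaller of tmp_table_size and max_heap_table_size, both of which default to 16 MiB in 5.7.

```python
# Sketch: would an in-memory (HEAP) internal temporary table be converted
# to an on-disk one? Function names are hypothetical, not MySQL APIs.

DEFAULT_LIMIT = 16 * 1024 * 1024  # 16 MiB, assumed 5.7 default of both variables


def in_memory_limit(tmp_table_size=DEFAULT_LIMIT,
                    max_heap_table_size=DEFAULT_LIMIT):
    """Effective cap in bytes: the smaller of the two system variables."""
    return min(tmp_table_size, max_heap_table_size)


def spills_to_disk(estimated_bytes,
                   tmp_table_size=DEFAULT_LIMIT,
                   max_heap_table_size=DEFAULT_LIMIT):
    """True when the estimated table size exceeds the in-memory cap."""
    return estimated_bytes > in_memory_limit(tmp_table_size,
                                             max_heap_table_size)


if __name__ == "__main__":
    eight_mib = 8 * 1024 * 1024
    # With the defaults, an 8 MiB intermediate result stays in memory.
    print(spills_to_disk(eight_mib))
    # Lowering max_heap_table_size to 4 MiB makes the same result spill.
    print(spills_to_disk(eight_mib, max_heap_table_size=4 * 1024 * 1024))
```

In practice the server decides this while rows are being written, so the estimate here only approximates when a conversion might happen.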

This article focuses on which operations may use internal temporary tables. Writing SQL so that the optimizer needs as few internal temporary tables as possible effectively improves query execution efficiency.

First, we define a table t1:

CREATE TABLE t1 (a int, b int);
INSERT INTO t1 VALUES (1,2),(3,4);

All of the following examples are based on table t1.

    • The SQL statement uses the SQL_BUFFER_RESULT hint.

SQL_BUFFER_RESULT is mainly used to make MySQL release table locks as early as possible. When the result set is large, sending it to the client takes a long time; buffering the result in a temporary table first shortens the time the read lock on the table is held.
For example:

mysql> explain format=json select SQL_BUFFER_RESULT * from t1;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.00"
    },
    "buffer_result": {
      "using_temporary_table": true,
      "table": {
        "table_name": "t1",
        "access_type": "ALL",
        ...
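The EXPLAIN outputs in this article are all read the same way: look for "using_temporary_table": true somewhere in the JSON plan. A minimal sketch of doing that programmatically follows. The helper name is hypothetical, and the sample plan is a hand-completed version of the truncated output above, so its closing part is an assumption.

```python
import json


def find_temp_table_nodes(node, path="$"):
    """Return the JSON paths of all objects with "using_temporary_table": true."""
    hits = []
    if isinstance(node, dict):
        if node.get("using_temporary_table") is True:
            hits.append(path)
        for key, value in node.items():
            hits.extend(find_temp_table_nodes(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_temp_table_nodes(item, f"{path}[{index}]"))
    return hits


# Hand-completed sample; the real output in the article is truncated.
SAMPLE_PLAN = """
{
  "query_block": {
    "select_id": 1,
    "cost_info": {"query_cost": "2.00"},
    "buffer_result": {
      "using_temporary_table": true,
      "table": {"table_name": "t1", "access_type": "ALL"}
    }
  }
}
"""

if __name__ == "__main__":
    # Prints the path of each plan node that uses an internal temporary table.
    print(find_temp_table_nodes(json.loads(SAMPLE_PLAN)))
```

The path of the matching node (buffer_result, duplicates_removal, ordering_operation, and so on) indicates which operation triggered the temporary table.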
    • The SQL statement contains a derived table.

In 5.7, a new optimization merges derived tables into the outer query, so we need to run set optimizer_switch='derived_merge=off' to prevent the merge before the temporary table shows up.
For example:

mysql> explain format=json select * from (select * from t1) as tt;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.40"
    },
    "table": {
      "table_name": "tt",
      "access_type": "ALL",
      ...
      "materialized_from_subquery": {
        "using_temporary_table": true,
        ...
    • The query reads system tables; the data from the system tables is stored in an internal temporary table.

EXPLAIN currently cannot show whether system table data is placed in an internal temporary table, but SHOW STATUS reveals whether one was used.
For example:

mysql> select * from information_schema.character_sets;
mysql> show status like 'CREATE%';
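When EXPLAIN gives no hint, the status counters have to be compared before and after the statement. The sketch below computes that difference from two snapshots of SHOW STATUS output held as dictionaries. The helper name and the sample values are hypothetical; Created_tmp_tables and Created_tmp_disk_tables are the server status variables matched by the LIKE pattern above.

```python
# Counters of interest among those matched by: show status like 'CREATE%'
TMP_COUNTERS = ("Created_tmp_tables", "Created_tmp_disk_tables")


def tmp_table_delta(before, after):
    """Per-counter increase between two SHOW STATUS snapshots (name -> value)."""
    return {
        name: int(after.get(name, 0)) - int(before.get(name, 0))
        for name in TMP_COUNTERS
    }


if __name__ == "__main__":
    # Hypothetical snapshots; real values come from the server as strings.
    before = {"Created_tmp_disk_tables": "0",
              "Created_tmp_files": "5",
              "Created_tmp_tables": "10"}
    after = {"Created_tmp_disk_tables": "0",
             "Created_tmp_files": "5",
             "Created_tmp_tables": "12"}
    print(tmp_table_delta(before, after))
```

The snapshot queries themselves may also increment Created_tmp_tables, so a small difference should be interpreted with care.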
    • The DISTINCT clause cannot be optimized, that is, it can neither be converted into a GROUP BY operation nor eliminated by using a unique index. In that case an internal temporary table is used.
For example:

mysql> explain format=json select distinct a from t1;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.60"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      ...
    • The query has an ORDER BY clause that cannot be optimized. In the following scenarios, MySQL caches the intermediate data in an internal temporary table and then sorts it.

1) The join uses BNL (Block Nested-Loop) or BKA (Batched Key Access).
For example:

1)) BNL is enabled by default.

mysql> explain format=json select * from t1, t1 as t2 order by t1.a;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "22.00"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      ...

2)) After BNL is turned off, ORDER BY uses filesort directly.

mysql> set optimizer_switch='block_nested_loop=off';
Query OK, 0 rows affected (0.00 sec)
mysql> explain format=json select * from t1, t1 as t2 order by t1.a;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "25.00"
    },
    "ordering_operation": {
      "using_filesort": true,
      ...

2) The ORDER BY column does not belong to the first table in the join order of the execution plan.
For example:

mysql> explain format=json select * from t1, t1 as t2 order by t2.a;
EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "25.00"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      ...

3) The ORDER BY expression is a complex expression.

So which ORDER BY expressions does MySQL consider complex?

1)) The sort expression calls a stored function (SP) or a UDF.
For example:

drop function if exists func1;
delimiter |
create function func1(x int)
returns int deterministic
begin
  declare z1, z2 int;
  set z1 = x;
  set z2 = z1+2;
  return z2;
end|
delimiter ;
explain format=json select * from t1 order by func1(a);
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2.20"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      ...

2)) The ORDER BY column contains an aggregate function.

To simplify the execution plan, we create an index so that the GROUP BY itself is resolved through the index.
For example:

create index idx1 on t1(a);
explain format=json select a from t1 group by a order by sum(a);
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.20"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "grouping_operation": {
        "using_filesort": false,
        ...
drop index idx1 on t1;

3)) The ORDER BY column contains a scalar subquery that has not been optimized away.
For example:

explain format=json select (select rand() from t1 limit 1) as a from t1 order by a;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.20"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      ...

4) The query has both ORDER BY and GROUP BY clauses, and the two use different columns.

Note: in 5.7, ONLY_FULL_GROUP_BY must be removed from sql_mode, otherwise the statement fails with an error.

As before, to simplify the execution plan, we create an index to optimize the GROUP BY.
For example:

set sql_mode='';
create index idx1 on t1(b);
explain format=json select t1.a from t1 group by t1.b order by 1;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.40"
    },
    "ordering_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "grouping_operation": {
        "using_filesort": false,
        ...
drop index idx1 on t1;
    • The query has a GROUP BY clause that cannot be optimized. In the following scenarios, MySQL caches the intermediate data in an internal temporary table and then performs the GROUP BY on it.

1) The join uses BNL (Block Nested-Loop) or BKA (Batched Key Access).
For example:

explain format=json select t2.a from t1, t1 as t2 group by t1.a;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "8.20"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "4.00"
        ...

2) The GROUP BY column does not belong to the first table in the join order of the execution plan.
For example:

explain format=json select t2.a from t1, t1 as t2 group by t2.a;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "8.20"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "nested_loop": [
        ...

3) The GROUP BY clause uses different columns from the ORDER BY clause.
For example:

set sql_mode='';
explain format=json select t1.a from t1 group by t1.b order by t1.a;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.40"
    },
    "ordering_operation": {
      "using_filesort": true,
      "grouping_operation": {
        "using_temporary_table": true,
        "using_filesort": false,
        ...

4) GROUP BY uses WITH ROLLUP and the query is based on a multi-table outer join.
For example:

explain format=json select sum(t1.a) from t1 left join t1 as t2 on true group by t1.a with rollup;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "7.20"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "4.00"
      },
      ...

5) The GROUP BY column comes from a scalar subquery that has not been optimized away.
For example:

explain format=json select (select avg(a) from t1) as a from t1 group by a;
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "3.40"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": true,
      "cost_info": {
        "sort_cost": "2.00"
      },
      ...
    • The IN expression is converted to a semi-join for optimization.

1) The semi-join is executed using the materialization strategy.
For example:

set optimizer_switch='firstmatch=off,duplicateweedout=off';
explain format=json select * from t1 where a in (select b from t1);
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "5.60"
    },
    "nested_loop": [
      {
        "rows_examined_per_scan": 1,
        "materialized_from_subquery": {
          "using_temporary_table": true,
          "query_block": {
            ...

2) The semi-join is executed using the duplicate weedout strategy.
For example:

set optimizer_switch='firstmatch=off';
explain format=json select * from t1 where a in (select b from t1);
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "4.80"
    },
    "duplicates_removal": {
      "using_temporary_table": true,
      "nested_loop": [
        {
          ...
    • The query contains UNION. MySQL uses an internal temporary table to help the UNION operation remove duplicates.
For example:

explain format=json select * from t1 union select * from t1;
{
  "query_block": {
    "union_result": {
      "using_temporary_table": true,
      "table_name": "<union1,2>",
      ...
    • The query is a multi-table UPDATE.

EXPLAIN does not show the internal temporary table here, so check the status counters instead.
For example:

update t1, t1 as t2 set t1.a=3;
show status like 'CREATE%';
    • The query uses one of the following aggregate functions.

1) count(distinct ...)
For example:

explain format=json select count(distinct a) from t1;

2) group_concat
For example:

explain format=json select group_concat(b) from t1;

In summary, in the 10 cases listed above MySQL uses an internal temporary table to cache intermediate results. If the amount of data is large, the internal temporary table stores its data on disk, which clearly hurts performance. To minimize the performance loss, we should avoid these cases wherever possible.
