The usage of two temporary tables in MySQL

Source: Internet
Author: User
Tags mysql manual

The usage of two temporary tables in MySQL

External temporary table

TEMPORARY tables created using create temporary table are called external TEMPORARY tables. This temporary table is only visible to the current user. When the current session ends, the temporary table is automatically closed. The name of a temporary table can be the same as that of a non-temporary table (a non-temporary table with the same name will not be visible to the current session until the temporary table is deleted ).

Internal temporary table

An internal temporary table is a special lightweight temporary table used for performance optimization. This temporary table is automatically created by MySQL and used to store intermediate results of some operations. These operations may be included in the optimization or execution phase. This internal table is invisible to users. However, you can use EXPLAIN or show status to check whether MYSQL uses an internal temporary table to help complete an operation. The internal temporary table plays an important role in the optimization of SQL statements. Many operations in MySQL depend on the internal temporary table for optimization. However, to use an internal temporary table, you need to create a table and access the intermediate data. Therefore, you should avoid using a temporary table when writing SQL statements.

There are two types of internal temporary tables: one is the HEAP temporary table. All data in this temporary table is stored in the memory, and I/O operations are not required for such operations. The other is an OnDisk temporary table, which stores data on the disk as the name suggests. The OnDisk temporary table is used to process operations with large intermediate results. If the data stored in the temporary HEAP table is greater than MAX_HEAP_TABLE_SIZE (For details, refer to the system variables section in the MySQL Manual), the temporary HEAP table will be automatically converted to the temporary OnDisk table. In OnDisk temporary table 5.7, you can choose to use the MyISAM engine or InnoDB engine through the INTERNAL_TMP_DISK_STORAGE_ENGINE system variables.

This article mainly introduces which operations may be used in the internal temporary table. When writing SQL statements, you can use as few internal temporary tables as possible for query optimization, which will effectively improve the query execution efficiency.

First, we define a table t1,

CREATE TABLE t1( a int, b int); INSERT INTO t1 VALUES(1,2),(3,4);

All the operations below are based on table t1 for example.

Use SQL _BUFFER_RESULT hint in SQL statements

SQL _BUFFER_RESULT is mainly used for MySQL to release the table lock as soon as possible. If the data volume is large, it takes a long time to send the data to the client. By buffering the data to a temporary table, the time occupied by the read lock table can be effectively reduced.

For example:

mysql> explain format=json select SQL_BUFFER_RESULT * from t1; EXPLAIN {  "query_block": { "select_id": 1, "cost_info": {  "query_cost": "2.00" }, "buffer_result": {  "using_temporary_table": true,  "table": { "table_name": "t1", "access_type": "ALL", ... 

If the SQL statement contains DERIVED_TABLE.

In 5.7, due to the adoption of a new optimization method, we need to use set optimizer_switch = 'derived _ merge = off' to disable the merge of the derived table into the outer Query.

For example:

mysql> explain format=json select * from (select * from t1) as tt; EXPLAIN {  "query_block": { "select_id": 1, "cost_info": {  "query_cost": "2.40" }, "table": {  "table_name": "tt",  "access_type": "ALL",  ...  "materialized_from_subquery": { "using_temporary_table": true, ... 

If we query a table, the data in the system table will be stored in the internal temporary table.

Currently, you cannot use the EXPLAIN statement to check whether the internal temporary table is used to read system table data. However, you can use show status to check whether the internal temporary table is used.

For example:

mysql> select * from information_schema.character_sets; mysql> show status like 'CREATE%'; 

If the DISTINCT statement is not optimized, that is, the DISTINCT statement is optimized to the group by operation or the unique index is used to remove DISTINCT, the internal temporary table will be used.

mysql> explain format=json select distinct a from t1; EXPLAIN { {  "query_block": { "select_id": 1, "cost_info": {  "query_cost": "1.60" }, "duplicates_removal": {  "using_temporary_table": true, ... 

If the query contains an order by statement, it cannot be optimized. In the following situations, intermediate data is cached in the internal temporary table and then sorted.

1) if the connection table uses BNL (Batched Nestloop)/BKA (Batched Key Access)

For example:

1) BNL is enabled by default.

mysql> explain format=json select * from t1, t1 as t2 order by t1.a; EXPLAIN {  "query_block": {  "select_id": 1,  "cost_info": { "query_cost": "22.00"  },  "ordering_operation": { "using_temporary_table": true,  ... 

(2) After BNL is disabled, order by will directly use filesort.

mysql> set optimizer_switch='block_nested_loop=off'; Query OK, 0 rows affected (0.00 sec) mysql> explain format=json select * from t1, t1 as t2 order by t1.a; EXPLAIN {   "query_block": { "select_id": 1, "cost_info": {  "query_cost": "25.00" }, "ordering_operation": {  "using_filesort": true, ... 

2) The order by column does not belong to the first join table column in the execution plan.

For example:

mysql> explain format=json select * from t as t1, t as t2 order by t2.a; EXPLAIN {   "query_block": { "select_id": 1, "cost_info": {  "query_cost": "25.00" }, "ordering_operation": {  "using_temporary_table": true, ... 

3) if the order by expression is a complex expression.

So what order by expression does MySQL consider as a complex expression?

(1) If the sorting expression is SP or UDF.

For example:

drop function if exists func1; delimiter | create function func1(x int) returns int deterministic begin declare z1, z2 int; set z1 = x; set z2 = z1+2; return z2; end| delimiter ; explain format=json select * from t1 order by func1(a); { "query_block": { "select_id": 1, "cost_info": {  "query_cost": "2.20" }, "ordering_operation": {  "using_temporary_table": true, ... 

2) The order by column contains the clustering function.

To simplify the execution plan, we use INDEX to optimize the group by statement.

For example:

 create index idx1 on t1(a);  explain format=json SELECt a FROM t1 group by a order by sum(a);  | {   "query_block": { "select_id": 1, "cost_info": {  "query_cost": "1.20" }, "ordering_operation": {  "using_temporary_table": true,  "using_filesort": true,  "grouping_operation": { "using_filesort": false, ...  drop index idx1 on t1; 

3) The order by column contains a scalar subquery. Of course, this scalar subquery is not optimized.

For example:

explain format=json select (select rand() from t1 limit 1) as a from t1 order by a; | {  "query_block": { "select_id": 1, "cost_info": {  "query_cost": "1.20" }, "ordering_operation": {  "using_temporary_table": true,  "using_filesort": true, ... 

4) if the query contains both order by and group by statements, the columns used BY the two statements are different.

NOTE: If it is 5.7, we need to set SQL _mode to non-only_full_group_by mode. Otherwise, an error is returned.

Similarly, to simplify the execution plan, we use INDEX to optimize the group by statement.

For example:

set sql_mode=''; create index idx1 on t1(b); explain format=json select t1.a from t1 group by t1.b order by 1; | { "query_block": { "select_id": 1, "cost_info": {  "query_cost": "1.40" }, "ordering_operation": {  "using_temporary_table": true,  "using_filesort": true,  "grouping_operation": { "using_filesort": false, ... drop index idx1 on t1; 

If the query contains a group by statement, it cannot be optimized. In the following situations, intermediate data is cached in the internal temporary table, and group by is performed on the intermediate data.

1) if the connection table uses BNL (Batched Nestloop)/BKA (Batched Key Access ).

For example:

explain format=json select t2.a from t1, t1 as t2 group by t1.a; | { "query_block": { "select_id": 1, "cost_info": {  "query_cost": "8.20" }, "grouping_operation": {  "using_temporary_table": true,  "using_filesort": true,  "cost_info": { "sort_cost": "4.00" ... 

2) If the group by column does not belong to the first join table in the execution plan.

For example:

explain format=json select t2.a from t1, t1 as t2 group by t2.a; | { "query_block": { "select_id": 1, "cost_info": {  "query_cost": "8.20" }, "grouping_operation": {  "using_temporary_table": true,  "using_filesort": true,  "nested_loop": [ ... 

3) if the columns used BY the group by statement are different from those used BY the order by statement.

For example:

set sql_mode=''; explain format=json select t1.a from t1 group by t1.b order by t1.a; | {   "query_block": { "select_id": 1, "cost_info": {  "query_cost": "1.40" }, "ordering_operation": {  "using_filesort": true,  "grouping_operation": { "using_temporary_table": true, "using_filesort": false, ... 

4) if group by has a ROLLUP and is based on multi-Table outer join.

For example:

explain format=json select sum(t1.a) from t1 left join t1 as t2 on true group by t1.a with rollup; | { "query_block": { "select_id": 1, "cost_info": {  "query_cost": "7.20" }, "grouping_operation": {  "using_temporary_table": true,  "using_filesort": true,  "cost_info": { "sort_cost": "4.00"  }, ... 

5) if the columns used BY the group by statement are from scalar subquery and are not optimized.

For example:

explain format=json select (select avg(a) from t1) as a from t1 group by a; | { "query_block": { "select_id": 1, "cost_info": {  "query_cost": "3.40" }, "grouping_operation": {  "using_temporary_table": true,  "using_filesort": true,  "cost_info": { "sort_cost": "2.00"  }, ... 

The IN expression is converted to semi-join for optimization.

1) if the execution method of semi-join is Materialization

For example:

set optimizer_switch='firstmatch=off,duplicateweedout=off'; explain format=json select * from t1 where a in (select b from t1); | { "query_block": { "select_id": 1, "cost_info": {  "query_cost": "5.60" }, "nested_loop": [  { "rows_examined_per_scan": 1,  "materialized_from_subquery": { "using_temporary_table": true, "query_block": {  "table": { "table_name": "t1", "access_type": "ALL", ... 

2) If the execution method of semi-join is Duplicate Weedout

For example:

set optimizer_switch='firstmatch=off'; explain format=json select * from t1 where a in (select b from t1); | { "query_block": { "select_id": 1, "cost_info": {  "query_cost": "4.80" }, "duplicates_removal": {  "using_temporary_table": true,  "nested_loop": [ { ... 

If the query statement contains a UNION statement, MySQL uses an internal temporary table to help the UNION operation eliminate duplicates.

For example:

explain format=json select * from t1 union select * from t1; | { "query_block": { "union_result": {  "using_temporary_table": true,  "table_name": "", ... 

If the query statement is updated using multiple tables.

The internal temporary table cannot be seen in the Explain command, so you need to check the status.

For example:

update t1, t1 as t2 set t1.a=3; show status like 'CREATE%'; 

If the aggregate function contains the following functions, the internal temporary table will also be used.

1) count (distinct *) Example: explain format = json select count (distinct a) from t1; 2) group_concat

For example:

explain format=json select group_concat(b) from t1; 

In short, the above 10 cases are listed. MySQL will use the internal temporary table for intermediate result caching. If the data volume is large, the internal temporary table will store the data on the disk, this will obviously affect the performance. To minimize performance loss, we need to avoid the above situations as much as possible.

Summary

The above is all the details about the usage of the two temporary tables in MySQL. I hope it will be helpful to you. For more information about important MySQL variables, MySQL prepare principles, and MySQL table data deletion methods, please leave a message.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.