I. The Execution Process of SQL in the Database
II. The Execution Plan
1. ACID
Atomicity: all operations in a transaction either complete entirely or not at all; a transaction never stops partway. If an error occurs during execution, the transaction is rolled back to the state before it began, as if it had never been executed.
Consistency: the integrity of the database is preserved both before the transaction begins and after it ends. All written data must fully conform to the preset rules, including the accuracy of the data and the relationships between data, so that the database can continue to carry out its scheduled work correctly.
Isolation: the database allows multiple concurrent transactions to read and modify its data at the same time, while preventing the inconsistencies that could arise from interleaved execution. Transaction isolation is divided into levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
Durability: after a transaction finishes, its modifications to the data are permanent and are not lost even if the system fails.
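Atomicity and durability are easy to see in practice. Below is a minimal sketch, assuming a hypothetical accounts table stored in InnoDB (a transactional engine):

-- assumed schema: CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL(10,2)) ENGINE=InnoDB;
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- if anything went wrong above, ROLLBACK would undo both updates together (atomicity)
COMMIT;  -- after COMMIT, the change survives a crash (durability)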
2. Locks
2.1 By type of data operation
Read lock: also known as a shared lock. For the same resource, multiple read operations can proceed in parallel without affecting each other.
Write lock: also called an exclusive lock. While the current thread is writing data, it blocks other threads from reading or writing that data.
2.2 By lock granularity
Table lock: locks the entire table (MyISAM).
Row lock: locks a single row in a table (InnoDB).
Page lock: a granularity between the table lock and the row lock.
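A minimal sketch of both lock types (the table name is hypothetical; LOCK IN SHARE MODE is the pre-8.0 spelling of FOR SHARE):

LOCK TABLES fentrust READ;   -- table-level read lock (MyISAM takes these implicitly)
UNLOCK TABLES;

START TRANSACTION;
SELECT * FROM fentrust WHERE fid = 1 LOCK IN SHARE MODE;  -- row-level read (shared) lock
SELECT * FROM fentrust WHERE fid = 1 FOR UPDATE;          -- row-level write (exclusive) lock
COMMIT;  -- InnoDB row locks are released when the transaction ends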
3. The execution plan: EXPLAIN
EXPLAIN SELECT * FROM tbl_notices;
Description of the columns of the execution plan:
1. id: identifies the execution order of the SQL; parts with larger id values execute first.
2. select_type: the type of the query. Possible values:
2.1 SIMPLE: a simple query (no subqueries or UNION)
2.2 PRIMARY: the main (outermost) query, seen when multiple tables or subqueries are involved
2.3 UNION: a joint query; the second or later SELECT in a UNION
2.4 DEPENDENT UNION: a UNION SELECT inside a subquery that depends on the outer query
2.5 UNION RESULT: the result set of a UNION
2.6 SUBQUERY: the first SELECT in a subquery
2.7 DEPENDENT SUBQUERY: the first SELECT in a subquery that depends on the outer query
2.8 DERIVED: a derived table (a subquery in the FROM clause)
3. table: shows which table this row of the plan refers to.
4. type: an important column that shows the join (access) type and whether indexes are used. From best to worst, the types are: const, eq_ref, ref, range, index, and ALL.
4.1 system: a special case of the const join type, where the table has only one row satisfying the condition.
4.2 const: the table has at most one matching row, which is read at the start of the query. Because there is only one row, the rest of the optimizer can treat that row's column values as constants. const tables are very fast because they are read only once.
4.3 eq_ref: one row is read from this table for each combination of rows from the preceding tables. Apart from const, this is the best join type. It is used when the join uses all parts of an index and the index is UNIQUE or a PRIMARY KEY. eq_ref can be used for indexed columns compared with the = operator; the comparison value can be a constant or an expression that uses columns of tables read earlier.
4.4 ref: for each combination of rows from the preceding tables, all rows with matching index values are read from this table. ref is used when the join uses only a leftmost prefix of the key, or when the key is not UNIQUE or a PRIMARY KEY (in other words, when the join cannot select a single row by key). If the key matches only a few rows, this join type is good. ref can be used with indexed columns compared with the = or <=> operator.
4.5 ref_or_null: like ref, but MySQL additionally searches for rows that contain NULL values. This join-type optimization is often used in resolving subqueries.
4.6 index_merge: the index merge optimization is used. In this case, the key column contains the list of indexes used, and key_len contains the longest key part among them.
4.7 unique_subquery: replaces ref for IN subqueries of the form: value IN (SELECT primary_key FROM single_table WHERE some_expr). unique_subquery is an index lookup function that replaces the subquery completely and is more efficient.
4.8 index_subquery: similar to unique_subquery; it can also replace IN subqueries, but works with non-unique indexes.
4.9 range: retrieves only rows in a given range, using an index to select the rows. The key column shows which index is used, and key_len contains the longest key part used. The ref column is NULL for this type.
4.10 index: the same as ALL, except that only the index tree is scanned. This is usually faster than ALL because the index file is usually smaller than the data file.
4.11 ALL: a full table scan is performed for each combination of rows from the preceding tables. This is usually bad if the table is the first table not marked const, and usually very bad in all other cases. Normally you can avoid ALL by adding indexes that let rows be retrieved using constant values or column values from earlier tables.
5. possible_keys: indicates which indexes MySQL could use to find rows in this table.
6. key: shows the key (index) MySQL actually uses.
7. key_len: the length of the index used. Shorter is better, provided accuracy is not lost.
8. ref: shows which columns or constants are used together with key to select rows from the table.
9. rows: the number of rows MySQL estimates it must examine to execute the query.
10. Extra: contains additional details about how MySQL resolves the query. This column can show dozens of values; the common ones are:
10.1 Distinct: once MySQL finds the first matching row, it stops searching for more.
10.2 Not exists: MySQL uses an anti-join: it queries the outer table first, then the inner table.
10.3 Range checked for each record (index map: #): no ideal index was found, so for every combination of rows from the preceding tables, MySQL checks which index to use and uses it to retrieve rows from the table. This is one of the slowest ways of using an index.
10.4 Using filesort: when you see this, the query needs optimizing. MySQL needs an extra pass to work out how to return the rows in sorted order: it walks all rows according to the join type, storing the sort key and row pointer for every row that matches the conditions, and then sorts them.
10.5 Using index: the column data is returned using only information in the index, without reading the actual row. This happens when all requested columns of the table are part of the same index (a covering index).
10.6 Using temporary: when you see this, the query needs optimizing. MySQL needs to create a temporary table to hold the result; this typically happens when ORDER BY is on a different set of columns than GROUP BY.
10.7 Using where: a WHERE clause is used to restrict which rows match the next table or are returned to the client. If you do not intend to fetch all rows from the table and the join type is ALL or index, this indicates a problem with the query.
10.8 FirstMatch(tbl_name): one of the subquery optimizations introduced in MySQL 5.6.x, commonly seen when the WHERE clause contains an IN()-type subquery. It can appear when the amount of data in the table is large.
10.9 LooseScan(m..n): one of the subquery optimizations introduced in MySQL 5.6.x; it can appear when an IN()-type subquery returns duplicate records.
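To tie these columns together, here is a minimal sketch; the index name is hypothetical, and the plan row shown is what one would typically expect rather than guaranteed output:

CREATE INDEX idx_fcreatetime ON fentrust (fcreatetime);
EXPLAIN SELECT fid FROM fentrust WHERE fcreatetime > '2018-01-01';
-- a typical plan row: type=range, key=idx_fcreatetime, rows=<estimate>, Extra=Using where; Using index
-- Using index appears because fid (the primary key) is stored inside the secondary index, making it covering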
Important: when analyzing a SQL execution plan, the columns called out above are the ones to focus on.
4. Limitations of the MySQL execution plan
- EXPLAIN does not tell you how triggers, stored procedures, or user-defined functions affect the query
- EXPLAIN does not take the various caches into account
- EXPLAIN cannot show optimizations MySQL applies while actually executing the query
- Some of the statistics are estimates, not exact values
- EXPLAIN can only explain SELECT operations; to see the plan for other statements, rewrite them as an equivalent SELECT
III. Optimization Examples
1. Special handling of pagination
Inefficient:
SELECT * FROM fentrust LIMIT 4100000, 10;
Efficient:
SELECT * FROM fentrust e INNER JOIN (SELECT fid FROM fentrust LIMIT 4100000, 10) t ON t.fid = e.fid;
Principle:
1. fid is indexed, so SELECT fid FROM fentrust LIMIT 4100000, 10 is a covering-index scan: it never touches the table data on disk but gets the values directly from the index, which is faster.
2. MySQL executes the subquery first: SELECT fid FROM fentrust LIMIT 4100000, 10 finds the 10 fid values, and then only those 10 records are joined back to the table, which is faster.
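To verify the covering-index claim, you can EXPLAIN the inner query; a sketch, assuming fid is the primary key or an indexed column:

EXPLAIN SELECT fid FROM fentrust LIMIT 4100000, 10;
-- expect type=index and Extra=Using index: the scan stays entirely inside the index tree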
2. Make the best use of sub-queries
Inefficient:
SELECT wu.fuid, wu.fwid, v.fvi_fid, v.fvi2_fid, SUM(l.fcount), SUM(l.famount), SUM(l.famount / v.famount * v.ffees), v.fentrusttype, 0, '2018-01-01', NOW(), SUM(v.fleftcount), SUM(v.fleftfees), 0
FROM fentrustlog_vcoin l
INNER JOIN fentrust_vcoin v ON l.fen_fid = v.fid
INNER JOIN fwebsite_user wu ON wu.fuid = v.fus_fid
WHERE l.fid NOT IN (
    SELECT l2.fid
    FROM fentrustlog_vcoin l2, fentrust_vcoin v2
    WHERE l2.fen_fid = v2.fid AND l2.fprize = l.fprize AND l2.fcount = l.fcount
      AND l2.fcreatetime = l.fcreatetime AND l2.fid <> l.fid
      AND v2.fus_fid = v.fus_fid AND wu.fwid = 1)
GROUP BY wu.fuid, wu.fwid, v.fvi_fid, v.fvi2_fid, v.fentrusttype;
Efficient:
SELECT v.fus_fid, 1, v.fvi_fid, v.fvi2_fid, SUM(v.fcount - v.fleftcount), SUM(v.fsuccessamount), SUM(v.ffees - v.fleftfees), v.fentrusttype, 0, '2018-01-01', NOW(), SUM(v.fleftcount), SUM(v.fleftfees), 0
FROM fentrust_vcoin v
WHERE v.fstatus > 1
  AND v.fus_fid IN (SELECT fuid FROM fwebsite_user wu WHERE wu.fwid = 1)
  AND v.fvi2_fid IN (SELECT fvid FROM fwebsite_coin WHERE fwebsite_id = 1)
GROUP BY v.fentrusttype, v.fvi_fid, v.fvi2_fid, v.fus_fid;
Principle:
First analyze whether every table is necessary; if the required fields can be obtained from other tables, the extra table can be removed from the join.
Use subqueries well: a subquery can be faster than a join. This rule is not absolute, but it is most effective on large tables.
MySQL executes the subquery first.
3. Order of WHERE conditions
Inefficient:
SELECT * FROM fentrust e WHERE e.fcount > 1000 AND e.famount > 300000;
Efficient:
SELECT * FROM fentrust e WHERE e.famount > 300000 AND e.fcount > 1000;
Principle:
e.fcount > 1000 matches about 480,000 rows
e.famount > 300000 matches about 24 rows
Which condition should come first? In practice the measured difference may be small, but as a rule, put the condition that produces the smaller result set first in the WHERE clause: the query then narrows to a small row set early, and the remaining conditions are evaluated against far fewer rows.
4. Large transaction issues
Try to avoid large transaction operations to improve the system's concurrency. When they are unavoidable, use a scheduled job and process the work with a delay instead.
What is the large transaction problem? Operations that INSERT, UPDATE, DELETE, or query tens of millions of rows. When analysis shows this, handle it with a scheduled job, for example one that starts early every morning.
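One way to implement such a timer is MySQL's event scheduler. A minimal sketch, assuming the scheduler is enabled and the table and column names are hypothetical:

SET GLOBAL event_scheduler = ON;
CREATE EVENT ev_purge_old_logs
  ON SCHEDULE EVERY 1 DAY STARTS '2018-01-02 03:00:00'
  DO
    DELETE FROM logs WHERE log_date < NOW() - INTERVAL 90 DAY LIMIT 10000;
-- runs at 3 a.m. daily, deleting in bounded batches instead of one huge transaction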
5. Cases where the index is not used
SELECT famount FROM fentrust WHERE famount + 10 = 30; -- does not use the index, because the indexed column participates in a calculation
SELECT famount FROM fentrust WHERE LEFT(fcreatetime, 4) < 1990; -- does not use the index, because a function is applied to the column; same principle as above
SELECT * FROM fuser WHERE floginname LIKE '138%'; -- uses the index: a prefix match can walk the B-tree part of the index (see any B-tree diagram, or the earlier article in this series)
SELECT * FROM fuser WHERE floginname LIKE '%7488%'; -- does not use the index. Regular-expression-style matching cannot use indexes, which should be easy to understand, and is why the REGEXP keyword is rarely seen in SQL. Likewise, comparing a string column with a number tends not to use the index:
EXPLAIN SELECT * FROM a WHERE `a` = 1; -- does not use the index (string column compared with a number)
SELECT * FROM fuser WHERE floginname = 'xxx' OR femail = 'xx' OR fstatus = 1; -- when the condition contains OR, the index is not used even if some of the conditions are indexed. In other words, every field used in the OR must be indexed for any index to be used; we recommend avoiding the OR keyword where possible
If MySQL estimates that a full table scan is faster than using the index, it will not use the index.
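A sketch of rewriting the non-index-friendly predicates above so the index can be used (table and column names as in the examples; the rewrites are illustrative):

SELECT famount FROM fentrust WHERE famount = 20;               -- move the arithmetic to the constant side (was famount + 10 = 30)
SELECT famount FROM fentrust WHERE fcreatetime < '1990-01-01'; -- range on the raw column (was LEFT(fcreatetime, 4) < 1990)
SELECT * FROM fuser WHERE floginname = 'xxx'
UNION ALL
SELECT * FROM fuser WHERE femail = 'xx';                       -- split OR into UNION ALL so each branch can use its own index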
IV. Optimizing Tables with Tens of Millions of Rows
1. Big-table optimization strategies collected from the Internet
1. To optimize queries, avoid full table scans as far as possible; first consider creating indexes on the columns used in WHERE and ORDER BY.
2. Try to avoid NULL checks on fields in the WHERE clause, otherwise the engine abandons the index and performs a full table scan. For example: SELECT id FROM t WHERE num IS NULL. You can set a default value of 0 on num, ensure the num column never contains NULL, and then query: SELECT id FROM t WHERE num = 0
3. Try to avoid the != or <> operators in the WHERE clause, otherwise the engine abandons the index and performs a full table scan.
4. Try to avoid using OR to join conditions in the WHERE clause, otherwise the engine abandons the index and performs a full table scan. For example: SELECT id FROM t WHERE num = 10 OR num = 20 can be rewritten as: SELECT id FROM t WHERE num = 10 UNION ALL SELECT id FROM t WHERE num = 20
5. Use IN and NOT IN with caution, otherwise they can cause a full table scan, for example: SELECT id FROM t WHERE num IN (1, 2, 3). For consecutive values, use BETWEEN instead of IN: SELECT id FROM t WHERE num BETWEEN 1 AND 3
6. The following query also causes a full table scan: SELECT id FROM t WHERE name LIKE '%Li%'. To improve efficiency, consider a full-text index.
7. Using a parameter in the WHERE clause also causes a full table scan. Because SQL resolves local variables only at run time, the optimizer cannot defer the choice of access plan to run time: it must choose at compile time. When the plan is built at compile time, the variable's value is still unknown, so it cannot be used to select an index. The following statement performs a full table scan: SELECT id FROM t WHERE num = @num. You can force the query to use an index instead: SELECT id FROM t WITH (INDEX(index_name)) WHERE num = @num
8. Try to avoid expression operations on fields in the WHERE clause, which cause the engine to abandon the index and perform a full table scan. For example: SELECT id FROM t WHERE num / 2 = 100 should be changed to: SELECT id FROM t WHERE num = 100 * 2
9. Try to avoid function operations on fields in the WHERE clause, which cause the engine to abandon the index and perform a full table scan.
For example: SELECT id FROM t WHERE substring(name, 1, 3) = 'abc' (ids whose name begins with 'abc')
should read:
SELECT id FROM t WHERE name LIKE 'abc%'
10. Do not apply functions, arithmetic, or other expressions to the left side of the "=" in the WHERE clause, or the system may not be able to use the index correctly.
11. When using an indexed field as a condition, if the index is a composite index, the first field of the index must appear in the condition for the system to use the index; otherwise the index is not used. The field order should match the index order as much as possible (see the sketch after this list).
12. Do not write meaningless queries. For example, to generate an empty table structure: SELECT col1, col2 INTO #t FROM t WHERE 1 = 0
This kind of code returns no result set but still consumes system resources; it should be changed to:
CREATE TABLE #t (...)
13. Replacing IN with EXISTS is often a good choice: SELECT num FROM a WHERE num IN (SELECT num FROM b)
can be replaced with the following statement:
SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE num = a.num)
14. Not every index helps a query. SQL optimizes queries based on the data in the table; when an indexed column contains many duplicate values, the query may not use the index at all. For example, if a table has a sex field that is roughly half male and half female, an index on sex does nothing for query efficiency.
15. More indexes are not always better. Indexes speed up the corresponding SELECTs but slow down INSERT and UPDATE, since those operations may rebuild the indexes, so which indexes to build needs careful, case-by-case consideration. A table should preferably have no more than six indexes; if it has more, consider whether the rarely used ones are necessary.
16. Avoid updating clustered-index columns wherever possible, because the order of the clustered-index columns is the physical storage order of the table's records: a change in the column value reorders the whole table's records, which consumes considerable resources. If the application needs to update clustered-index columns frequently, reconsider whether the index should be clustered.
17. Use numeric fields where possible. Fields containing only numeric information should not be designed as character types; that lowers query and join performance and increases storage overhead, because the engine compares strings character by character when processing queries and joins, while a numeric comparison happens only once.
18. Use varchar/nvarchar instead of char/nchar where possible: first, variable-length fields take less storage space; second, for queries, searching within a smaller field is clearly more efficient.
19. Do not use SELECT * FROM t anywhere; replace "*" with a concrete field list, and do not return fields you do not need.
20. Try to use table variables instead of temporary tables. If a table variable contains a large amount of data, be aware that its indexes are very limited (only the primary key index).
21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.
22. Temporary tables are not unusable, and they can be used appropriately to make certain routines more efficient, for example, when you need to repeatedly reference a dataset in a large table or a common table. However, for one-time events, it is best to use an export table.
23. When creating a temporary table, if a large amount of data is inserted at once, use SELECT INTO instead of CREATE TABLE to avoid generating a large amount of log and to improve speed; if the amount of data is small, use CREATE TABLE followed by INSERT to ease the load on the system tables.
24. If temporary tables are used, be sure to explicitly delete them all at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE. This avoids long locks on the system tables.
25. Avoid cursors where possible, because cursors are inefficient; if a cursor operates on more than 10,000 rows of data, consider rewriting it.
26. Before using a cursor-based or temporary-table method, first look for a set-based solution to the problem; set-based approaches are usually more efficient.
27. As with temporary tables, cursors are not forbidden. Using FAST_FORWARD cursors on small datasets is often preferable to other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that compute "totals" in the result set are usually faster than doing the same with cursors. If development time permits, try both the cursor-based and the set-based approach and keep whichever works better.
28. SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger.
29. Try to avoid large transaction operations to improve the system's concurrency.
30. Try to avoid returning large amounts of data to the client; if the data volume is too large, consider whether the underlying requirement is reasonable.
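As promised in tip 11, a minimal sketch of the leftmost-prefix rule (the index name and columns are hypothetical):

CREATE INDEX idx_num_name ON t (num, name);
SELECT id FROM t WHERE num = 10 AND name = 'abc'; -- can use the composite index
SELECT id FROM t WHERE num = 10;                  -- can also use it (leftmost prefix)
SELECT id FROM t WHERE name = 'abc';              -- cannot: the leftmost column num is missing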
2. Delete in batches, not all at once
// Pseudocode sketch: delete in batches so the table is not locked for long.
// executeUpdate is assumed to run the SQL and return the affected-row count.
while (true) {
    // delete only 1000 rows at a time
    int affected = executeUpdate(
        "DELETE FROM logs WHERE log_date <= '2012-11-01' LIMIT 1000");
    if (affected == 0) {
        break;  // deletion complete, exit
    }
    // pause for a while, freeing the table for other processes/threads to access
    Thread.sleep(5000L);
}
3. Big Data Table Optimization
Create a summary table
Create a journal (running-log) table
Split tables and databases (sharding)
Note: do not jump straight to splitting databases and tables at the beginning; start with the summary-table strategy. What is a summary table? It aggregates the base data up one dimension: for example, data that accumulates by the hour can be summarized by day, and further statistics can then be computed by the application on top of the daily summary. The exact summary dimensions depend on the business.
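A minimal sketch of the summary-table idea, with hypothetical table and column names:

-- roll hourly base data up one dimension, to days
CREATE TABLE fentrust_day_summary (
  fday DATE NOT NULL,
  fus_fid INT NOT NULL,
  total_count BIGINT,
  total_amount DECIMAL(18,2),
  PRIMARY KEY (fday, fus_fid)
);

-- run once per day (e.g., from the scheduled job above) to summarize yesterday
INSERT INTO fentrust_day_summary (fday, fus_fid, total_count, total_amount)
SELECT DATE(fcreatetime), fus_fid, SUM(fcount), SUM(famount)
FROM fentrust
WHERE fcreatetime >= CURDATE() - INTERVAL 1 DAY AND fcreatetime < CURDATE()
GROUP BY DATE(fcreatetime), fus_fid;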