Reference article: http://www.jb51.net/article/24392.htm
When we design database table structures and operate on the database, we need to pay attention to the performance of data operations, especially the SQL statements that query the tables. Here we will not go too deeply into general SQL optimization, but only into MySQL, the database most widely used for Web applications. Hopefully the following optimization tips are useful to you.
1. Optimize your queries for the query cache
Most MySQL servers have the query cache turned on. This is one of the most effective ways to improve performance, and it is handled by the MySQL database engine. When the same query is executed many times, its result is placed in a cache, so subsequent identical queries can read the cached result instead of touching the table directly.
The main problem here is that this is very easy for programmers to overlook, because some of our query statements cause MySQL not to use the cache. Take a look at the following example:
$r = mysql_query ("Select username from user WHERE signup_date >= curdate ()"// /c4> $today = Date ("y-m-d"
The difference between the two SQL statements above is CURDATE(): the MySQL query cache does not work for queries containing this function. SQL functions such as NOW() and RAND(), whose return values vary, all prevent the query cache from being used. So all you need to do is use a PHP variable instead of the MySQL function so the cache can kick in.
2. EXPLAIN your SELECT query
Use the EXPLAIN keyword to see how MySQL handles your SQL statement. This can help you analyze the performance bottlenecks of your query or your table structure.
The result of EXPLAIN will also tell you how your indexes and primary keys are being used, how your tables are searched and sorted, and so on.
Pick one of your SELECT statements (ideally the most complex one, with multi-table joins) and add the keyword EXPLAIN in front of it. You can use phpMyAdmin to do this. You will then see a table of results. In the following example, we forgot to add an index on the group_id column used in a table join:
When we index the group_id field:
As you can see, the first result shows that 7,883 rows were scanned, while the second scans only 9 and 16 rows in the two tables. Looking at the rows column lets us spot potential performance problems.
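A minimal sketch of the workflow (the table and column names below are illustrative, not taken from the screenshots):
-- hypothetical schema: a users table joined to a user_groups table via group_id
EXPLAIN SELECT users.username, user_groups.name
FROM users
JOIN user_groups ON users.group_id = user_groups.id
WHERE user_groups.name = 'admins';
-- add the suspected missing index, then run the EXPLAIN again and compare the rows column
ALTER TABLE users ADD INDEX idx_group_id (group_id);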
3. Use LIMIT 1 when only one row of data is used
Sometimes when you query a table, you already know there will only be one matching result, yet you may still fetch a cursor over it or check the number of records returned.
In this case, adding LIMIT 1 improves performance: the MySQL engine stops searching as soon as it finds one matching row, instead of continuing to look for further matches.
In the following example, we just want to find out whether there is any user from "China"; clearly the second query will be more efficient than the first. (Note that the first uses SELECT *, while the second uses SELECT 1.)
// Not efficient:
$r = mysql_query("SELECT * FROM user WHERE country = 'China'");
if (mysql_num_rows($r) > 0) {
    // ...
}

// More efficient:
$r = mysql_query("SELECT 1 FROM user WHERE country = 'China' LIMIT 1");
if (mysql_num_rows($r) > 0) {
    // ...
}
4. Build indexes on the fields you search by
An index is not only for the primary key or a unique field. If there is a field in your table that you frequently search on, you should index it.
From the comparison you can see that for the search last_name LIKE 'a%', the indexed version performs about 4 times better than the unindexed one.
In addition, you should know which kinds of search cannot use an ordinary index. For example, when you need to search for a word inside a large text, such as WHERE post_content LIKE '%apple%', an ordinary index is useless. You may need a MySQL full-text index, or build your own indexing scheme (for example around keywords or tags).
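For illustration, assuming a user table with a last_name column and a posts table with a post_content column, the two kinds of index mentioned above could be created roughly like this:
-- ordinary index: speeds up prefix searches such as last_name LIKE 'a%'
ALTER TABLE user ADD INDEX idx_last_name (last_name);
-- a leading wildcard (LIKE '%apple%') cannot use that index;
-- a FULLTEXT index (MyISAM in older MySQL versions) is one alternative for word searches
ALTER TABLE posts ADD FULLTEXT INDEX ft_post_content (post_content);
SELECT post_id FROM posts WHERE MATCH (post_content) AGAINST ('apple');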
5. In joins, use columns of the same type and index them
If your application contains many JOIN queries, you should make sure the joined fields in both tables are indexed, so that MySQL can internally optimize the join for you.
Also, the fields used for the join should be of the same type. For example, if you join a DECIMAL field to an INT field, MySQL cannot use the index. For string types, the character sets must also match (the two tables might otherwise have different character sets).
// find companies in my state
$r = mysql_query("SELECT company_name FROM users
    LEFT JOIN companies ON (users.state = companies.state)
    WHERE users.id = $user_id");
The two state fields should be indexed and should be of the same type, with the same character set.
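In DDL terms this is simply two matching indexes, sketched here under the assumption that both tables store the state in a column of the same type and character set:
-- index the join column on both sides
ALTER TABLE users ADD INDEX idx_users_state (state);
ALTER TABLE companies ADD INDEX idx_companies_state (state);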
6. Never use ORDER BY RAND()
Want to shuffle the rows returned, or pick a random row? I don't know who invented this usage, but many novices like it, without realizing how terrible the performance problem is.
If you really want to shuffle the returned rows, there are many better ways to do it; this one only degrades your database's performance exponentially. The problem is that MySQL has to execute RAND() (which consumes CPU time) for every single row, and then sort by the result. Even LIMIT 1 does not help, because the sort still has to happen.
The following example randomly picks a record
// Never do this:
$r = mysql_query("SELECT username FROM user ORDER BY RAND() LIMIT 1");

// This is much better:
$r = mysql_query("SELECT count(*) FROM user");
$d = mysql_fetch_row($r);
$rand = mt_rand(0, $d[0] - 1);
$r = mysql_query("SELECT username FROM user LIMIT $rand, 1");
7. Avoid SELECT *
The more data you read from the database, the slower the query becomes. Moreover, if your database server and Web server are separate machines, this also increases the load of the network transfer.
So develop the good habit of selecting only the fields you need.
// Not recommended
$r = mysql_query("SELECT * FROM user WHERE user_id = 1");
$d = mysql_fetch_assoc($r);
echo "Welcome {$d['username']}";

// Recommended
$r = mysql_query("SELECT username FROM user WHERE user_id = 1");
$d = mysql_fetch_assoc($r);
echo "Welcome {$d['username']}";
8. Always set an ID for each table
We should give every table in the database an ID column as its primary key, preferably an INT (UNSIGNED recommended) with the AUTO_INCREMENT flag set.
Even if your users table has a field such as "email" that could serve as a primary key, do not make it one: using a VARCHAR type as the primary key degrades performance. Also, in your program you should use the table's ID to organize your data structures.
Moreover, under the MySQL engine some operations rely on the primary key, and in those cases the choice and configuration of the primary key become very important, for example for clustering and partitioning.
There is only one exception here: the "foreign keys" of an "association table", that is, a table whose primary key is composed of the primary keys of several other tables. For example, a "student table" has a student ID and a "course table" has a course ID; the "score table" is then the association table linking the two, and in the score table the student ID and course ID are "foreign keys" that together form its primary key.
9. Use ENUM instead of VARCHAR
The ENUM type is very fast and compact. Internally it is stored as a TINYINT, but it appears as a string on the outside. This makes it quite a perfect fit for fields that hold a list of options.
If you have a field such as "gender", "country", "ethnicity", "state" or "department" whose values are limited and fixed, you should use ENUM instead of VARCHAR.
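A minimal sketch of such a column definition (the table and values are illustrative):
CREATE TABLE staff (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name   VARCHAR(60) NOT NULL,
    -- stored internally as a number, displayed as a string
    gender ENUM('male', 'female') NOT NULL,
    state  ENUM('active', 'on_leave', 'resigned') NOT NULL DEFAULT 'active'
);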
MySQL can also give you a "suggestion" (see tip 10) on how to restructure your table. When you have a VARCHAR field, it may suggest changing it to an ENUM type. You can get this advice with PROCEDURE ANALYSE().
10. Get suggestions from PROCEDURE ANALYSE()
PROCEDURE ANALYSE() lets MySQL analyze your columns and their actual data and then gives you some useful suggestions. These suggestions are only useful if there is real data in the table, because big decisions need data as a basis.
For example, if you create an INT primary key but do not have much data yet, PROCEDURE ANALYSE() may suggest changing the type to MEDIUMINT; or if a VARCHAR column contains little data, you may be advised to change it to an ENUM. These suggestions may simply be due to insufficient data, so the decisions they imply are not necessarily accurate.
In phpMyAdmin, you can view these suggestions by clicking "Propose table Structure" while viewing the table.
It is important to note that these recommendations only become accurate as the data in your table grows. Be sure to remember that you are the one who makes the final decision.
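Outside phpMyAdmin you can ask for the same advice directly; the optional arguments limit how many distinct values and how much memory ANALYSE may use when deciding whether to propose an ENUM:
-- propose better column types for the user table
SELECT * FROM user PROCEDURE ANALYSE();
-- consider at most 16 distinct values and 256 bytes of memory per column
SELECT * FROM user PROCEDURE ANALYSE(16, 256);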
11. Use NOT NULL where possible
Unless you have a very specific reason to use NULL values, you should always keep your fields NOT NULL. This may seem a little controversial, so please read on.
First, ask yourself how big the difference is between "empty" and "NULL" (for an INT field, that is 0 versus NULL). If you feel there is no difference, you should not use NULL. (Did you know that in Oracle, NULL and the empty string are the same?)
Do not assume that NULL requires no space; it requires extra space, and it makes your program more complicated when you do comparisons. Of course, this is not to say you can never use NULL; reality is complicated, and there will still be cases where you need a NULL value.
Here is an excerpt from MySQL's own documentation:
"NULL columns require additional space in the row to record whether their values is null. For MyISAM tables, each of the NULL column takes one bit extra, rounded up to the nearest byte. "
12. Prepared Statements
Prepared statements are much like stored procedures: a collection of SQL that runs in the background. Using prepared statements brings many benefits, both in performance and in security.
Prepared statements can check the variables you bind, which protects your program from "SQL injection" attacks. Of course, you can also check these variables manually, but manual checks are error-prone and often forgotten by programmers. This problem is less severe when we use a framework or an ORM.
In terms of performance, prepared statements give a considerable advantage when the same query is used many times: you define parameters for them once, and MySQL parses the statement only once.
Also, recent versions of MySQL transmit prepared statements in binary form, which makes the network transfer very efficient.
Of course, there are cases where we should avoid prepared statements because they do not support the query cache, although it is said this is supported after version 5.1.
To use prepared statements in PHP, see the manual for the mysqli extension, or use a database abstraction layer such as PDO.
// create a prepared statement
if ($stmt = $mysqli->prepare("SELECT username FROM user WHERE state = ?")) {
    // bind parameters
    $stmt->bind_param("s", $state);
    // execute
    $stmt->execute();
    // bind result variables
    $stmt->bind_result($username);
    // move the cursor and fetch the value
    $stmt->fetch();
    printf("%s is from %s\n", $username, $state);
    $stmt->close();
}
13. Unbuffered queries
Normally, when you execute an SQL statement in a script, the program stops and waits until the statement returns before continuing. You can change this behavior with unbuffered queries.
The PHP documentation for the mysql_unbuffered_query() function describes this very well:
"mysql_unbuffered_query() sends the SQL query to MySQL without automatically fetching and buffering the result rows as mysql_query() does. This saves a considerable amount of memory with SQL queries that produce large result sets, and you can start working on the result set immediately after the first row has been retrieved as you don't have to wait until the complete SQL query has been performed."
In other words, mysql_unbuffered_query() sends the SQL statement to MySQL without automatically fetching and caching the results the way mysql_query() does. This can save a considerable amount of memory, especially for queries that produce large result sets, and you can start working on the results as soon as the first row is returned, without waiting until the whole query has completed.
However, there are some limitations: you must either read all the rows or call mysql_free_result() to clear the result set before issuing the next query, and mysql_num_rows() and mysql_data_seek() will not work. So think carefully about whether to use an unbuffered query.
14. Save the IP address as UNSIGNED INT
Many programmers create a VARCHAR(15) field to hold the IP address as a string rather than as an integer. Stored as an integer, it takes only 4 bytes and gives you a fixed-length field. It also makes queries easier, especially when you need a WHERE condition such as ip BETWEEN ip1 AND ip2.
We must use UNSIGNED INT, because an IP address uses the full 32 bits of an unsigned integer.
In your queries you can use INET_ATON() to turn a string IP into an integer, and INET_NTOA() to turn the integer back into a string. PHP has similar functions, ip2long() and long2ip().
1 $r = "UPDATE users SET IP = Inet_aton (' {$_server[' remote_addr ']} ') WHERE user_id = $user _id";
15. Fixed-length tables are faster
If every field in a table has a fixed length, the whole table is considered "static" or "fixed-length". For example, the table contains no fields of the following types: VARCHAR, TEXT, BLOB. As soon as it includes even one such field, the table is no longer a fixed-length static table, and the MySQL engine handles it differently.
Fixed-length tables improve performance because MySQL can search them faster: with fixed lengths, it is easy to compute the offset of the next row, so reading is naturally quick. If a field is not fixed-length, then each time MySQL needs the next row it has to locate it via the primary key.
Also, fixed-length tables are easier to cache and to rebuild. The only side effect is that fixed-length fields waste some space, because the space is allocated whether you use it or not.
Using the "vertical splitting" technique (see the next tip), you can split your table into one that is fixed-length and one that is variable-length.
16. Vertical splitting
"Vertical splitting" means turning one table in the database into several tables, reducing the table's complexity and the number of fields, for optimization purposes. (In a bank project I once saw a table with more than 100 fields; it was terrifying.)
Example one: one of the fields in the users table is the home address. It is an optional field, and besides displaying personal information you rarely need to read or rewrite it while working with the database. So why not put it in another table? That gives your table better performance. Think about it: much of the time, only the user ID, user name, password, user role and so on are used frequently for the users table. A smaller table always performs well.
Example two: you have a field called "last_login" that is updated on every login. Every update empties that table's query cache. You can move this field into another table, so that the constant reads of user ID, user name and user role are not affected, and the query cache can gain you a lot of performance.
Also note that you should not regularly join the tables formed from these split-off fields; otherwise performance will be worse than before the split, by an order of magnitude.
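A hedged sketch of example one: the rarely used profile fields are moved out of users into a second table keyed by the same id (the names and types are illustrative):
-- hot table: small and read constantly
CREATE TABLE users (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(60) NOT NULL,
    password CHAR(40) NOT NULL,
    role     TINYINT UNSIGNED NOT NULL DEFAULT 0
);
-- cold table: optional, rarely touched columns such as the home address
CREATE TABLE user_profiles (
    user_id      INT UNSIGNED NOT NULL PRIMARY KEY,
    home_address VARCHAR(255) NOT NULL DEFAULT '',
    last_login   DATETIME NOT NULL DEFAULT '1970-01-01 00:00:00'
);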
17. Splitting a large DELETE or INSERT statement
If you need to perform a large DELETE or INSERT on a live website, you need to be very careful to avoid bringing the whole site to a halt. These two operations lock the table, and while the table is locked no other operations can get in.
Apache runs many child processes or threads, so it works quite efficiently, but our server does not want too many child processes, threads and database connections piling up, because they consume huge amounts of server resources, especially memory.
If you lock the table for some period, say 30 seconds, then for a heavily visited site the access processes/threads, database connections and open files that accumulate in those 30 seconds may not only crash your web service, but may also bring your entire server down.
So, if you have a big job, make sure you split it up; using a LIMIT clause is a good way to do so. Here is an example:
while (1) {
    // delete only 1000 rows at a time
    mysql_query("DELETE FROM logs WHERE log_date <= '2009-11-01' LIMIT 1000");
    if (mysql_affected_rows() == 0) {
        // nothing left to delete, exit
        break;
    }
    // take a short break between batches
    usleep(50000);
}
18. The smaller the column the faster
For most database engines, disk access is the most significant bottleneck. So keeping your data compact is very helpful, because it reduces disk access.
See the "Data Type Storage Requirements" page of the MySQL documentation for all data types.
If a table will only ever hold a small number of rows (for example a dictionary table or a configuration table), there is no reason to use INT as the primary key; MEDIUMINT, SMALLINT or an even smaller TINYINT is more economical. If you do not need to record the time of day, DATE is much better than DATETIME.
Of course, you also need to leave enough room for growth, otherwise you will regret it later. See the Slashdot example (November 06, 2009): a simple ALTER TABLE statement took 3 hours, because the table held 16 million rows.
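A sketch of the idea for a small dictionary table; the types are chosen only to illustrate "small enough, with some room to grow":
CREATE TABLE country_codes (
    -- 255 values are plenty for a dictionary table; SMALLINT would leave more room
    id       TINYINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    code     CHAR(2) NOT NULL,
    name     VARCHAR(60) NOT NULL,
    -- DATE is enough when the time of day is never needed
    added_on DATE NOT NULL
);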
=========================================
The following information is referenced in: http://www.ihref.com/read-16422.html
1. To optimize queries, avoid full-table scans as far as possible, and first consider building indexes on the columns involved in WHERE and ORDER BY.
2. Try to avoid checking a field for NULL in the WHERE clause, otherwise the engine will abandon the index and perform a full table scan.
SQL code: Select ID from t where num is null;
You can set a default value of 0 on num, make sure the num column in the table never contains NULL, and then query like this:
SQL code: Select ID from t where num=0;
3. Try to avoid using the != or <> operators in the WHERE clause, otherwise the engine will abandon the index and perform a full table scan.
4. Try to avoid using OR to join conditions in the WHERE clause, otherwise the engine will abandon the index and perform a full table scan.
SQL code: Select ID from t where num=10 or num=20;
You can rewrite the query like this:
SQL code: select id from t where num=10 union all select id from t where num=20;
5. IN and NOT IN should also be used with caution, otherwise they can lead to full table scans, for example:
SQL code: select id from t where num in (1, 2, 3);
For consecutive values, you can use between instead of in:
SQL code: Select ID from t where num between 1 and 3;
6. The following query will also cause a full table scan:
SQL code: select id from t where name like '%c%';
To be more efficient, consider full-text indexing.
7. Using a parameter in the WHERE clause also causes a full table scan. SQL resolves local variables only at run time, but the optimizer cannot defer the choice of access plan to run time; it must choose at compile time. If the access plan is built at compile time, the value of the variable is still unknown and therefore cannot be used as an input for index selection. The following statement will perform a full table scan:
SQL code: select id from t where num=@num;
You can force the query to use the index instead:
SQL code: select id from t with (index(index_name)) where num=@num;
8. Try to avoid expression operations on the fields in the WHERE clause, which will cause the engine to abandon the index and perform a full table scan.
SQL code: Select ID from t where num/2=100;
You can rewrite the query like this:
SQL code: Select ID from t where num=100*2;
9. Try to avoid function operations on the fields in the WHERE clause, which will cause the engine to abandon the index and perform a full table scan. For example:
SQL code: select id from t where substring(name,1,3) = 'abc'; # ids whose name starts with 'abc'
should read:
SQL code: select id from t where name like 'abc%';
10. Do not perform functions, arithmetic operations or other expression operations on the left side of the "=" in the WHERE clause, or the system may not be able to use the index correctly.
11. When the condition uses an indexed field and the index is a composite index, the first field of the index must appear in the condition to guarantee that the system uses the index; otherwise the index will not be used. The field order in the condition should also match the index order as much as possible, as illustrated below.
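For illustration, with a composite index on (num, name) the leading-column rule works out like this:
SQL code: alter table t add index idx_num_name (num, name);
-- these can use the composite index: the leading column num is constrained
select id from t where num=10 and name='abc';
select id from t where num=10;
-- this cannot use it effectively: the leading column is missing from the condition
select id from t where name='abc';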
12. Do not write meaningless queries, for example when you need to generate an empty table structure:
SQL code: Select Col1,col2 into #t from T where 1=0;
This type of code does not return any result sets, but consumes system resources and should be changed to this:
SQL Code: CREATE TABLE #t (...);
13. In many cases, replacing IN with EXISTS is a good choice:
SQL code: Select num from a where num in (select num from B);
Replace with the following statement:
SQL code: Select num from a where exists (select 1 from b where num=a.num);
14. Not all indexes are effective for queries. SQL optimizes the query based on the data in the table; when an indexed column contains a large amount of duplicated data, the query may not use the index at all. For example, if a table has a sex field where male and female each make up roughly half the rows, an index on sex does nothing for query efficiency.
15. More indexes are not always better. Indexes can improve the efficiency of the corresponding SELECTs, but they also reduce the efficiency of INSERT and UPDATE, since an INSERT or UPDATE may have to rebuild the indexes. How to build indexes therefore needs careful, case-by-case consideration. A table should preferably have no more than 6 indexes; if there are more, consider whether the rarely used ones are really necessary.
16. Avoid updating clustered-index columns as much as possible, because the order of the clustered-index columns is the physical storage order of the table's records; once such a value changes, the records of the whole table may have to be reordered, which consumes considerable resources. If your application needs to update clustered-index columns frequently, reconsider whether that index should be clustered.
17. Use numeric fields as much as possible. A field that contains only numeric information should not be designed as a character type; that reduces query and join performance and increases storage overhead. The engine compares strings character by character when processing queries and joins, whereas a single comparison is enough for a numeric type.
18. Use VARCHAR/NVARCHAR instead of CHAR/NCHAR as much as possible. First, variable-length fields take less storage space; second, for queries, searching within a relatively small field is clearly more efficient.
19. Do not use SELECT * FROM t anywhere; replace the "*" with a specific list of fields, and do not return any fields you do not actually use.
20. Try to use table variables instead of temporary tables. If the table variable contains a large amount of data, note that its indexes are very limited (only the primary key index).
21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.
22. Temporary tables are not unusable; using them appropriately can make certain routines more efficient, for example when you need to repeatedly reference a dataset from a large table or a commonly used table. For one-off operations, however, an export table is usually better.
23. When creating a temporary table, if a large amount of data is inserted at once, use SELECT INTO instead of CREATE TABLE to avoid generating a large amount of log and to improve speed; if the amount of data is small, first CREATE TABLE and then INSERT, to ease the load on the system tables.
24. If temporary tables are used, be sure to explicitly delete all of them at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE. This avoids holding locks on the system tables for a long time.
25. Avoid using cursors as much as possible, because cursors are inefficient; if a cursor operates on more than 10,000 rows of data, you should consider rewriting it.
26. Before using a cursor-based or temporary-table method, look first for a set-based solution to the problem; set-based approaches are generally more efficient.
27. As with temporary tables, cursors are not unusable. Using FAST_FORWARD cursors on small datasets is often preferable to other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that compute "totals" in the result set are usually faster than cursor-based ones. If development time permits, try both the cursor-based and the set-based approach and see which works better.
28. SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger.
29. Try to avoid large transactions, so as to improve the system's concurrency. Indexes let SQL traverse tables more quickly. The index created by default is a non-clustered index, but that is not always optimal; under a non-clustered index the data is physically stored on the data pages in arbitrary order. A reasonable index design should be based on analysis and prediction of the various queries. Generally speaking:
a. For columns with many duplicate values that frequently appear in range queries (>, <, >=, <=), ORDER BY or GROUP BY, consider building a clustered index;
b. For several columns that are frequently accessed together, each containing duplicate values, consider building a composite index;
c. A composite index should, as far as possible, let key queries be covered by the index, and its leading column must be the most frequently used column. Indexes help performance, but more is not always better: too many indexes actually make the system less efficient, because every index added to the table must be maintained with corresponding update work whenever the data changes.
30. Analyze and check tables periodically.
Syntax for analyzing a table: ANALYZE [LOCAL | NO_WRITE_TO_BINLOG] TABLE tbl_name [, tbl_name] ...
This statement analyzes and stores the table's key distribution. The resulting statistics allow the system to generate correct execution plans for SQL. If the actual execution plan is not the one you expect, running ANALYZE TABLE may solve the problem. During analysis the table is locked with a read lock. This works for MyISAM, BDB and InnoDB tables.
For example, to analyze a table: ANALYZE TABLE table_name;
Syntax for checking tables: CHECK TABLE tbl_name [, tbl_name] ... [option] ... option = {QUICK | FAST | MEDIUM | EXTENDED | CHANGED}
The purpose of CHECK TABLE is to check one or more tables for errors. CHECK TABLE works for MyISAM and InnoDB tables; for MyISAM tables the key statistics are also updated.
CHECK TABLE can also check views for errors, for example when a table referenced in the view definition no longer exists.
31. Optimize the table regularly.
Syntax for optimizing a table: OPTIMIZE [LOCAL | NO_WRITE_TO_BINLOG] TABLE tbl_name [, tbl_name] ...
If you have deleted a large part of a table, or made many changes to a table with variable-length rows (a table with VARCHAR, BLOB or TEXT columns), you should run OPTIMIZE TABLE. This command merges the space fragments in the table and eliminates the space wasted by deletions or updates, but OPTIMIZE TABLE only works on MyISAM, BDB and InnoDB tables.
Example: OPTIMIZE TABLE table_name;
Note: the table is locked while ANALYZE, CHECK and OPTIMIZE run, so be sure to perform these operations when the MySQL database is not busy.
Additional tips:
1. In high-volume queries, use as few format conversions as possible.
2. ORDER BY and GROUP BY: when ORDER BY and GROUP BY clauses are used, any suitable index contributes to SELECT performance.
3. Any operation on a column (database functions, calculated expressions and so on) will cause a table scan, so move the operation to the right-hand side of the equals sign whenever possible.
4. IN and OR clauses often resort to worksheets, which defeats the index. If they do not produce large numbers of duplicate values, consider splitting the clause apart; each resulting clause should be able to use an index.
5. Use the smallest data type that meets your needs: for example, MEDIUMINT instead of INT.
6. Try to set all columns to NOT NULL; if you must store NULL, set it explicitly rather than making it the default value.
7. Use VARCHAR, TEXT, BLOB type as little as possible
8. If your data can only take a few known values, it is best to use the ENUM type.
9. As Graymice said, build indexes.
10. Make reasonable use of table partitioning to improve data storage and retrieval speed.