20+ MySQL Best Practices
Database operations are often the main bottleneck in today's web applications. It is not only DBAs (database administrators) who have to worry about performance issues; programmers also need to do their part by structuring tables properly, writing optimized queries, and writing better code. In this article, I list some MySQL optimization techniques for programmers.
Before we start, note that you can find a large number of MySQL scripts and utilities on Envato Market.
1. Optimize your queries for the query cache
Most MySQL servers have the query cache enabled (it was removed in MySQL 8.0, so this tip applies to earlier versions). It is one of the most effective ways to improve performance, and it is handled quietly by the database engine. When the same query is executed multiple times, the result is fetched from the cache, which is fast.
The main problem is that it is so easy and so hidden from the programmer that most of us tend to ignore it. Some of the things we do can actually prevent the query cache from doing its job.
// query cache does NOT work
$r = mysql_query("SELECT username FROM user WHERE signup_date >= CURDATE()");

// query cache works!
$today = date("Y-m-d");
$r = mysql_query("SELECT username FROM user WHERE signup_date >= '$today'");
The query cache does not work for the first query because of the CURDATE() function. This applies to all non-deterministic functions, such as NOW() and RAND(). Because their return values can change, MySQL disables the query cache for any query that uses them. All we need to do is add one extra line of PHP before the query to prevent this.
2. EXPLAIN your SELECT queries
The EXPLAIN keyword helps you understand how MySQL runs your query. This helps you identify bottlenecks and other problems with queries or table structures.
The results of an EXPLAIN query show which indexes are being used, how the table is being scanned and sorted, and so on.
Take a SELECT query (preferably a complex one with joins) and add the keyword EXPLAIN in front of it. You can run it directly against the database, and the result is displayed as a table. For example, suppose I forgot to add an index to a column that I join on.
After adding the missing index, MySQL scans only 9 and 16 rows from the two tables, instead of 7883 rows. A rule of thumb is to multiply all the numbers in the "rows" column; your query performance will be roughly proportional to the result.
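The rule of thumb above is simple arithmetic, and a small sketch may make it concrete. The helper name estimateRowCombinations() and the table names are assumptions made for illustration; only the row estimates 9 and 16 come from the example. The input is assumed to be the EXPLAIN result fetched as an array of associative arrays.

```php
<?php
// Hypothetical helper (not part of MySQL or PHP): multiplies the "rows"
// estimates from every line of an EXPLAIN result.
function estimateRowCombinations(array $explainRows): int
{
    $product = 1;
    foreach ($explainRows as $row) {
        // EXPLAIN can report NULL in "rows"; count such a line as 1
        $product *= max(1, (int) ($row['rows'] ?? 1));
    }
    return $product;
}

// Row estimates from the example above; the table names are made up.
$afterIndex = [
    ['table' => 'users',  'rows' => 9],
    ['table' => 'groups', 'rows' => 16],
];

echo estimateRowCombinations($afterIndex), "\n"; // 144
```

The product is only an estimate of how many row combinations MySQL may examine, so it is best used to compare two versions of the same query rather than as an absolute cost.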
3. Use LIMIT 1 when you need a single row
Sometimes when you query a table, you already know that you are looking for only one row. You may be fetching a unique record, or you may just be checking whether any records satisfy your WHERE clause.
In such cases, adding LIMIT 1 to the query can improve performance. The database engine stops scanning after it finds the first matching record, instead of going through the whole table or index.
// do I have any users from Alabama?

// what NOT to do:
$r = mysql_query("SELECT * FROM user WHERE state = 'Alabama'");
if (mysql_num_rows($r) > 0) {
    // ...
}

// much better:
$r = mysql_query("SELECT 1 FROM user WHERE state = 'Alabama' LIMIT 1");
if (mysql_num_rows($r) > 0) {
    // ...
}
4. Index the search fields
Indexes are not just for primary keys or unique keys. If there are columns in your table that you search by, you should almost always index them.
This rule also applies to partial string searches such as "last_name LIKE 'a%'". When searching from the beginning of a string, MySQL can use the index on that column.
You should also understand which kinds of searches cannot use regular indexes. For example, when you search for a word anywhere in the text (for example, "WHERE post_content LIKE '%apple%'"), you will not see any benefit from a normal index. You are better off using MySQL full-text search or building your own indexing solution.
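The difference between the two LIKE patterns above comes down to whether the pattern starts with literal characters. The check below is a rough sketch only: likePatternHasLiteralPrefix() is a hypothetical helper, and whether MySQL actually uses an index still depends on the optimizer and the table.

```php
<?php
// Hypothetical helper: returns true when a LIKE pattern begins with a
// literal character, which is the precondition for a regular index on
// the column to be usable for the search.
function likePatternHasLiteralPrefix(string $pattern): bool
{
    if ($pattern === '') {
        return false;
    }
    // '%' and '_' are the LIKE wildcards
    return $pattern[0] !== '%' && $pattern[0] !== '_';
}

var_dump(likePatternHasLiteralPrefix('a%'));      // bool(true)
var_dump(likePatternHasLiteralPrefix('%apple%')); // bool(false)
```

A check like this could be used to log or reject search patterns that are likely to fall back to a full scan.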
5. Index and use the same column types for joins
If your application contains many JOIN queries, make sure that the columns you join by are indexed on both tables. This affects how MySQL internally optimizes the join operation.
In addition, the joined columns must be of the same type. For example, if you join a DECIMAL column to an INT column in another table, MySQL will be unable to use at least one of the indexes. For string-type columns, even the character encodings need to be the same.
// looking for companies in my state
$r = mysql_query("SELECT company_name FROM users
    LEFT JOIN companies ON (users.state = companies.state)
    WHERE users.id = $user_id");

// both state columns should be indexed
// and they both should be the same type and character encoding
// or MySQL might do full table scans
6. Do not ORDER BY RAND()
This is one of those tricks that sounds cool at first, and many rookie programmers fall into the trap. You may not realize what a terrible bottleneck you create once you start using it in your queries.
If you really need random rows from your results, there are much better ways to get them. They take some additional code, but they prevent a bottleneck that gets worse and worse as your data grows. The problem is that MySQL has to perform the RAND() operation (which takes processing power) for every single row in the table before sorting it and giving you just one row.
// what NOT to do:
$r = mysql_query("SELECT username FROM user ORDER BY RAND() LIMIT 1");

// much better:
$r = mysql_query("SELECT count(*) FROM user");
$d = mysql_fetch_row($r);
$rand = mt_rand(0, $d[0] - 1);

$r = mysql_query("SELECT username FROM user LIMIT $rand, 1");
In other words, you pick a random number smaller than the number of results and use it as the offset in the LIMIT clause.
7. Avoid SELECT *
The more data is read from the tables, the slower the query becomes. It increases the time needed for disk operations. In addition, when the database server is separate from the web server, you will see longer network delays because the data has to be transferred between the servers.
It is a good habit to always specify the columns you need when you write a SELECT statement.
// not preferred
$r = mysql_query("SELECT * FROM user WHERE user_id = 1");
$d = mysql_fetch_assoc($r);
echo "Welcome {$d['username']}";

// better:
$r = mysql_query("SELECT username FROM user WHERE user_id = 1");
$d = mysql_fetch_assoc($r);
echo "Welcome {$d['username']}";

// the differences are more significant with bigger result sets
8. Almost always have an id field
In every table, have an id column that is the PRIMARY KEY, is AUTO_INCREMENT, and is one of the INT types. Preferably make it UNSIGNED as well, since the value cannot be negative.
Even if you have a users table with a unique username field, do not make that your primary key. VARCHAR fields are slower as primary keys, and your code will be better structured if it refers to all users by their ids internally.
There are also behind-the-scenes operations done by the MySQL engine itself that use the primary key field internally. This becomes even more important as the database setup gets more complicated (clusters, partitioning, and so on).
One possible exception to this rule is the "association table", used for many-to-many associations between two tables. For example, a "posts_tags" table contains two columns, post_id and tag_id, which store the relations between two tables named "post" and "tags". Such a table can have a PRIMARY KEY that contains both id fields.
9. Prefer ENUM over VARCHAR
ENUM columns are very fast and compact. Internally they are stored like TINYINT, yet they can contain and display string values. This makes them a perfect candidate for certain fields.
If a field will contain only a few different values, use ENUM instead of VARCHAR. For example, it could be a column named "status" that only contains values such as "active", "inactive", "pending", and "expired".
There is even a way to get a "suggestion" from MySQL itself on how to restructure your table. When you have a VARCHAR field, it can suggest that you change the column type to ENUM. This is done with PROCEDURE ANALYSE().
10. Get suggestions with PROCEDURE ANALYSE()
PROCEDURE ANALYSE() lets MySQL analyze the column structures and the actual data in your table and come up with suggestions for you. It is only useful if there is actual data in your tables, because the data plays a big role in the decision making.
For example, if you created an INT field for your primary key but do not have many rows, MySQL might suggest MEDIUMINT instead. Or if you use a VARCHAR field that holds only a few distinct values, you might get a suggestion to convert it to ENUM.
You can also run this by clicking the "Propose table structure" link in one of the table views in phpMyAdmin.
Keep in mind that these are only suggestions. If your table is going to grow bigger, they may not even be the right suggestions to follow. The decision is ultimately yours.
11. Use NOT NULL if you can
Unless you have a very specific reason to use a NULL value, you should always set your columns to NOT NULL.
First of all, ask yourself whether there is any difference between an empty string value and a NULL value (for INT fields: 0 versus NULL). If there is no reason to have both, you do not need a nullable field. (Did you know that Oracle treats NULL and the empty string as the same?)
NULL columns require additional space, and they add complexity to your comparison statements. Avoid them when you can. That said, I understand that some people have very specific reasons to use NULL values, which is not always a bad thing.
From the MySQL docs:
"NULL columns require additional space in the row to record whether their values are NULL. For MyISAM tables, each NULL column takes one bit extra, rounded up to the nearest byte."
12. Prepared statements
There are multiple benefits to using prepared statements, both for performance and for security.
Prepared statements filter the variables you bind to them by default, which is great for protecting your application against SQL injection attacks. You can of course filter your variables manually too, but that approach is more prone to human error and forgetfulness. This is less of an issue when you use some kind of framework or ORM.
Since our focus is on performance, I should also mention the benefits in that area. They are most significant when the same query is used multiple times in your application. You can pass different parameter values to the same prepared statement, yet MySQL only has to parse it once.
Also, recent versions of MySQL transmit prepared statements in a native binary form, which is more efficient and can help reduce network delays.
There was a time when many programmers deliberately avoided prepared statements for one important reason: they were not cached by the MySQL query cache. Since around version 5.1, however, query caching is supported for them too.
To use prepared statements in PHP, look at the mysqli extension or use a database abstraction layer such as PDO.
// create a prepared statement
if ($stmt = $mysqli->prepare("SELECT username FROM user WHERE state=?")) {

    // bind parameters
    $stmt->bind_param("s", $state);

    // execute
    $stmt->execute();

    // bind result variables
    $stmt->bind_result($username);

    // fetch value
    $stmt->fetch();

    printf("%s is from %s\n", $username, $state);

    $stmt->close();
}
13. Unbuffered queries
Normally, when you perform a query from a script, the script waits for the query to finish executing before it can continue. You can change that by using unbuffered queries.
The PHP docs have a good explanation for the mysql_unbuffered_query() function:
"mysql_unbuffered_query() sends the SQL query to MySQL without automatically fetching and buffering the result rows as mysql_query() does. This saves a considerable amount of memory with SQL queries that produce large result sets, and you can start working on the result set immediately after the first row has been retrieved as you don't have to wait until the complete SQL query has been performed."
However, it comes with certain limitations. You have to either read all the rows or call mysql_free_result() before you can perform another query. Also, you cannot use mysql_num_rows() or mysql_data_seek() on the result set.
14. Store IP addresses as UNSIGNED INT
Many programmers create a VARCHAR(15) field without realizing that they can store IP addresses as integer values. With an INT you go down to only four bytes of space and get a fixed-size field.
Make sure the column is an UNSIGNED INT, because IP addresses use the whole range of a 32-bit unsigned integer.
In your queries you can use INET_ATON() to convert an IP address to an integer and INET_NTOA() to do the reverse. PHP has similar functions, ip2long() and long2ip().
$r = "UPDATE users SET ip = INET_ATON('{$_SERVER['REMOTE_ADDR']}') WHERE user_id = $user_id";
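The same conversion can be done on the PHP side with ip2long() and long2ip(), as a minimal sketch. The wrapper name ipv4ToUnsigned() is an assumption made for illustration; it returns the number as a string so that it stays unsigned even on 32-bit PHP builds, where ip2long() can return a negative integer.

```php
<?php
// Hypothetical wrapper around ip2long() that always yields the unsigned
// decimal form expected by an UNSIGNED INT column.
function ipv4ToUnsigned(string $ip): string
{
    $long = ip2long($ip);
    if ($long === false) {
        throw new InvalidArgumentException("Not a valid IPv4 address: $ip");
    }
    // %u formats the integer as unsigned
    return sprintf('%u', $long);
}

echo ipv4ToUnsigned('192.168.1.10'), "\n";      // 3232235786
echo long2ip(ip2long('192.168.1.10')), "\n";    // 192.168.1.10
```

Converting in PHP keeps the address out of the SQL string entirely, at the cost of one extra function call per request.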
15. Fixed-length (static) tables are faster
(Translator's note: "length" here refers to the row length, that is, the space occupied by each row in the table, not the amount of data in the table.)
If every column in a table is "fixed-length", the table is considered "static" or "fixed-length". Column types that are not fixed-length include VARCHAR, TEXT, BLOB, and so on. If a table contains even one column of these types, it is no longer fixed-length, and the MySQL engine handles it differently.
Fixed-length tables can improve performance because the MySQL engine can seek through the records faster. When it wants to read a specific row in a table, it can calculate the position of that row directly. If the row size is not fixed, it has to consult the primary key index every time it needs to seek.
They are also easier to cache and easier to rebuild after a crash. However, they can take more space. For example, if you change a VARCHAR(20) field to CHAR(20), it will always take the full 20 characters of space (20 bytes in a single-byte character set), no matter what is stored in it.
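The position calculation mentioned above can be illustrated with plain arithmetic. This is a conceptual sketch, not MySQL's actual file layout: fixedRowOffset() and the 24-byte row length are assumptions chosen for the example.

```php
<?php
// Hypothetical illustration: with fixed-length rows, the byte offset of
// row N follows directly from the row length, with no index lookup.
function fixedRowOffset(int $rowNumber, int $rowLength, int $headerLength = 0): int
{
    return $headerLength + $rowNumber * $rowLength;
}

// Row 1000 (counting from 0) of a table whose rows are 24 bytes each
echo fixedRowOffset(1000, 24), "\n"; // 24000
```

With variable-length rows no such formula exists, which is why the engine needs an extra lookup to find where a row starts.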
By using "vertical partitioning" techniques, you can move the variable-length columns into a separate table. That brings us to the next tip.
16. Vertical partitioning
Vertical partitioning is the act of splitting a table vertically, by columns, to optimize its structure.
Example 1: You might have a users table that contains home addresses, which are not read often. In this case, you can split the table and store the address information in a separate table. This way, your main users table becomes smaller. As you know, smaller tables perform faster.
Example 2: You have a "last_login" field in your table. It is updated every time a user logs in to the website, and every update on a table flushes the query cache for that table. In this case, you can put that field into another table to keep updates to the users table to a minimum.
However, you also need to make sure that you do not constantly have to join the two tables after the split. Otherwise, you may actually suffer a performance decline.
17. Split large DELETE or INSERT queries
If you need to perform a large DELETE or INSERT query on a live website, be careful not to disturb the web traffic. While a large query like that runs, it can lock your tables and bring your web application to a halt.
Apache runs many parallel processes/threads, so it works most efficiently when scripts finish executing as quickly as possible. That way the server does not accumulate too many open connections and processes, which consume resources, especially memory.
If you lock a table for an extended period (such as 30 seconds or longer) on a high-traffic website, processes and queries will pile up. The pile-up may take a long time to clear, or it may even crash your web server.
If you have a maintenance script that needs to delete a large number of rows, just use the LIMIT clause to do it in smaller batches and avoid this congestion.
while (1) {
    mysql_query("DELETE FROM logs WHERE log_date <= '2009-10-01' LIMIT 10000");
    if (mysql_affected_rows() == 0) {
        // done deleting
        break;
    }
    // you can even pause a bit
    usleep(50000);
}
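The same idea can be applied to large INSERTs by splitting the rows into batches. The sketch below only builds the statements and their parameters; buildInsertBatches() and the "message" column are assumptions made for illustration, and executing each batch (for example with a prepared statement) is left to the caller.

```php
<?php
// Hypothetical helper: splits rows into batches and builds one multi-row
// INSERT with placeholders per batch. Each row is [log_date, message].
function buildInsertBatches(array $rows, int $batchSize): array
{
    $batches = [];
    foreach (array_chunk($rows, $batchSize) as $chunk) {
        // one "(?, ?)" group per row in this batch
        $placeholders = implode(', ', array_fill(0, count($chunk), '(?, ?)'));
        $params = [];
        foreach ($chunk as $row) {
            $params[] = $row[0];
            $params[] = $row[1];
        }
        $batches[] = [
            'sql'    => "INSERT INTO logs (log_date, message) VALUES $placeholders",
            'params' => $params,
        ];
    }
    return $batches;
}

$rows = [
    ['2009-10-01', 'a'], ['2009-10-01', 'b'], ['2009-10-02', 'c'],
    ['2009-10-02', 'd'], ['2009-10-03', 'e'],
];

foreach (buildInsertBatches($rows, 2) as $batch) {
    echo $batch['sql'], "\n";
}
```

As with the DELETE loop, a short pause between batches gives other queries a chance to run.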
18. Smaller columns are faster
For database engines, the disk is perhaps the most significant bottleneck. Keeping things smaller and more compact usually helps performance because it reduces the amount of disk transfer.
The MySQL docs have a list of storage requirements for all data types.
If a table is expected to have only a small number of rows, there is no need to make the primary key an INT; MEDIUMINT, SMALLINT, or in some cases even TINYINT will do. Likewise, if you do not need the time component, use DATE instead of DATETIME.
Just make sure you leave reasonable room to grow, or you might end up like Slashdot. (Translator's note: Slashdot changed the primary key of its comment table to the INT type because of data growth, but the corresponding data type in the parent table was not changed. An ALTER statement could fix the problem, but it required taking some services offline for at least three hours.)
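Picking an integer type can be reduced to comparing the largest value you expect, including room for growth, against the maximum of each unsigned type. The helper smallestUnsignedIntType() below is a hypothetical sketch of that comparison; the limits are the standard maximums for MySQL's unsigned integer types.

```php
<?php
// Hypothetical helper: returns the smallest unsigned MySQL integer type
// that can hold $maxExpectedValue. Pass in a value that already includes
// the growth you want to allow for.
function smallestUnsignedIntType(int $maxExpectedValue): string
{
    if ($maxExpectedValue < 0) {
        throw new InvalidArgumentException('Unsigned types cannot hold negative values');
    }
    $limits = [
        'TINYINT UNSIGNED'   => 255,        // 1 byte
        'SMALLINT UNSIGNED'  => 65535,      // 2 bytes
        'MEDIUMINT UNSIGNED' => 16777215,   // 3 bytes
        'INT UNSIGNED'       => 4294967295, // 4 bytes
    ];
    foreach ($limits as $type => $max) {
        if ($maxExpectedValue <= $max) {
            return $type;
        }
    }
    return 'BIGINT UNSIGNED'; // 8 bytes
}

echo smallestUnsignedIntType(200), "\n";     // TINYINT UNSIGNED
echo smallestUnsignedIntType(50000), "\n";   // SMALLINT UNSIGNED
echo smallestUnsignedIntType(5000000), "\n"; // MEDIUMINT UNSIGNED
```

Being generous with the expected maximum is cheap compared with altering a large table later.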
19. Choose the right storage engine
The two main storage engines in MySQL are MyISAM and InnoDB. Each has its own advantages and disadvantages.
MyISAM is good for read-heavy applications, but it does not scale well when there are a lot of writes. Even if you update only one field of one row, the whole table is locked, and no other process can even read from it until the query is finished. MyISAM is very fast at calculating SELECT COUNT(*) types of queries.
InnoDB is a more complicated storage engine and can be slower than MyISAM for most small applications. However, it supports row-level locking, which scales better. It also supports some more advanced features, such as transactions.
- MyISAM storage engine
- InnoDB storage engine
20. Use an Object Relational Mapper
By using an ORM (Object Relational Mapper), you can gain certain performance benefits. Everything an ORM does can also be coded manually, but that can mean too much extra work and require a high level of expertise.
ORMs are known for "lazy loading", which means they fetch the actual values only when they are needed. But you need to be careful with it. Otherwise, you may end up creating many mini-queries that reduce database performance.
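The mini-query problem can be shown without a real database or ORM. Everything in the sketch below is hypothetical: FakeDb is an in-memory stand-in that only counts how many queries would have been sent, comparing one lookup per row with a single batched lookup.

```php
<?php
// Hypothetical in-memory stand-in for a database that counts queries.
final class FakeDb
{
    public int $queryCount = 0;
    private array $authors = [1 => 'ann', 2 => 'bob', 3 => 'cy'];

    // One query per id: what careless lazy loading ends up doing
    public function findAuthor(int $id): string
    {
        $this->queryCount++;
        return $this->authors[$id];
    }

    // One query for many ids, like WHERE id IN (...)
    public function findAuthors(array $ids): array
    {
        $this->queryCount++;
        return array_intersect_key($this->authors, array_flip($ids));
    }
}

$postAuthorIds = [1, 2, 3, 1, 2]; // author id of each post on a page

$lazy = new FakeDb();
foreach ($postAuthorIds as $id) {
    $lazy->findAuthor($id); // a separate query for every post
}

$eager = new FakeDb();
$eager->findAuthors(array_unique($postAuthorIds)); // a single query

echo $lazy->queryCount, " vs ", $eager->queryCount, "\n"; // 5 vs 1
```

Most ORMs offer some way to request related records up front; checking the query log for repeated near-identical queries is a quick way to spot this pattern.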
ORMs can also batch your queries into transactions, which is much faster than sending individual queries to the database.
Currently my favorite ORM for PHP is Doctrine. I wrote an article on how to install Doctrine with CodeIgniter.
21. Be careful with persistent connections
Persistent connections are meant to reduce the overhead of reconnecting to MySQL. When a persistent connection is created, it stays open even after the script finishes running. Because Apache reuses its child processes, the next time a process runs a new script, it reuses the same MySQL connection.
It sounds great in theory. However, from my personal experience (and that of many others), this feature tends to cause more trouble than it is worth. You can run into connection limits, memory problems, and so on.
Apache runs highly parallel and creates many child processes. This is the main reason why persistent connections do not work well in this environment. Consult your system administrator before you consider using mysql_pconnect().
Source: Burak Guzel, via Open Source China