MySQL Optimization: 21 Lessons from Experience


Database operations are increasingly the performance bottleneck of an entire application, and this is especially noticeable in web applications. Database performance is not only the DBA's concern; programmers need to pay attention to it too. When designing table structures and when writing SQL statements, particularly queries, we need to keep the performance of those operations in mind. This article does not cover SQL optimization in general; it focuses on MySQL, the database most widely used for web applications. Hopefully the following tips are useful to you.

1. Optimize your query for query caching

Most MySQL servers have the query cache turned on. It is one of the most effective ways to improve performance, and it is handled entirely by the MySQL database engine. When the same query is executed many times, its result is served from the cache, so later identical queries do not have to touch the tables at all. (The query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this tip applies to older servers.)

The main problem is that this is very easy for programmers to overlook, because some queries are written in a way that prevents MySQL from using the cache. Take a look at the following example:

    // The query cache does NOT work for this query
    $r = mysql_query("SELECT username FROM user WHERE signup_date >= CURDATE()");

    // The query cache works for this one
    $today = date("Y-m-d");
    $r = mysql_query("SELECT username FROM user WHERE signup_date >= '$today'");


The only difference between the two SQL statements is CURDATE(); the query cache does not work for queries that contain it. The same is true of NOW(), RAND(), and other non-deterministic SQL functions: because their return values change, MySQL does not cache queries that use them. All you need to do is replace the MySQL function with a variable computed in your application, and the query becomes cacheable.

2. EXPLAIN your SELECT query

Putting the EXPLAIN keyword in front of a query shows how MySQL will execute it. This helps you find performance bottlenecks in the query or in the table structure.

The EXPLAIN output also tells you which indexes and primary keys are being used, how the tables are scanned and sorted, and so on.

Pick one of your SELECT statements (preferably a complex one with multi-table joins) and add the keyword EXPLAIN in front of it. You can use phpMyAdmin for this. The result is shown as a table. In the following example, we forgot to add an index on group_id, and the query joins two tables:

After we add an index on the group_id field:

As we can see, the first result scans 7883 rows, while the second scans only 9 and 16 rows in the two tables. Looking at the rows column helps us spot potential performance problems.
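
The missing-index effect can be reproduced on a small scale. The following is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for MySQL, so the statement is SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, and the output format differs. The users table, the idx_users_group_id index, and the plan_for_group_lookup helper are hypothetical names chosen for illustration.

```python
import sqlite3


def plan_for_group_lookup(with_index: bool) -> str:
    """Return SQLite's query plan text for a lookup on group_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, group_id INTEGER, name TEXT)"
    )
    if with_index:
        conn.execute("CREATE INDEX idx_users_group_id ON users (group_id)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM users WHERE group_id = ?", (3,)
    ).fetchall()
    conn.close()
    # The last column of each plan row is a human-readable description.
    return " | ".join(row[-1] for row in rows)


print(plan_for_group_lookup(with_index=False))  # reports a scan of the table
print(plan_for_group_lookup(with_index=True))   # reports a search using the index
```

Without the index the plan reports a scan of the whole table; with it, the plan reports a search that uses idx_users_group_id. MySQL's EXPLAIN conveys the same kind of information through its key and rows columns.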

3. Use LIMIT 1 when you need only one row

Sometimes when you query a table you already know that you need just one row: you may be fetching a single unique record, or only checking whether any record matching the WHERE clause exists.

In such cases, adding LIMIT 1 can improve performance. The MySQL database engine stops searching after it finds the first matching row instead of scanning on for further matches.

The following example only checks whether there are any users from "China". The second query is clearly more efficient than the first. (Note that the first uses SELECT * and the second SELECT 1.)

    // Inefficient:
    $r = mysql_query("SELECT * FROM user WHERE country = 'China'");
    if (mysql_num_rows($r) > 0) {
        // ...
    }

    // Efficient:
    $r = mysql_query("SELECT 1 FROM user WHERE country = 'China' LIMIT 1");
    if (mysql_num_rows($r) > 0) {
        // ...
    }

4. Index the fields you search on

Indexes are not only for primary keys or unique fields. If there is a field in your table that you frequently search on, index it.

For example, for the search condition "last_name LIKE 'a%'", the query runs about 4 times slower without an index on last_name than with one.

You should also know which kinds of searches cannot use a normal index. For example, when you search for a word inside a large article with a condition such as "WHERE post_content LIKE '%apple%'", a normal index is of no use. You may need a MySQL full-text index, or build your own index (for example, on keywords or tags).

5. Use columns of the same type for joins, and index them

If your application has many JOIN queries, make sure the columns you join on are indexed in both tables. This lets MySQL optimize the join internally.

The columns used in a join should also be of the same type. For example, if you join a DECIMAL column with an INT column, MySQL may be unable to use the index. String columns must also use the same character set. (The two tables may not have the same character set.)

    // Look up the companies in the user's state
    $r = mysql_query("SELECT company_name FROM users
        LEFT JOIN companies ON (users.state = companies.state)
        WHERE users.id = $user_id");

    // Both state columns should be indexed, have the same type,
    // and use the same character set.

6. Never ORDER BY RAND()

Want to shuffle the returned rows, or pick a random record? I don't know who invented this usage, but many beginners like it without realizing how terrible its performance is.

If you really need to shuffle the rows you return, there are many better ways to do it. ORDER BY RAND() degrades performance badly as the table grows. The problem is that MySQL has to call RAND() (which costs CPU time) for every row in the table, and then sort all of the rows by the result. Even LIMIT 1 does not help, because the sort has to happen first.

The following example picks one random record:

    // Never do this:
    $r = mysql_query("SELECT username FROM user ORDER BY RAND() LIMIT 1");

    // This is much better:
    $r = mysql_query("SELECT COUNT(*) FROM user");
    $d = mysql_fetch_row($r);
    $rand = mt_rand(0, $d[0] - 1);
    $r = mysql_query("SELECT username FROM user LIMIT $rand, 1");



7. Avoid SELECT *

The more data you read from the database, the slower the query becomes. And if your database server and web server are separate machines, it also increases the network traffic between them.

So make it a habit to select only the columns you actually need.

    // Not recommended
    $r = mysql_query("SELECT * FROM user WHERE user_id = 1");
    $d = mysql_fetch_assoc($r);
    echo "Welcome {$d['username']}";

    // Recommended
    $r = mysql_query("SELECT username FROM user WHERE user_id = 1");
    $d = mysql_fetch_assoc($r);
    echo "Welcome {$d['username']}";

8. Always set an ID for each table

Every table in the database should have an id column as its primary key, preferably an INT (UNSIGNED is recommended) with the AUTO_INCREMENT flag set.

Even if your users table has a unique field such as "email", do not make it the primary key: using a VARCHAR column as the primary key degrades performance. In addition, your program should use the table's id to build its data structures.

Furthermore, some operations in the MySQL engine rely on the primary key, and in those cases the performance and settings of the primary key become very important; clustering and partitioning are examples.

There is one exception: the "association table", whose primary key is made up of the primary keys of several other tables, each of which is a foreign key. For example, a "student" table has a student ID and a "course" table has a course ID; the "score" table is then an association table linking the two. In the score table, the student ID and the course ID are foreign keys, and together they form the primary key.
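
The association-table exception can be sketched as follows. This is a minimal illustration in Python with the standard-library sqlite3 module standing in for MySQL; the table and column names and the build_school_db and record_score helpers are assumptions made for the example, not part of the original article.

```python
import sqlite3


def build_school_db() -> sqlite3.Connection:
    """Create student, course, and the score association table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
        CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- The association table has no id of its own: its primary key is
        -- the pair of foreign keys.
        CREATE TABLE score (
            student_id INTEGER NOT NULL REFERENCES student (student_id),
            course_id  INTEGER NOT NULL REFERENCES course (course_id),
            score      INTEGER NOT NULL,
            PRIMARY KEY (student_id, course_id)
        );
    """)
    return conn


def record_score(conn: sqlite3.Connection, student_id: int, course_id: int, score: int) -> bool:
    """Insert a score; return False if that (student, course) pair already exists."""
    try:
        conn.execute(
            "INSERT INTO score (student_id, course_id, score) VALUES (?, ?, ?)",
            (student_id, course_id, score),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The composite key lets the same student appear in many courses and the same course hold many students, while a second score for the same pair is rejected.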

9. Use ENUM instead of VARCHAR

The ENUM type is very fast and compact. Internally the values are stored as small integers, much like TINYINT, but they are displayed as strings. This makes ENUM columns a very good fit for fields that hold one of a list of options.

If you have a field such as "gender", "country", "ethnicity", "status" or "department", whose values you know to be limited and fixed, use ENUM instead of VARCHAR.

MySQL can also give you "suggestions" on how to restructure your tables (see tip 10). When you have a VARCHAR field that would work better as an ENUM, it will suggest changing the type. You can get this advice with PROCEDURE ANALYSE().

10. Get suggestions from PROCEDURE ANALYSE()

PROCEDURE ANALYSE() lets MySQL analyze your columns and the data actually stored in them, and gives you some useful suggestions. The suggestions are only useful when the table contains real data, because the analysis is based on that data. (PROCEDURE ANALYSE() was removed in MySQL 8.0, so this tip applies to older servers.)

For example, if you created an INT primary key but the table does not hold many rows, PROCEDURE ANALYSE() may suggest changing the column type to MEDIUMINT. Or, for a VARCHAR field with only a few distinct values, you might get a suggestion to change it to ENUM. When there is too little data, such suggestions may simply be inaccurate.

In phpMyAdmin, you can see these suggestions by clicking "Propose table structure" while viewing a table.

Keep in mind that the suggestions only become accurate as the table accumulates more data, and that you are the one who makes the final decision.

11. Use NOT NULL where possible

Unless you have a very specific reason to use NULL values, you should always declare your columns NOT NULL. This may seem a bit controversial, so read on.

First, ask yourself whether there is any difference between an empty value and NULL (for an INT column, between 0 and NULL). If you see no reason to distinguish the two, you do not need NULL. (Did you know that in Oracle, NULL and the empty string are the same?)

Do not assume that NULL takes no space: NULL columns need extra space, and they make comparisons in your program more complex. This is not to say that you can never use NULL; reality is complicated, and there will still be cases where you need it.

Here is an excerpt from MySQL's own documentation:

"NULL columns require additional space in the row to record whether their values are NULL. For MyISAM tables, each NULL column takes one bit extra, rounded up to the nearest byte."

12. Prepared statements

A prepared statement is much like a stored procedure: a set of SQL statements that runs in the background. Using prepared statements brings many benefits, for both performance and security.

Prepared statements check the variables you bind to them, which protects your program from "SQL injection" attacks. You can of course check these variables manually, but manual checks are error-prone and programmers often forget them. Frameworks and ORMs make this problem less severe.

In terms of performance, prepared statements give you a considerable advantage when the same query is run many times. You define parameters for the prepared statement, and MySQL parses it only once.

Recent versions of MySQL also transmit prepared statements in a binary format, which makes network transfer very efficient.

There are, however, cases where you may want to avoid prepared statements, because the query cache did not support them. Support is said to have been added in version 5.1.

Addendum: why use prepared statements?

Using prepared statements in an application has many advantages, for both security and performance.
1: Security
Prepared statements improve security by separating the SQL logic from the data, which prevents the common types of SQL injection attack. Normally, when a query includes data received from the client, you have to take great care with troublesome characters such as single quotes, double quotes, and backslashes.
With prepared statements this extra care is not needed: because the data is sent separately, MySQL handles these characters automatically, and they do not have to be escaped with any special function.
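
The separation of SQL logic from data can be illustrated with a small sketch. It uses Python's standard-library sqlite3 module as a stand-in for MySQL, where the ? placeholder plays the same role as in a MySQL prepared statement; the user table contents and the two helper functions are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (username TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO user (username, state) VALUES (?, ?)",
    [("alice", "NY"), ("bob", "CA")],
)


def users_in_state_unsafe(state: str) -> list:
    # Vulnerable: the input is spliced into the SQL text itself.
    sql = f"SELECT username FROM user WHERE state = '{state}'"
    return conn.execute(sql).fetchall()


def users_in_state(state: str) -> list:
    # Safe: the SQL text is fixed and the value is bound separately.
    return conn.execute(
        "SELECT username FROM user WHERE state = ?", (state,)
    ).fetchall()


malicious = "x' OR '1'='1"
print(users_in_state_unsafe(malicious))  # the injected OR clause matches every row
print(users_in_state(malicious))         # treated as a plain string: no rows match
```

In the unsafe version the quote characters in the input change the structure of the query; in the bound version they are just data, so no escaping function is needed.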
2: Performance
First: a prepared statement is parsed only once. When you first prepare the statement, MySQL checks the syntax and gets it ready to run, so executing the query many times carries no additional parsing overhead. For a query that runs many times (an INSERT, for example), this preprocessing improves performance considerably.
Second: as mentioned above, prepared statements use the binary protocol, which improves efficiency.
Third: inside a stored procedure, some statements cannot use dynamic variables in their syntax (for example the LIMIT clause of SELECT, or ALTER statements); only prepared statements solve this problem.
For example:
    SET @stmt = CONCAT('ALTER TABLE weekstock ADD week', @weekname, ' INT(4)');
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    DEALLOCATE PREPARE s1;
3: Notes
A prepared statement is created at the session level and is deallocated automatically when the session is closed. Within the session it is global, so a prepared statement created inside a stored procedure is not deallocated automatically when the procedure ends. To guard against too many prepared statements being created, MySQL provides the max_prepared_stmt_count variable; setting it to 0 disables the use of prepared statements.
The following statements can be used as prepared statements: ALTER TABLE, CALL, COMMIT, CREATE INDEX, CREATE TABLE, DELETE, DO, DROP INDEX, DROP TABLE, INSERT, RENAME TABLE, REPLACE, SELECT, SET, UPDATE, and most SHOW statements. Further statements were added in later releases.

To use prepared statements in PHP, see the manual for the mysqli extension, or use a database abstraction layer such as PDO.

    if ($stmt = $mysqli->prepare("SELECT username FROM user WHERE state=?")) {
        // Bind parameters
        $stmt->bind_param("s", $state);
        // Execute
        $stmt->execute();
        // Bind result variables
        $stmt->bind_result($username);
        // Fetch the first row
        $stmt->fetch();
        printf("%s is from %s\n", $username, $state);
        $stmt->close();
    }



13. Unbuffered queries

Normally, when your script executes an SQL statement, the program stops and waits until the statement has returned before it continues. You can change this behavior with unbuffered queries.

The PHP documentation for the mysql_unbuffered_query() function explains this well:

"mysql_unbuffered_query() sends the SQL query query to MySQL without automatically fetching and buffering the result rows as mysql_query() does. This saves a considerable amount of memory with SQL queries that produce large result sets, and you can start working on the result set immediately after the first row has been retrieved as you don't have to wait until the complete SQL query has been performed."

In other words, mysql_unbuffered_query() sends an SQL statement to MySQL without automatically fetching and buffering the results the way mysql_query() does. This saves a considerable amount of memory, especially for queries that return large result sets, and you can start working on the results as soon as the first row arrives instead of waiting for all of them.

There are some limitations, however. You must either read all the rows or call mysql_free_result() to discard the results before running the next query. Also, mysql_num_rows() and mysql_data_seek() do not work on unbuffered results. So think carefully about whether to use unbuffered queries.
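
The trade-off between buffered and streamed results can be sketched outside PHP as well. The example below is an analogy only: it uses Python's standard-library sqlite3 module, where fetchall() buffers the whole result set and iterating over the cursor reads one row at a time. The logs table and the helper functions are hypothetical names for illustration.

```python
import sqlite3


def make_log_db(n_rows: int) -> sqlite3.Connection:
    """Build an in-memory logs table with n_rows generated messages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")
    conn.executemany(
        "INSERT INTO logs (message) VALUES (?)",
        ((f"event {i}",) for i in range(n_rows)),
    )
    return conn


def first_match_buffered(conn: sqlite3.Connection, needle: str):
    # Buffered: the complete result set is loaded into memory first.
    rows = conn.execute("SELECT id, message FROM logs ORDER BY id").fetchall()
    for row_id, message in rows:
        if needle in message:
            return row_id
    return None


def first_match_streamed(conn: sqlite3.Connection, needle: str):
    # Streamed: rows are read one at a time and we can stop early.
    cursor = conn.execute("SELECT id, message FROM logs ORDER BY id")
    try:
        for row_id, message in cursor:
            if needle in message:
                return row_id
        return None
    finally:
        # Discard the unread rows, in the spirit of mysql_free_result().
        cursor.close()
```

Both functions return the same answer; the streamed version never holds more than one row in memory, but it also cannot know the total row count in advance, which mirrors the mysql_num_rows() limitation.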

14. Store IP addresses as UNSIGNED INT

Many programmers create a VARCHAR(15) field to store an IP address as a string rather than as an integer. Stored as an integer, it needs only 4 bytes and can be a fixed-length field. It also gives you an advantage in queries, especially with WHERE conditions such as: ip BETWEEN ip1 AND ip2.

You must use UNSIGNED INT, because an IP address uses the full range of a 32-bit unsigned integer.

In your queries, you can use INET_ATON() to convert a string IP into an integer, and INET_NTOA() to convert an integer back into a string IP. PHP has similar functions: ip2long() and long2ip().

    $r = "UPDATE users SET ip = INET_ATON('{$_SERVER['REMOTE_ADDR']}') WHERE user_id = $user_id";
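
The conversion itself is plain arithmetic on the four octets. As a minimal sketch, the Python standard-library ipaddress module performs the same mapping as INET_ATON() and INET_NTOA() for IPv4 addresses; the helper names ip_to_int, int_to_ip, and in_range are made up for this example.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Dotted-quad IPv4 string to its 32-bit unsigned integer value."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """32-bit unsigned integer back to a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(value))


def in_range(ip: str, first: str, last: str) -> bool:
    """Numeric range check, the equivalent of: ip BETWEEN first AND last."""
    return ip_to_int(first) <= ip_to_int(ip) <= ip_to_int(last)


print(ip_to_int("192.168.1.10"))      # 3232235786
print(int_to_ip(3232235786))          # 192.168.1.10
print(ip_to_int("255.255.255.255"))   # 4294967295, too large for a signed INT
print(in_range("10.0.0.9", "10.0.0.1", "10.0.0.10"))  # True
```

The last line is the case where string storage goes wrong: compared as text, "10.0.0.9" sorts after "10.0.0.10", so a BETWEEN on a VARCHAR column would miss it.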

15. Fixed-length tables are faster

If every column in a table is fixed-length, the whole table is considered "static" or "fixed-length". That means the table has no columns of these types: VARCHAR, TEXT, BLOB. As soon as you include one such column, the table is no longer fixed-length, and the MySQL engine handles it differently.

Fixed-length tables can improve performance because MySQL can search through them faster: with rows of a fixed length, the offset of the next row is easy to calculate, so reads are naturally fast. If the row size is not fixed, each lookup of the next row has to go through the primary key index.
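
The offset calculation is the whole trick: with fixed-length rows, row number i starts at i times the row length. The sketch below illustrates the principle with Python's standard-library struct module. It is not MySQL's actual on-disk row format; the 20-byte record layout and the helper names are assumptions chosen for the example.

```python
import struct

# Hypothetical fixed-length row: 4-byte unsigned id + 16-byte padded name.
RECORD = struct.Struct("<I16s")


def pack_rows(rows) -> bytes:
    """Serialize (id, name) pairs into one contiguous block of fixed-size rows."""
    return b"".join(RECORD.pack(row_id, name.encode("ascii")) for row_id, name in rows)


def read_row(data: bytes, index: int):
    """Jump straight to row `index`; no scanning of earlier rows is needed."""
    offset = index * RECORD.size
    row_id, raw_name = RECORD.unpack_from(data, offset)
    # Strip the zero padding that keeps every name 16 bytes long.
    return row_id, raw_name.rstrip(b"\x00").decode("ascii")


table = pack_rows([(1, "ann"), (2, "bob"), (3, "cy")])
print(RECORD.size)         # 20 bytes per row
print(read_row(table, 2))  # (3, 'cy')
```

The padding in the name field is also the cost mentioned in this tip: short values still occupy the full 16 bytes.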

Fixed-length tables are also easier to cache and to rebuild. The only side effect is that fixed-length columns waste some space, because they occupy their full allocated size whether you use it or not.

Using the "vertical partitioning" technique (see the next tip), you can split your table into two: one with fixed-length columns and one with variable-length columns.

16. Vertical partitioning

"Vertical partitioning" means splitting a table by columns into several tables, which reduces the complexity of the table and the number of columns in it for the sake of optimization. (In a bank project I once saw a table with more than 100 columns, which was quite scary.)

Example one: one of the columns in the users table is the home address. It is an optional field, and apart from the personal-information page you rarely need to read or update it. So why not put it in another table? Your users table will perform better for it. Think about it: most of the time, only the user ID, user name, password, user role and the like are used frequently. A smaller table always performs better.

Example two: you have a column called "last_login" that is updated every time a user logs in. But every update flushes the query cache for that table. So you can move this column into another table; the constant reads of the user ID, user name, and user role are then unaffected, and the query cache can give you a big performance boost.

Also make sure that you do not need to join the split-off tables regularly; otherwise performance will be worse than without the split, and the drop can be dramatic.
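
The two examples above can be combined into one small schema sketch. It is written in Python with the standard-library sqlite3 module as a stand-in for MySQL; the user_details table name and the helper functions are hypothetical, chosen only to show where each column ends up.

```python
import sqlite3


def build_split_users() -> sqlite3.Connection:
    """Hot columns stay in users; rarely read or often updated ones move out."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            user_id  INTEGER PRIMARY KEY,
            username TEXT NOT NULL,
            role     TEXT NOT NULL
        );
        CREATE TABLE user_details (
            user_id      INTEGER PRIMARY KEY REFERENCES users (user_id),
            home_address TEXT,
            last_login   TEXT
        );
    """)
    conn.execute("INSERT INTO users VALUES (1, 'alice', 'admin')")
    conn.execute("INSERT INTO user_details VALUES (1, '1 Main St', NULL)")
    return conn


def record_login(conn: sqlite3.Connection, user_id: int, when: str) -> None:
    # The frequent write touches only the side table.
    conn.execute(
        "UPDATE user_details SET last_login = ? WHERE user_id = ?", (when, user_id)
    )


def load_identity(conn: sqlite3.Connection, user_id: int):
    # The frequent read touches only the small, stable table; no join needed.
    return conn.execute(
        "SELECT username, role FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
```

Logins update user_details only, so the users table that is read on every request is never modified by them.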

17. Splitting a large DELETE or INSERT statement

If you need to run a large DELETE or INSERT query on a live website, be very careful that it does not bring the whole site to a halt. Both operations lock the table, and while the table is locked no other operation can get in.

Apache runs many child processes or threads. That is why it works quite efficiently, but our servers cannot afford too many child processes, threads, and database connections, because they consume a great deal of server resources, especially memory.

If you lock a table for a period of time, say 30 seconds, then on a high-traffic site the processes/threads, database connections, and open files that pile up during those 30 seconds may not only crash your web service but may also bring the whole server down.

So if you have a large operation to run, be sure to split it up; using LIMIT is a good way to do that. Here is an example:

    while (1) {
        // Delete only 1000 rows at a time
        mysql_query("DELETE FROM logs WHERE log_date <= '2009-11-01' LIMIT 1000");
        if (mysql_affected_rows() == 0) {
            // Nothing left to delete, so exit
            break;
        }
        // Pause briefly between batches
        usleep(50000);
    }

18. The smaller the column the faster

For most database engines, disk operations are probably the most significant bottleneck. Keeping your data compact therefore helps a great deal, because it reduces disk access.

See the Storage Requirements section of the MySQL documentation for the sizes of all data types.

If a table will only ever hold a small number of rows (a dictionary table or a configuration table, for example), there is no reason to use INT for the primary key; MEDIUMINT, SMALLINT, or the even smaller TINYINT is more economical. If you do not need the time of day, DATE is much better than DATETIME.

Of course, you also need to leave enough room to grow; otherwise you will be in serious trouble when you have to change the type later. See the Slashdot example (November 6, 2009), where a simple ALTER TABLE statement took 3 hours because the table held 16 million rows.
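
Choosing the smallest safe type comes down to simple arithmetic on the byte sizes of MySQL's integer types (1, 2, 3, 4, and 8 bytes). The following Python sketch works through that arithmetic; the helper names unsigned_max and smallest_unsigned_type are hypothetical.

```python
# Storage size in bytes of MySQL's integer types.
INTEGER_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def unsigned_max(type_name: str) -> int:
    """Largest value an UNSIGNED column of this type can hold."""
    return 2 ** (8 * INTEGER_BYTES[type_name]) - 1


def smallest_unsigned_type(max_value: int) -> str:
    """Smallest integer type whose unsigned range covers max_value."""
    for name, _size in sorted(INTEGER_BYTES.items(), key=lambda item: item[1]):
        if max_value <= unsigned_max(name):
            return name
    raise ValueError("value does not fit in any MySQL integer type")


print(unsigned_max("MEDIUMINT"))           # 16777215
print(smallest_unsigned_type(200))         # TINYINT
print(smallest_unsigned_type(16_000_000))  # MEDIUMINT
print(smallest_unsigned_type(20_000_000))  # INT
```

Note how little headroom 16 million rows leave in an unsigned MEDIUMINT key, which is why the expected growth of the table should decide the type, not its current size.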

19. Choose the right storage engine

MySQL has two main storage engines, MyISAM and InnoDB, each with its own pros and cons. The earlier CoolShell article "MySQL: InnoDB or MyISAM?" discusses this topic.

MyISAM suits applications that are read-heavy, but it does not handle large numbers of writes well. Even if you only update a single field, the whole table is locked, and no other process can even read from it until the update has finished. On the other hand, MyISAM is extremely fast at computing SELECT COUNT(*).

InnoDB is a considerably more complex storage engine, and for some small applications it can be slower than MyISAM. But it supports row-level locking, so it performs better when there are many writes. It also supports more advanced features, such as transactions.

See the MySQL manual:

    • MyISAM Storage Engine
    • InnoDB Storage Engine

20. Use an object-relational mapper (ORM)

Using an ORM (object-relational mapper), you can get certain performance gains. Everything an ORM does can also be written by hand, but that takes a seasoned expert.

The most important thing about an ORM is "lazy loading": values are fetched only when they are actually needed. But you need to be careful about the side effects of this mechanism, because it can easily degrade performance by generating many, many small queries.
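
The "many small queries" side effect is easy to demonstrate by counting statements. The sketch below does not use a real ORM: it imitates what a lazily loading one does, in Python with the standard-library sqlite3 module, and the CountingConnection wrapper, the tables, and the function names are all hypothetical.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many SQL statements are executed."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def execute(self, sql: str, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def build_db() -> CountingConnection:
    db = CountingConnection()
    db.conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
        INSERT INTO companies VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO users VALUES (1, 'ann', 1), (2, 'bob', 2), (3, 'cy', 1);
    """)
    return db


def company_names_lazy(db: CountingConnection) -> list:
    # Lazy style: one query for the users, then one more per user.
    names = []
    users = db.execute("SELECT id, company_id FROM users ORDER BY id").fetchall()
    for _user_id, company_id in users:
        row = db.execute(
            "SELECT name FROM companies WHERE id = ?", (company_id,)
        ).fetchone()
        names.append(row[0])
    return names


def company_names_joined(db: CountingConnection) -> list:
    # Eager style: a single join returns the same data.
    rows = db.execute(
        "SELECT c.name FROM users u JOIN companies c ON c.id = u.company_id ORDER BY u.id"
    )
    return [row[0] for row in rows]
```

For three users the lazy version issues four statements and the joined version one; with thousands of rows the gap grows linearly, which is the degradation this tip warns about.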

An ORM can also batch your SQL statements into a single transaction, which is much faster than executing them individually.

Currently, my favorite ORM for PHP is Doctrine.

21. Be careful with persistent connections

The purpose of persistent connections is to reduce the cost of re-creating connections to MySQL. Once a persistent connection is created, it stays open even after the database operations have finished. And since Apache reuses its child processes, the next HTTP request handled by the same child process reuses the same MySQL connection.

    • PHP manual: mysql_pconnect()

In theory this sounds very good. But in my personal experience (and that of most people), this feature causes more trouble than it is worth: you can run into the connection limit, memory problems, file-handle exhaustion, and so on.

Moreover, Apache runs in a highly parallel environment and creates many child processes, which is why the persistent connection mechanism does not work well there. Before you decide to use persistent connections, think carefully about the architecture of your whole system.

Http://www.cnblogs.com/doubilaile/p/4863752.html
