10 suggestions for optimizing MySQL statements

Source: Internet
Author: User

(Translator's note: the author uses this title to respond to another article of the same name.)

Justin Silverton of Jaslabs has published a list of ten tips for optimizing MySQL queries. I have to comment on it, because the list is very bad. Another blogger, Mike, noticed the same thing. So in this post I want to do two things. First, I want to explain why that list is bad. Second, I want to present a list of my own. Onward, fearless readers!

Why is that list so bad?

1. His effort is aimed at the wrong place

The rule for optimizing any code is that you first find the bottleneck. Mr. Silverton, however, spends his effort where it does the least good. I would say 60% of database optimization comes from a clear understanding of SQL and database basics. You need to know the difference between joins and subqueries, how column indexes work, and how to normalize data. The next 35% is knowing the performance characteristics of the database you have chosen. For example, COUNT(*) in MySQL can be very slow or very fast, depending on the storage engine you use. There are other things to consider too: when the database invalidates its caches, when it sorts on disk instead of in memory, and when it has to create a temporary table. The remaining 5% is territory you will rarely need to enter, yet that is exactly where Mr. Silverton spends most of his time. I have never once used SQL_SMALL_RESULT.

2. Good problem, but bad solution

Mr. Silverton does point out some real problems. MySQL uses a dynamic row format for tables with variable-length columns such as TEXT or BLOB, which in this case means sorting has to be done on disk. The solution is not to avoid these data types, but to split those columns out of the original table and put them into an associated table. The following schema illustrates the idea:

 
 
    CREATE TABLE posts (
        id int UNSIGNED NOT NULL AUTO_INCREMENT,
        author_id int UNSIGNED NOT NULL,
        created timestamp NOT NULL,
        PRIMARY KEY(id)
    );

    CREATE TABLE posts_data (
        post_id int UNSIGNED NOT NULL,
        body text,
        PRIMARY KEY(post_id)
    );

3. Some of it is just hard to believe

Some of his suggestions are baffling, such as "remove unnecessary parentheses". It makes no difference whether you write SELECT * FROM posts WHERE (author_id = 5 AND published = 1) or SELECT * FROM posts WHERE author_id = 5 AND published = 1. Any decent DBMS will optimize the parentheses away. This level of detail is like wondering whether i++ or ++i is faster in C. Really, if that is where you spend your energy, it is a wonder you get any code written at all.

My list

Let's see whether my list does any better. I will start with the most general advice.

1. Benchmark, benchmark, benchmark!

If you want to make good decisions, you need numbers. Which queries are the worst? Where are the bottlenecks? Under what circumstances am I producing bad queries? Benchmarking lets you simulate high-load situations and then use profiling tools to expose mistakes in your database configuration. Tools for the job include supersmack, ab, and SysBench. They either hit your database directly (supersmack, for example) or simulate web traffic (ab, for example).

2. Profile, profile, profile!

Once you can generate high-load situations, you need to find where things break. That is what profiling is for. It helps you find the bottlenecks in your setup, whether they are in memory, CPU, network, disk I/O, or, more likely, some combination of these.

The first thing to do is turn on the MySQL slow query log and install mtop. Together they give you information about the worst offenders. Is a query that takes ten seconds ruining your application? These tools will show you exactly which query it is.
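As a rough sketch, on MySQL 5.1 and later the slow query log can be switched on at runtime through system variables. The one-second threshold below is an arbitrary assumption for illustration, and changing global variables requires an account with sufficient privileges.

```sql
-- Turn on the slow query log for the running server.
-- This does not survive a restart unless it is also set in the config file.
SET GLOBAL slow_query_log = 'ON';

-- Log statements that run longer than 1 second (assumed threshold; tune to taste).
-- The new global value applies to connections opened after this statement.
SET GLOBAL long_query_time = 1;

-- Confirm the settings and find out where the log file is written.
SHOW VARIABLES LIKE 'slow_query_log%';
SHOW VARIABLES LIKE 'long_query_time';
```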

After finding the slow queries, get to know MySQL's built-in tools, such as EXPLAIN, SHOW STATUS, and SHOW PROCESSLIST. They tell you where resources are being spent and what side effects a query has, for example whether your monstrous triple-join subquery is sorting in memory or on disk. Of course, you should also use the usual command-line tools such as top, procinfo, and vmstat to get a broader picture of system performance.
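Here is a minimal sketch of how those statements might be used together. It assumes the posts table from the schema earlier in this post; the query being explained is only an illustration.

```sql
-- Ask the optimizer how it plans to run a suspect query.
-- "Using temporary" or "Using filesort" in the Extra column is worth investigating.
EXPLAIN
SELECT author_id, COUNT(*) AS post_count
FROM posts
GROUP BY author_id
ORDER BY post_count DESC;

-- See what every connection is doing right now.
SHOW FULL PROCESSLIST;

-- Server-wide counters for temporary tables that spilled to disk
-- and for sorts that needed more than one merge pass.
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';
```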

3. Tighten up your schema

You have to design a schema before you write any queries. Remember that the memory needed to hold a table is roughly the number of rows multiplied by the size of a row. Unless you expect every person on the planet to register billions of times on your website, you do not need BIGINT for your user_id. Likewise, if a text column always has a fixed length (a US zip code, for example, which has the canonical form "XXXXX-XXXX"), declaring it as VARCHAR just adds an extra byte to every row.
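A small sketch of what a tightened table might look like. The table name, the last_name column, and its length are assumptions made up for this example.

```sql
CREATE TABLE users (
    -- INT UNSIGNED takes 4 bytes and holds about 4.29 billion ids; BIGINT takes 8 bytes.
    id int UNSIGNED NOT NULL AUTO_INCREMENT,
    -- Hypothetical column; the length is an assumption.
    last_name varchar(64) NOT NULL,
    -- Always exactly 10 characters ("XXXXX-XXXX"), so a fixed-width CHAR fits.
    zip char(10) NOT NULL,
    PRIMARY KEY(id)
);
```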

Some people dismiss database normalization, saying it produces needlessly complex schemas. However, proper normalization minimizes redundant data. In essence it trades a little performance for a smaller overall footprint, the same time-versus-space tradeoff found everywhere in computer science. The best approach, IMO, is to normalize first and then denormalize where performance demands it. Your database will be more logical and you will not be optimizing prematurely.

4. Partition your tables

Often a table has only a few columns that are accessed frequently. On a blog, for example, the title is shown in many different places (such as a list of recent articles), while the summary or full text is shown only on one specific page. Splitting such a table, horizontally or vertically, helps a great deal.
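The following is one possible sketch of a vertical split for the blog example. It follows the posts and posts_data layout shown earlier in this post; the title and summary columns are assumptions added for illustration.

```sql
-- Frequently read columns stay in the narrow table.
CREATE TABLE posts (
    id int UNSIGNED NOT NULL AUTO_INCREMENT,
    author_id int UNSIGNED NOT NULL,
    title varchar(128) NOT NULL,      -- hypothetical column for this sketch
    created timestamp NOT NULL,
    PRIMARY KEY(id)
);

-- Bulky, rarely read columns move to a side table keyed by the post id.
CREATE TABLE posts_data (
    post_id int UNSIGNED NOT NULL,
    summary text,                     -- hypothetical column for this sketch
    body text,
    PRIMARY KEY(post_id)
);

-- Recent-articles list: touches only the narrow table.
SELECT id, title FROM posts ORDER BY created DESC LIMIT 10;

-- Single article page: join in the bulky columns for one row.
SELECT p.title, d.body
FROM posts p
INNER JOIN posts_data d ON d.post_id = p.id
WHERE p.id = 42;
```

The list query never reads the TEXT columns, so the rows it scans and sorts stay small.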

 
 
5. Do not overuse artificial primary keys

Consider a table that records a many-to-many relation between posts and tags:

    CREATE TABLE posts_tags (
        relation_id int UNSIGNED NOT NULL AUTO_INCREMENT,
        post_id int UNSIGNED NOT NULL,
        tag_id int UNSIGNED NOT NULL,
        PRIMARY KEY(relation_id),
        UNIQUE INDEX(post_id, tag_id)
    );

Not only is the artificial key completely redundant given the unique index, but the number of post-tag relations is now limited by the maximum value of an integer. Use the pair of columns as the primary key instead:

 
 
    CREATE TABLE posts_tags (
        post_id int UNSIGNED NOT NULL,
        tag_id int UNSIGNED NOT NULL,
        PRIMARY KEY(post_id, tag_id)
    );
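One thing to keep in mind with a composite key is column order. The sketch below assumes the posts_tags table defined just above; the index name and the literal ids are made up for illustration.

```sql
-- "Which tags does post 7 have?" is served by the primary key,
-- because post_id is its leftmost column.
SELECT tag_id FROM posts_tags WHERE post_id = 7;

-- "Which posts carry tag 3?" cannot use the leftmost prefix of the primary key,
-- so a secondary index on tag_id is one way to support it.
CREATE INDEX idx_posts_tags_tag ON posts_tags (tag_id);
SELECT post_id FROM posts_tags WHERE tag_id = 3;
```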

6. Learn your indexes

Your choice of indexes will often make or break your database. For those who have not got this far in their database studies, an index can be thought of as a kind of lookup structure, loosely like a hash. For example, if we run the query SELECT * FROM users WHERE last_name = 'Goldstein' and last_name has no index, the DBMS must scan every row and compare it to 'Goldstein'. An index is usually a B-tree (though there are other types), and it speeds up this comparison considerably.

You should probably create indexes on the columns you select by, group by, order by, and join on. Each index takes space proportional to the number of rows in the table, so the more indexes you have, the more memory they occupy. Indexes also slow down writes, because every write has to update the corresponding indexes. There is a balance point, which you find by profiling, and it varies from system to system and from implementation to implementation.
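A minimal sketch of adding and checking an index for the last_name lookup. It assumes a users table with a last_name column, and the index name is made up. The EXPLAIN output described in the comments is what is typically seen; the optimizer may choose differently on very small tables.

```sql
-- Assumes a users table with a last_name column.
CREATE INDEX idx_users_last_name ON users (last_name);

-- Without the index, EXPLAIN typically reports type = ALL (a full table scan).
-- With it, expect type = ref and key = idx_users_last_name.
EXPLAIN SELECT * FROM users WHERE last_name = 'Goldstein';

-- List the indexes the table maintains on every write.
SHOW INDEX FROM users;
```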

7. SQL is not C

C is the classic procedural language, and for a programmer it is also a trap: it tempts you into thinking that SQL is procedural too. (SQL is not functional or object-oriented either, for that matter.) Instead of thinking about data and operations on data, you need to think about sets of data and the relationships among those sets. The mistake usually shows up as a misused subquery:

 
 
    SELECT a.id,
        (SELECT MAX(created)
         FROM posts
         WHERE author_id = a.id) AS latest_post
    FROM authors a;

Because this subquery is correlated, that is, it references a table from the outer query, it should be rewritten as a join:

 
 
    SELECT a.id, MAX(p.created) AS latest_post
    FROM authors a
    INNER JOIN posts p
        ON (a.id = p.author_id)
    GROUP BY a.id;
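One caveat is worth spelling out: the INNER JOIN form leaves out authors who have no posts, whereas the correlated subquery returns them with a NULL latest_post. If those rows matter, a LEFT JOIN variant such as the following sketch keeps them.

```sql
-- LEFT JOIN keeps every author. MAX() over no matching rows yields NULL,
-- which matches what the correlated subquery returns for authors without posts.
SELECT a.id, MAX(p.created) AS latest_post
FROM authors a
LEFT JOIN posts p
    ON (a.id = p.author_id)
GROUP BY a.id;
```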

8. Understand your engine

MySQL has two primary storage engines: MyISAM and InnoDB. Each has its own performance characteristics and considerations. Broadly speaking, MyISAM suits read-heavy data and InnoDB suits write-heavy data, though there are cases where the opposite holds. The biggest gotcha is how the two handle the COUNT function.

MyISAM caches table metadata, such as the number of rows. This means a plain COUNT(*) with no WHERE clause costs almost nothing. InnoDB has no such cache. As a concrete example, say you are paginating a result with SELECT * FROM users LIMIT 5,10 and also run SELECT COUNT(*) FROM users to get the total. With MyISAM the count returns almost immediately, while with InnoDB it has to scan the rows. MySQL has the SQL_CALC_FOUND_ROWS option, which tells the server to calculate the total number of rows while it runs the query, and you then retrieve it with SELECT FOUND_ROWS(). This is very MySQL-specific, but it can be necessary, particularly when you use InnoDB for its other features, such as row-level locking and transactions.
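As a sketch, the pagination pattern looks like this. It assumes a users table with id and last_name columns. Note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17, so on newer servers a separate COUNT(*) query is the suggested replacement.

```sql
-- Fetch one page and ask the server to also compute the total
-- number of rows the query would return without the LIMIT.
SELECT SQL_CALC_FOUND_ROWS id, last_name
FROM users
ORDER BY id
LIMIT 5, 10;

-- Must run on the same connection, right after the query above.
SELECT FOUND_ROWS();
```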

9. MySQL-specific shortcuts

MySQL provides many extensions to SQL that help in common situations. Among them are INSERT ... SELECT, INSERT ... ON DUPLICATE KEY UPDATE, and REPLACE.
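To give a feel for these extensions, here is a small sketch built around a hit counter. The page_hits and page_hits_archive tables are hypothetical and exist only for this example.

```sql
-- Hypothetical counter table for this sketch.
CREATE TABLE page_hits (
    page_id int UNSIGNED NOT NULL,
    hits int UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY(page_id)
);

-- The first call inserts the row with hits = 1. Later calls collide with
-- the primary key and increment the existing row instead.
INSERT INTO page_hits (page_id, hits)
VALUES (42, 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;

-- INSERT ... SELECT copies rows between tables in a single statement.
CREATE TABLE page_hits_archive LIKE page_hits;
INSERT INTO page_hits_archive (page_id, hits)
SELECT page_id, hits FROM page_hits;
```

The upsert avoids a separate SELECT to check whether the row exists, which saves a round trip and sidesteps the race between the check and the write.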

I rarely hesitate to use them, because they are so convenient and offer real benefits in many situations. However, MySQL also has some more dangerous keywords that should be used sparingly. One is INSERT DELAYED, which tells MySQL that the data does not need to be inserted immediately (when writing logs, for example). The problem is that under high load the insert may be delayed indefinitely, causing the insert queue to balloon. You can also give MySQL index hints to tell it which indexes to use. MySQL gets it right most of the time, and when it does not, the cause is usually a poorly designed schema or a badly written query.

10. One more for the road

Finally, if you care about MySQL performance, read Peter Zaitsev's MySQL performance blog. He has written many posts about database administration and optimization.
