Background
"What are you doing?!"
"What's the matter? My code passed its unit tests, there shouldn't be a problem!" Confused, I hurried over to the ops lead.
"Look at this! Look how many alarms there are. Go optimize it, now!"
The ops lead's SMS inbox held more than 50 "MySQL CPU 100%" alarm messages. And the project name on them... wasn't that the project I had released just a few days ago!?
My heart sank, and I hurriedly put on an apologetic smile. "This definitely needs optimizing, I'll do it right away! Er, could you pull up the database monitoring logs first..."
The ops lead grumbled at me a bit more, then brought up the database monitoring logs.
Good grief... more than 300 connections per second, full table scans nearly maxing out the server, and big red CPU alerts...
"Um, could you also check the Nginx access log... I'd like to see the traffic volume..." I said meekly.
The ops lead reluctantly ran a count against the following log:
access.log
It turned out this interface was one of the higher-traffic ones: the count came back at over 6 million, and that was only midday. At the weekend peak it was probably tens of millions of requests a day.
I pursed my lips, thinking: with demand this high, they stingily gave me such a low-spec database and still have the nerve to complain. But out loud, of course, I said: "Got it. The request volume isn't that large, so it looks like a database problem. I'll optimize it right away!"
"Why not just set up read-write separation!?" At this point another ops guy wandered over and waved his hand breezily...
You ask where the DBA was? The DBA happened to be busy just then and simply told me to look into it myself...
Optimization ideas
Because this project was rushed before launch, I never properly reviewed the database design; now that the game is live and the request volume has arrived, the problems are all exposed. (Honestly, all the early overtime had left me too tired to bother.)
As it turned out, the problem could be solved without upgrading the database hardware or reaching for high-end measures like read-write separation. I had dug most of these holes myself, and deep down I knew it.
The detailed MySQL optimization steps are as follows:
- Review the table structures and fix the poor design
- Run through the main business flows and collect the common query SQL
- Analyze those queries; split them where appropriate, add indexes, and otherwise optimize them
- Optimize the code logic alongside the SQL
- Add a local cache and a Redis cache
This project is written in native PHP, so all of this had to be done by hand.
Check the data table structure
Being fairly green back then, I went back to look at the table design, and it was truly miserable.
Avoid NULL values whenever possible
When you create a table, if you don't set a default value for a column, MySQL defaults it to NULL. So why avoid NULL?
- NULL makes index maintenance more complex; it is strongly recommended to declare indexed columns NOT NULL
- Negative conditions such as NOT IN and != always exclude rows whose value is NULL, which makes queries error-prone
- A nullable column requires an extra byte per row as the NULL flag
- NULL may not behave like the other values of the column's type, causing subtle problems (and it behaves differently in different languages)
- MySQL has difficulty optimizing queries on nullable columns
So for the fields I had lazily left nullable, I went back and set sensible defaults by hand: empty strings, 0, and so on.
While this alone doesn't improve MySQL's performance much, it is a good habit; don't overlook these details.
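The NOT IN / != pitfall above is easy to demonstrate. A minimal sketch, using Python's built-in sqlite3 as a stand-in for MySQL (the table and values are made up):

```python
import sqlite3

# Tiny in-memory table with one NULL value.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (v INTEGER)")
cur.executemany("INSERT INTO t (v) VALUES (?)", [(1,), (2,), (None,)])

# `v != 1` evaluates to NULL for the NULL row, so that row is silently
# excluded from the result, which surprises many people.
rows = cur.execute("SELECT v FROM t WHERE v != 1 ORDER BY v").fetchall()
# rows == [(2,)] -- only one row, not two
```

The same three-valued-logic rule applies in MySQL, which is one more reason to give columns NOT NULL defaults.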
Add indexes
Index the fields that are queried frequently; a query with an index is easily ten times faster than one without, or more.
- In general, every table should have a primary key, e.g. an auto-increment id field
- Fields that are frequently used in queries should be indexed
- For varchar fields, it is best to specify a prefix length when creating the index
- When a query has multiple conditions, the indexed conditions are applied preferentially
- A LIKE fuzzy search cannot use an ordinary index on the field; a full-text (keyword) index is needed for that
- Try not to enforce relationships between tables at the database level; handle those dependencies in application code
When tables are tied together by constraints, the insert and delete SQL statements are simpler to write, but the downside is that inserts and other operations make the database check the constraints (even though you can manually ignore them). This effectively pushes business logic down into the database layer, which is inconvenient to maintain.
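The effect of an index on the query plan can be sketched as follows, using Python's sqlite3 (and SQLite's EXPLAIN QUERY PLAN) as a stand-in for MySQL's EXPLAIN; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE link (id INTEGER PRIMARY KEY, player_id INTEGER, action TEXT)"
)

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is a readable detail string.
    return " ".join(row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT id FROM link WHERE player_id = 15298635")  # contains "SCAN"
cur.execute("CREATE INDEX idx_link_player ON link (player_id)")
after = plan("SELECT id FROM link WHERE player_id = 15298635")   # contains "INDEX"
```

On MySQL itself the equivalent check is prefixing the query with EXPLAIN and watching the access type change from a full scan (ALL) to an index lookup.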
Optimize the table field structure
Data that can be represented as integers should not use a string type; whether to use varchar or char depends on the field's possible values.
This kind of optimization is often infeasible once the database already holds a lot of data, so it is best done during the initial design.
- For columns with a limited set of possible values, use tinyint instead of VARCHAR
  - For example, to record a mobile platform with only two values, android and ios, use 0 for android and 1 for ios; just be sure to comment the column well
  - Why not ENUM? ENUM is hard to extend: if an ipad platform is added later it's maddening, whereas with tinyint you simply add a 2. ENUM is also awkward to handle in code, treated as an integer or a string depending on the language
  - Either way, be sure to record the meaning of each value in the column comment or in code
- For fixed-length strings, use char, e.g. a ZIP code that is always 5 characters
- For strings of unknown length, use varchar
- Don't abuse bigint: for a table's id field, for example, int is enough, and its upper limit of 2.1 billion rows is plenty
- Where appropriate, break the database normal forms and add redundant fields to avoid table joins at query time
In queries, the int type is certainly faster than varchar, because integer comparison maps directly to low-level machine operations, while strings must be compared character by character.
Fixed-length data is also faster to query than variable-length data: with fixed-length rows, the offset from one record to the next is constant and easy to calculate, while variable-length data needs an extra step to find the next record's offset. The trade-off is that fixed-length data may waste more storage space.
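As a small illustration of the tinyint-over-ENUM point, the value mapping can live entirely in application code; the `platform` column name and the codes below are hypothetical:

```python
# Illustrative mapping for a hypothetical `platform TINYINT` column:
# 0 = android, 1 = ios. Extending it later (e.g. 2 = ipad) is just one more
# constant here -- no ALTER of an ENUM definition in the database.
PLATFORM_ANDROID = 0
PLATFORM_IOS = 1

PLATFORM_NAMES = {PLATFORM_ANDROID: "android", PLATFORM_IOS: "ios"}

def platform_name(code: int) -> str:
    """Translate the stored tinyint back into a readable name."""
    return PLATFORM_NAMES.get(code, "unknown")
```

The mapping must of course stay documented in the column comment as well, as noted above.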
Split large tables
For tables whose row count may soon exceed 5 million, or that are growing rapidly, be sure to split them vertically or horizontally in advance; once the data volume reaches the tens of millions, query speed drops noticeably.
Sharding and table splitting should be finalized as early as possible in the database design; otherwise code complexity rises sharply and the scheme is hard to change later.
Vertical splitting partitions by an external variable such as the date; horizontal splitting maps rows to sub-tables by hashing some field in the table.
The prerequisite for sharding is that, before executing a query, you already know which sub-database and sub-table the data you need may fall into.
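The routing step can be sketched like this; the modulo hash and the `link_00`..`link_15` naming scheme are assumptions for illustration, not the project's actual scheme:

```python
def route_table(player_id: int, n_tables: int = 16) -> str:
    """Map a sharding key to its horizontal sub-table via a simple modulo hash.

    Knowing the key up front lets the code build the query against the right
    sub-table before it is executed.
    """
    return f"link_{player_id % n_tables:02d}"
```

For example, `route_table(15298635)` resolves to the sub-table `link_11`, so the query never touches the other fifteen tables.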
Refine query statements
This is where many systems' database bottlenecks originate.
- Prefer simple queries; avoid table joins where possible
- Avoid full table scans. Statements that trigger a full table scan include, but are not limited to:
  - a WHERE clause whose condition is always true, or no WHERE clause at all
  - LIKE with a leading wildcard
  - inequality operators (<>, !=)
  - IS NULL conditions on a column
  - OR on non-indexed columns
- In a multi-condition query, place the simple conditions and the indexed-column conditions first
- Specify only the columns you need; don't lazily write SELECT *
  - an unqualified * returns redundant data and wastes bandwidth
  - it also forces MySQL to consult the table structure to expand the column list when executing the query
- Uppercase query keywords are said to be marginally faster than lowercase
- Subqueries create temporary tables and are slightly slower than joins (JOIN) and unions (UNION)
- Avoid applying database functions to indexed columns in the query; among other things it makes query results hard to cache
- When only one row is needed, use LIMIT 1; when there may be many rows, always set a LIMIT and paginate
- Never ORDER BY RAND(); its performance is terrible
These are the tips I have summarized. The rules are fixed, but business scenarios are not: in practice, for example in data statistics, you can reasonably sacrifice some performance in exchange for convenience.
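The LIMIT/paging tip can be sketched with Python's sqlite3 standing in for MySQL; the table name and page size are made up:

```python
import sqlite3

# 100 rows of fake data to page through.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE log (id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO log (id) VALUES (?)", [(i,) for i in range(1, 101)])

# Fetch only the requested page instead of the whole result set.
page, per_page = 3, 10
rows = cur.execute(
    "SELECT id FROM log ORDER BY id LIMIT ? OFFSET ?",
    (per_page, (page - 1) * per_page),
).fetchall()
# page 3 holds ids 21..30
```

The same LIMIT/OFFSET syntax works in MySQL, though for very deep pages a keyset condition (`WHERE id > last_seen_id`) scales better than a large OFFSET.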
Add caching
Using a cache such as Redis, together with a local file cache, can greatly reduce the number of database queries. With caching, be sure to analyze your system's data characteristics and choose appropriately.
- Commonly used data, such as configuration information, can be placed in the cache
- The database's table structure can be cached locally
- Cached data must be updated promptly, and an expiry time must be set
- Adding a cache necessarily adds system complexity, so be sure to weigh the trade-off
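A minimal sketch of the cache-aside pattern described above, with a plain dict standing in for Redis and a function standing in for the real MySQL query (all names are illustrative):

```python
import time

_cache: dict = {}  # stand-in for Redis: key -> (value, expires_at)

def cached_query(key: str, loader, ttl: float = 60.0):
    """Return the cached value if still fresh, otherwise load and cache it."""
    entry = _cache.get(key)
    now = time.time()
    if entry is not None and entry[1] > now:
        return entry[0]               # cache hit: no database round-trip
    value = loader()                  # cache miss: run the real query
    _cache[key] = (value, now + ttl)  # always store with an expiry time
    return value

calls = []
def load_config():
    calls.append(1)                   # count how often the "database" is hit
    return {"maintenance": False}

first = cached_query("config", load_config)
second = cached_query("config", load_config)
# the loader ran only once; the second call was served from the cache
```

With Redis the dict lookup and store become GET/SETEX calls, but the control flow stays the same.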
Optimization examples
Here are a few examples of simple query optimizations. The first step is to run the main business flows, print the main query statements to a file, and then analyze those statements.
As an aside, putting the EXPLAIN keyword in front of a query shows the specifics of how the query will be executed.
Look at the query statement below.
SELECT * FROM link WHERE player_id='15298635' AND gameid='10389' AND appid='200' AND action='open' AND creator='android_sdk' AND transport=
There are a lot of problems with this statement.
- SELECT * doesn't specify the query columns; the table has 20 fields, and I actually use only a few of them
- The queried columns have no indexes, so the query causes a full table scan
- The query conditions are overly redundant and should be split appropriately
- Only one result row is needed, but the size of the result set is not limited
Clearly there are many query conditions, and many of the columns are variable-length varchar types. If I wanted to index them, would I have to build a composite index over all of them?
Clearly that isn't necessary: the more fields an index covers, the more complex it is for MySQL to maintain and the more performance is lost. For a query like this, build the index on the main fields; for example, indexing the player_id, gameid, and appid fields is enough.
A query like this has to be analyzed against the concrete business scenario. In my system, what the statement above really asks is whether a record with the same parameters already exists, so there is no need for so many conditions.
I just need this simpler query instead:
SELECT id, player_id FROM link WHERE player_id='15298635'
The number of matching records is under 100, usually only a few dozen, so I can simply traverse the result set in code and make the final check there. It is incomparably faster!
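The in-code check might look like this; the sample rows and the exact conditions are made up for illustration:

```python
# Hypothetical result set of the narrow SELECT: a few dozen small rows.
rows = [
    {"id": 1, "gameid": 10389, "appid": 200, "action": "open"},
    {"id": 2, "gameid": 10389, "appid": 200, "action": "close"},
]

# Apply the remaining conditions in application code instead of SQL;
# any() stops at the first matching row.
exists = any(
    r["gameid"] == 10389 and r["appid"] == 200 and r["action"] == "open"
    for r in rows
)
```

Filtering a few dozen in-memory rows is trivial compared with making MySQL evaluate six conditions across the whole table.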
Let's look at the next example:
SELECT * FROM browser WHERE device_id='52' AND created>='1513735322' ORDER BY id DESC
I just wanted to fetch the table's data from a certain time onward. There are big problems here!
The created field is a datetime column, so comparing it against a Unix timestamp is simply wrong; and since there is no row limit, this statement drags every device_id='52' row out of the database.
Fortunately, the device_id field is indexed; otherwise it would inevitably cause a full table scan.
The revised query is as follows:
SELECT * FROM browser WHERE device_id='52' AND created>='2018-03-27 00:00:00' ORDER BY id DESC
My system rarely needs complex queries such as joins and unions. Such queries must be used with care, and split up wherever they can be.
Finally, remember the following speed hierarchy; each level differs from the next by roughly two orders of magnitude:
CPU speed > memory access speed > disk I/O speed > network request speed