MySQL UPDATE optimization: two real-world cases
Scenario 1:
Several new fields were added to a table, and the existing rows then had to be backfilled: for each row, the existing url field is matched against external data, and the newly added type and typeid fields are updated accordingly.
I wrote a shell script to refresh the data and ran it. Why was it so slow?
Scenario Reproduction
The code is as follows:
CREATE TABLE `fuckspeed` (
  `uin` bigint(20) unsigned NOT NULL DEFAULT 0,
  `id` int(11) unsigned NOT NULL DEFAULT 0,
  `url` varchar(255) NOT NULL DEFAULT '',
  `type` int(11) unsigned NOT NULL DEFAULT 0,
  `typeid` varchar(64) NOT NULL DEFAULT '',
  ......
  KEY `uin_id` (`uin`, `id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The table structure is as above (many fields are omitted). The table has a single composite index, `uin_id`. The update followed this approach:
Fetch a batch of rows by id range.
The code is as follows:
select id, url from fuckspeed where id >= 101 and id <= 200;
Traverse the rows and update each one.
# First process the row's url to obtain the matching type and typeid
The code is as follows:
update fuckspeed set type = [type], typeid = [typeid] where id = [id]
After implementing this approach, I found the update was extremely slow, averaging only 3 to 5 updates per second. I was speechless. Looking at the data to be updated, there were more than 320,000 records in total, which would take about 24+ hours, more than a full day. Time to think about what was wrong.
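The slow approach can be sketched as follows. This is a minimal Python reconstruction (the article's actual shell script is not shown), using sqlite3 in place of MySQL; the `match_type` helper and the sample urls are invented for illustration:

```python
import sqlite3

# sqlite3 stands in for MySQL here; table and column names follow the article.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fuckspeed (
    uin INTEGER NOT NULL DEFAULT 0,
    id INTEGER NOT NULL DEFAULT 0,
    url TEXT NOT NULL DEFAULT '',
    type INTEGER NOT NULL DEFAULT 0,
    typeid TEXT NOT NULL DEFAULT '')""")
conn.execute("CREATE INDEX uin_id ON fuckspeed (uin, id)")
conn.executemany("INSERT INTO fuckspeed (uin, id, url) VALUES (?, ?, ?)",
                 [(10023, i, "http://example/%d" % i) for i in range(101, 201)])

def match_type(url):
    # Hypothetical matcher: derive type and typeid from the url.
    return 1, url.rsplit("/", 1)[-1]

# Step 1: fetch a batch of rows by id range.
rows = conn.execute(
    "SELECT id, url FROM fuckspeed WHERE id >= 101 AND id <= 200").fetchall()

# Step 2: update each row -- note the WHERE clause uses id only,
# which cannot use the composite (uin, id) index.
for row_id, url in rows:
    t, tid = match_type(url)
    conn.execute("UPDATE fuckspeed SET type = ?, typeid = ? WHERE id = ?",
                 (t, tid, row_id))
conn.commit()
```

The logic is correct; the performance problem discussed below comes entirely from the WHERE clause of the UPDATE.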
Problems Found
My first thought was that the update was slow because only one process was running, so I started five processes and partitioned the ids between them:
The code is as follows:
./update_url.sh 0 10000 &
./update_url.sh 10000 20001 &
./update_url.sh 20001 30001 &
./update_url.sh 30002 40002 &
./update_url.sh 40003 50003 &
This didn't improve the speed much; it was still 3 to 5 updates per second. The time couldn't be going into the steps before each update (matching, assembling the SQL statement), so the problem had to be in the update itself.
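For completeness, here is one way to generate the per-process id ranges programmatically; this is a sketch assuming `update_url.sh` takes a start id and an end id (the article's hand-written ranges above are slightly irregular at the boundaries):

```python
def id_ranges(max_id, workers):
    """Split ids 0..max_id into non-overlapping, contiguous chunks."""
    step = max_id // workers + (1 if max_id % workers else 0)  # ceiling division
    ranges = []
    start = 0
    while start <= max_id:
        end = min(start + step - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges

# e.g. five workers over ids 0..50003, one shell invocation per range:
for lo, hi in id_ranges(50003, 5):
    print("./update_url.sh %d %d &" % (lo, hi))
```

Non-overlapping ranges avoid two processes fighting over the same rows, though as the article shows, parallelism alone did not fix the throughput.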
Let's take a look at my SQL statement.
The code is as follows:
select id, url from fuckspeed where id >= 101 and id <= 200;
I tried executing it on the command line, with the following result:
The code is as follows:
mysql> select id, url from fuckspeed where id >= 0 and id <= 200;
Empty set (0.18 sec)
It took 0.18 seconds. At that point I suddenly realized I wasn't using the composite index: a composite index only takes effect when the leftmost field appears in the condition. I verified this with explain:
The code is as follows:
mysql> explain select id, url from fuckspeed where id >= 0 and id <= 200;
+-----------+------+---------------+------+---------+------+--------+-------------+
| table     | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+-----------+------+---------------+------+---------+------+--------+-------------+
| fuckspeed | ALL  | NULL          | NULL | NULL    | NULL | 324746 | Using where |
+-----------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
Then query with the composite index:
The code is as follows:
mysql> select uin, id from fuckspeed where uin = 10023 and id = 162;
+-------+-----+
| uin   | id  |
+-------+-----+
| 10023 | 162 |
+-------+-----+
1 row in set (0.00 sec)

mysql> explain select uin, id from fuckspeed where uin = 10023 and id = 162;
+-----------+------+---------------+--------+---------+-------------+------+-------------+
| table     | type | possible_keys | key    | key_len | ref         | rows | Extra       |
+-----------+------+---------------+--------+---------+-------------+------+-------------+
| fuckspeed | ref  | uin_id        | uin_id | 12      | const,const | 4    | Using index |
+-----------+------+---------------+--------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
We can see the indexed query returns almost instantly. At this point we can basically conclude that the problem lies in index usage.
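The leftmost-prefix behavior is easy to reproduce. The sketch below uses sqlite3's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; both engines apply a composite index the same way in this case:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fuckspeed (uin INTEGER, id INTEGER, url TEXT)")
conn.execute("CREATE INDEX uin_id ON fuckspeed (uin, id)")
conn.executemany("INSERT INTO fuckspeed VALUES (?, ?, '')",
                 [(10023, i) for i in range(200)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # we only care about the human-readable detail string.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Filtering on id alone cannot use the (uin, id) index: full table scan.
p1 = plan("SELECT id, url FROM fuckspeed WHERE id >= 0 AND id <= 200")

# Filtering on the leftmost column uin (plus id) uses the index.
p2 = plan("SELECT uin, id FROM fuckspeed WHERE uin = 10023 AND id = 162")
```

Inspecting `p1` and `p2` shows a SCAN for the first query and a SEARCH via `uin_id` for the second, mirroring the `ALL` vs `ref` rows in the EXPLAIN output above.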
The SELECTs themselves are relatively few (one per 10,000 ids), so their cost can be ignored; they can't be optimized further anyway without adding an index on id.
The problem occurs in:
The code is as follows:
update fuckspeed set type = [type], typeid = [typeid] where id = [id]
An update has to locate its rows first, just like a query. My MySQL version is 5.5, which does not support EXPLAIN for UPDATE statements, otherwise it could verify what I said directly. There were 320,000+ rows to update here, and each update had to scan the table to find its row. No wonder it was so slow.
Solving the Problem
Once the problem is found, solving it is much easier.
Add the uin field to the SELECT, changing it to:
The code is as follows:
select uin, id, url from fuckspeed where id >= 101 and id <= 200;
And then update with:
The code is as follows:
update fuckspeed set type = [type], typeid = [typeid] where uin = [uin] and id = [id]
This way the composite index is used.
After changing the code I started a single process to see how it performed. The improvement was substantial: an average of 30+ updates per second, at which rate the whole job would finish in about three hours.
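The fixed loop, again sketched with sqlite3 standing in for MySQL: the only change from the slow version is carrying uin through the SELECT and adding it to the UPDATE's WHERE clause so the (uin, id) index can locate each row directly. The matching logic is a placeholder:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fuckspeed (uin INTEGER, id INTEGER, url TEXT, "
             "type INTEGER DEFAULT 0, typeid TEXT DEFAULT '')")
conn.execute("CREATE INDEX uin_id ON fuckspeed (uin, id)")
conn.executemany("INSERT INTO fuckspeed (uin, id, url) VALUES (?, ?, ?)",
                 [(10023, i, "url-%d" % i) for i in range(101, 201)])

# SELECT now also fetches uin so the UPDATE can use the composite index.
rows = conn.execute(
    "SELECT uin, id, url FROM fuckspeed WHERE id >= 101 AND id <= 200").fetchall()
for uin, row_id, url in rows:
    t, tid = 1, url  # placeholder for the real type/typeid matching logic
    conn.execute(
        "UPDATE fuckspeed SET type = ?, typeid = ? WHERE uin = ? AND id = ?",
        (t, tid, uin, row_id))
conn.commit()
```

Each UPDATE now does an index lookup instead of a table scan, which is where the roughly 10x speedup in the article comes from.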
Scenario 2:
The pid field of six tables must be used to fill in the corresponding brand_id field.
Background: the project has six tables whose brand_id field must be written based on each row's pid, and two of them have tens of millions of rows. As soon as my worker started running, MySQL master-slave replication began to lag. After more than an hour the delay had reached 40 minutes, yet only tens of thousands of rows had been updated. The SQL is as follows:
The code is as follows:
<!-- Update the brand id based on the product id -->
<update id="updateBrandIdByPid" parameterClass="com.jd.chat.worker.domain.param.UpdateBrandIdParam">
    UPDATE $tableName$
    SET brand_id = #newBrandId#
    WHERE pid = #pid#
    AND brand_id = 0
</update>
The project team's MySQL expert helped me analyze the problem. Because the pid column has no index, the engine has to scan the table row by row to find the rows whose pid matches the given value before it can update them, so a single statement may have to scan millions of rows of disk data before completing. Worse, this cost is multiplied by the number of distinct pids in a ten-million-row table, which is how many such statements have to run.
My colleague suggested slicing the work by id range at the code level: each statement covers at most 1000 ids, so the engine never has to scan the whole table, and the per-statement load on the database drops to a fraction of what it was. Since id is the primary key, it is indexed, and a BETWEEN range on it is serviced efficiently. The optimized SQL is as follows:
The code is as follows:
<!-- Update the brand id based on the product id -->
<update id="updateBrandIdByPid" parameterClass="com.jd.chat.worker.domain.param.UpdateBrandIdParam">
    UPDATE $tableName$
    SET brand_id = #newBrandId#
    WHERE pid = #pid#
    AND brand_id = 0
    AND id BETWEEN #startNum# AND #endNum#
</update>
With just an id-range condition slicing the ten-million-row tables at the code level, the relaunched worker caused no master-slave latency at all. Monitoring showed tens of thousands of rows updated in only 10 minutes, six times the previous throughput. More importantly, the database load stayed even, so the applications kept running normally.
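The batching idea from scenario 2 can be sketched in Python, with sqlite3 again standing in for MySQL; the table name, batch size, and sample data are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, pid INTEGER, "
             "brand_id INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO product (id, pid) VALUES (?, ?)",
                 [(i, i % 7) for i in range(1, 10001)])

BATCH = 1000  # ids covered per statement, as in the optimized SQL

def update_brand_id(pid, new_brand_id):
    """Apply the UPDATE in id-range slices instead of one full-table pass."""
    max_id = conn.execute("SELECT MAX(id) FROM product").fetchone()[0]
    for start in range(1, max_id + 1, BATCH):
        conn.execute(
            "UPDATE product SET brand_id = ? "
            "WHERE pid = ? AND brand_id = 0 AND id BETWEEN ? AND ?",
            (new_brand_id, pid, start, start + BATCH - 1))
        # Each slice commits separately, so the replica never falls far behind.
        conn.commit()

update_brand_id(pid=3, new_brand_id=42)
```

Committing after every slice is the key to keeping replication lag low: the replica applies many small transactions instead of one enormous one.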
These two simulated scenarios reproduce and analyze MySQL UPDATE optimization strategies; I hope they help you in your own database work.