Problem SQL Background: if the project has six tables, the corresponding brand_id field should be written according to the pid field. However, there are two tables of tens of millions. After my worker is running, the online mysql master-slave synchronization is immediately delayed! After running for more than an hour, the latency was 40 minutes, and only tens of thousands of rows of data were updated. The SQL is as follows:
<! -- Update the brand id based on the product id --> <update id = "updateBrandIdByPid" parameterClass = "com. jd. chat. worker. domain. param. updateBrandIdParam "> UPDATE $ tableName $ SET brand_id = # newBrandId # WHERE pid = # pid # AND brand_id = 0 </update>
The mysql expert of the project team helped me analyze the problem. Because the pid field has no index, the mysql engine needs to scan the columns with the same pid value as the input one row by row and then update the data, that is, the SQL statement can be executed only after more than 1000 rows of disk data are scanned. What's more serious is the number of different PIDs in a ten-million-level table, and the number of such SQL statements to be executed.
My colleague suggested vertical table sharding Based on the id field in the SQL code hierarchy. Each time you update 1000 rows of data, the mysql engine does not need to scan the entire table every time, and the database pressure is one thousandth of the previous. As the primary key, id is indexed and indexed, which can greatly optimize the query performance. The optimized SQL statement is as follows:
<! -- Update the brand id based on the product id --> <update id = "updateBrandIdByPid" parameterClass = "com. jd. chat. worker. domain. param. updateBrandIdParam "> UPDATE $ tableName $ SET brand_id = # newBrandId # WHERE pid = # pid # AND brand_id = 0 AND id BETWEEN # startNum # AND # endNum # </update>
Only use the id range statement to vertically cut the code layers of a large table of tens of millions. After the worker is re-launched, there is no latency between the mysql master and slave! After monitoring, tens of thousands of data records were updated in just 10 minutes, which is six times more efficient than before! More importantly, database load balancing allows applications to run properly.