I've spent the past two weeks looking for ways to improve query speed. I achieved some results and solved some of the problems, so I'm writing them down here for my own future reference.
Our company has no dedicated DBA, and I'm not very familiar with MySQL. The management system of our network audit system, written in Java, passed through many hands and years of patching before reaching us, and now we have to turn it into a version that can handle heavy traffic. That is a real challenge for the few of us in the Java group.
The "heavy traffic" was mentioned in earlier posts: the system must process 1 G of packets per second. HTTP accounts for the most packets, so the flow log table of the HTTP protocol analysis module grows the fastest, an estimated 40 million records per day. With one table per day, each table is huge; I checked one .MYD file and it was already over 8 GB.
The management system queries these logs with conditions on several fields, some of which are 256 characters long. Searching for a string in an 8 GB table means a miss scans all the way to the end of the table, which is unbearably slow. Customers also combine several field conditions in one query, so results often took twenty or thirty minutes to come back, and the system was barely usable.
My approach was test-driven: while reading the Java code, I used the log4j and perf4j logs to measure the time each SQL statement took, located the performance bottlenecks, and then optimized them one by one.
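The measurement step can be sketched in plain Java. The original system logged timings through log4j/perf4j; this standalone sketch just uses `System.nanoTime` around an arbitrary callback standing in for a SQL execution (the names here are illustrative, not from the original code):

```java
import java.util.concurrent.Callable;

public class QueryTimer {
    // Times an arbitrary piece of work and returns elapsed milliseconds,
    // mimicking what perf4j's stopwatch logging reports per SQL statement.
    public static <T> long timeMillis(Callable<T> work) throws Exception {
        long start = System.nanoTime();
        work.call();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        long elapsed = timeMillis(() -> {
            Thread.sleep(50); // stand-in for executing a slow SQL statement
            return null;
        });
        System.out.println("query took ~" + elapsed + " ms");
    }
}
```

Wrapping each suspect statement this way (or reading the equivalent perf4j output) makes the slowest queries obvious, which is what guided the optimizations below.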
The most effective query optimization is, of course, indexing. The auto-increment ID was already the primary key; my predecessors had done that. Analyzing the WHERE clauses, the time field is used as a query condition, is 8 bytes wide, and rarely repeats, so it was a good candidate for an index. Indexing the time field helped somewhat, but not dramatically. A rough estimate: 8 bytes × 40,000,000 rows = 320,000,000 bytes, so the time index alone on a 40-million-row table takes about 320 MB — and that is just one of our hundreds of tables (customers require us to keep at least 3 months of records).
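The back-of-the-envelope index size estimate works out as follows (a sketch of the raw key bytes only; a real MyISAM/InnoDB index adds page and pointer overhead on top of this):

```java
public class IndexSizeEstimate {
    public static void main(String[] args) {
        long rowsPerDay = 40_000_000L; // records per day in one flow log table
        long keyBytes = 8;             // width of the 8-byte time column
        long indexBytes = rowsPerDay * keyBytes;
        // 320,000,000 bytes, roughly the "320M" cited in the text (~305 MiB)
        System.out.println(indexBytes + " bytes = " + indexBytes / (1024 * 1024) + " MiB");
    }
}
```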
Indexing helps, but it does not solve our problem. Splitting into more physical tables cannot shorten the time either: with one table per day, 3 months means 91 or 92 tables, and with 30 protocol modules that comes to more than 2,700 tables — and those are only the protocol flow log tables; there are other tables besides.
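The table-count arithmetic above is simple but worth making explicit (assuming, as the text says, one table per day per protocol module):

```java
public class TableCount {
    public static void main(String[] args) {
        int daysRetained = 92;    // roughly 3 months, per the retention requirement
        int protocolModules = 30; // number of protocol analysis modules
        int flowLogTables = daysRetained * protocolModules;
        System.out.println(flowLogTables + " flow log tables"); // 2760, i.e. "more than 2,700"
    }
}
```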
Nor can we index every field the customers query on: the indexes would then be nearly as large as the table itself, and indexing would lose its point. As for optimizing the database engine itself, I could not think of a good approach; with this volume of data, even Oracle would have no magic answer.
What I finally adopted was segmented querying: no matter what conditions are set, the 40 million rows are divided evenly into n segments and queried one segment at a time, for example 4 million rows per segment. The page offers a drop-down list: 0–4 million, 4–8 million, ..., 36–40 million. Querying becomes a bit more cumbersome, but each query is far faster — around 30 seconds. We sacrificed some usability, but that still beats waiting 30 minutes.
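A minimal sketch of the segmentation idea (the class and method names are illustrative, not from the original system): each drop-down entry maps to an id range that gets appended to the user's own conditions, so any one query scans at most one segment.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentedQuery {
    static final long SEGMENT_SIZE = 4_000_000L;

    // Builds the [start, end) id ranges shown in the page's drop-down list,
    // e.g. 0-4M, 4M-8M, ..., 36M-40M for a 40-million-row table.
    public static List<long[]> segments(long totalRows) {
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < totalRows; start += SEGMENT_SIZE) {
            ranges.add(new long[]{start, Math.min(start + SEGMENT_SIZE, totalRows)});
        }
        return ranges;
    }

    // The chosen range is combined with the user's conditions in the WHERE
    // clause (a real implementation would use parameterized SQL).
    public static String whereClause(long[] range, String userConditions) {
        return "WHERE id >= " + range[0] + " AND id < " + range[1]
                + (userConditions.isEmpty() ? "" : " AND (" + userConditions + ")");
    }

    public static void main(String[] args) {
        List<long[]> ranges = segments(40_000_000L);
        System.out.println(ranges.size() + " segments"); // 10 segments
        System.out.println(whereClause(ranges.get(9), "url LIKE '%example%'"));
    }
}
```

Bounding each scan by the auto-increment primary key is what keeps every individual query's cost predictable, at the price of the user having to step through segments.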
Segmented querying solved the flow log problem, but customers also require all kinds of statistics, and you cannot tell someone who wants statistics over 2 days to compute them segment by segment.
The system already used pre-aggregation: a background job periodically summarizes the log tables into pre-statistics tables, and when users query, all the various statistics run against those tables instead. This is a good design; credit to the earlier developers.
But the situation is different now: each pre-statistics table covers one month, and at the current log volume a single pre-statistics table could itself exceed 40 million rows — it depends on the distribution of the customer's network data, which is hard to estimate.
Finally, my colleagues and I analyzed the statistics model in detail, and one colleague proposed aggregating a second time on top of the first-level pre-statistics. We estimated that by the time a user queries, the table involved is already very small — at most a few thousand records — so queries return almost instantly.
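A toy sketch of the two-level idea (all names and granularities here are hypothetical, since the post does not give the actual schema): a first pass rolls raw log rows up to per-host, per-hour counts, and a second pass rolls those up to per-host, per-day counts, so user-facing statistics queries only ever touch the small second-level table.

```java
import java.util.HashMap;
import java.util.Map;

public class TwoLevelStats {
    // Level 1: raw records -> per-(host, hour) counts (the pre-statistics table).
    public static Map<String, Long> level1(String[][] rawRecords) {
        Map<String, Long> hourly = new HashMap<>();
        for (String[] rec : rawRecords) {            // rec = {host, "yyyy-MM-dd HH"}
            hourly.merge(rec[0] + "@" + rec[1], 1L, Long::sum);
        }
        return hourly;
    }

    // Level 2: per-hour counts -> per-(host, day) counts (the much smaller table).
    public static Map<String, Long> level2(Map<String, Long> hourly) {
        Map<String, Long> daily = new HashMap<>();
        for (Map.Entry<String, Long> e : hourly.entrySet()) {
            // Drop the trailing " HH" to keep only "host@yyyy-MM-dd".
            String hostAndDay = e.getKey().substring(0, e.getKey().lastIndexOf(' '));
            daily.merge(hostAndDay, e.getValue(), Long::sum);
        }
        return daily;
    }

    public static void main(String[] args) {
        String[][] raw = {
            {"10.0.0.1", "2024-01-01 13"}, {"10.0.0.1", "2024-01-01 14"},
            {"10.0.0.1", "2024-01-02 09"}, {"10.0.0.2", "2024-01-01 13"},
        };
        System.out.println(level2(level1(raw)));
    }
}
```

In the real system both levels would be SQL `GROUP BY` jobs run in the background; the point is only that each level shrinks the row count by orders of magnitude before a user ever queries.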
Working through the statistics queries taught me that analyzing the details of the business process carefully, and optimizing accordingly, can sometimes solve the problem.
Overall: in database query optimization, once the conventional optimizations have failed to achieve the desired effect, we need not keep trying to optimize the query itself. Changing how the system is used, or looking for business processes that can be modified, may solve the existing problems more easily.
It also helped that the director mobilized the enthusiasm of the whole development team: we tested, analyzed, proposed approaches, and verified them, finally agreed unanimously on a workable plan, and then split up to implement it without cutting corners.
Source: liangbing's blog