MySQL Database Query Optimization: Analyzing Business Process Details

I spent the last two weeks trying to speed up our queries and achieved some results. I solved a few problems and am writing them down for future reference.

Our company does not have a dedicated DBA and I am not very familiar with MySQL. The management system of our network audit system, developed in Java, has been revised and modified by many people over several years, and now we have to turn it into a version that supports high-traffic scenarios, which is a real challenge for our Java group of only a few people.

This high-traffic requirement was mentioned in previous articles: the system must handle 1 GB of network packets per second, most of them HTTP packets, so the HTTP protocol analysis module produces the largest number of records in the flow log table, estimated at up to 40 million records per day. Even with one table per day, each table is very large; I checked, and the MYD file is more than 8 GB.

When we query log records in the management system, we filter on several fields, some of which are 256 characters long. Searching for a string in an 8 GB table means scanning from the beginning whenever no index can be used, which is far too slow. The customer also needs to combine conditions on several of these fields, so usability is very poor: a single query commonly takes 20 or 30 minutes.
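As a rough illustration only, the queries looked something like the following minimal sketch, assuming hypothetical table and column names (http_flow_log, src_host, url); a substring match on a long varchar column cannot use an index, so MySQL scans the entire table.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// A multi-condition search with a substring match on a long varchar column;
// the leading-wildcard LIKE cannot use an index, so the whole 8 GB table is scanned.
public class SlowLogSearch {
    public void search(Connection conn, String host, String urlKeyword) throws Exception {
        String sql = "SELECT id, capture_time, src_host, url "
                   + "FROM http_flow_log "
                   + "WHERE src_host = ? AND url LIKE ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, host);
            ps.setString(2, "%" + urlKeyword + "%"); // '%...%' forces a full table scan
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // render one matching row ...
                }
            }
        }
    }
}
```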

My method was mainly testing: I read the Java code and used Log4j and Perf4j to log the time taken by each SQL statement, located the performance bottlenecks, and then optimized them in a targeted way.
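A minimal sketch of that timing step, using Perf4j's Log4JStopWatch; the DAO method, SQL text, and table name here are assumptions for illustration, not the original code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.perf4j.StopWatch;
import org.perf4j.log4j.Log4JStopWatch;

// Time a single SQL statement with Perf4j so its duration shows up in the Log4j output.
public class FlowLogDao {
    public int countByHost(Connection conn, String host) throws Exception {
        StopWatch watch = new Log4JStopWatch("flowLog.countByHost"); // tag appears in the perf log
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT COUNT(*) FROM http_flow_log WHERE src_host = ?")) {
            ps.setString(1, host);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        } finally {
            watch.stop(); // logs the elapsed time for this tag
        }
    }
}
```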

The most effective query optimization is creating indexes. The ID column is already an auto-increment primary key, which our predecessors had set up. Analyzing the WHERE clauses showed that a time field appears in many of the query conditions; it is 8 bytes and its values are largely distinct, which makes it a good candidate for an index. I indexed the time column, which helped a little but not enough. A rough estimate: 8 bytes × 40,000,000 rows = 320,000,000 bytes, so the index on that single field of a 40-million-row table takes about 320 MB, and that is just one of our hundreds of tables (the customer requires us to keep at least three months of records).
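Creating that index is a one-line DDL statement; here is a minimal sketch over JDBC, with the table and column names (http_flow_log, capture_time) and connection details assumed for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Add an index on the time column that appears in most WHERE clauses.
public class AddTimeIndex {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/audit", "user", "password");
             Statement stmt = conn.createStatement()) {
            // 8-byte time column x 40,000,000 rows ~= 320 MB of index per daily table
            stmt.executeUpdate("CREATE INDEX idx_capture_time ON http_flow_log (capture_time)");
        }
    }
}
```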

Creating indexes helps to a degree, but it still does not solve our problem. Splitting the data into more physical tables cannot shorten the time any further either, because we already create one table per day: 91 or 92 tables for three months, and more than 2,700 tables across 30 protocol modules. And that is only the protocol flow log tables; there are other tables besides.

Nor can we index every field that can appear in a condition; the indexes would end up roughly as large as the original table, which defeats the purpose. As for optimizing the database itself, I could not immediately think of a better approach; with this volume of data, I suspect even Oracle would have no magic answer.

In the end I used a segmented query approach: however the user sets the conditions, the 40 million rows are divided into N equal segments and queried one segment at a time. For example, with 4 million rows per segment, the page offers a drop-down list: 0 to 4 million, 4 to 8 million, ..., 36 to 40 million. Querying is slightly more cumbersome, but the speed of each segment's query improves dramatically, to around 30 seconds. Some usability is sacrificed, but that is far better than queries that previously could not finish within 30 minutes.
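A minimal sketch of the segmented query, assuming the segments are defined by ranges of the auto-increment primary key and using hypothetical table and column names; the drop-down on the page would supply the segment index.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Restrict every query to one id range chosen from the page's drop-down list,
// so only one segment (here, 4 million rows) is examined per query.
public class SegmentedQuery {
    private static final long SEGMENT_SIZE = 4_000_000L;

    public void querySegment(Connection conn, int segmentIndex, String host) throws Exception {
        long lowId = segmentIndex * SEGMENT_SIZE + 1;       // segment 0 -> ids 1..4,000,000
        long highId = (segmentIndex + 1) * SEGMENT_SIZE;
        String sql = "SELECT id, capture_time, src_host, url "
                   + "FROM http_flow_log "
                   + "WHERE id BETWEEN ? AND ? AND src_host = ?"; // primary key bounds the scan
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lowId);
            ps.setLong(2, highId);
            ps.setString(3, host);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // render one row on the page ...
                }
            }
        }
    }
}
```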

Segmented queries solve the problem of browsing the logs sequentially, but not the customer's various statistical requirements: statistics cannot simply be computed segment by segment. If someone wants counts over two days, you cannot split that calculation up.

Pre-statistics had already been used before: a background job aggregates the flow log tables in advance and saves the results into a pre-statistics table, and the various statistical queries run against that table. That was a good decision, so credit to the earlier developers.
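The article does not show the actual pre-statistics job, but a minimal sketch of such a background aggregation step might look like this, assuming a hypothetical hourly summary table (flow_pre_stats) and column names.

```java
import java.sql.Connection;
import java.sql.Statement;

// Aggregate one daily flow log table into an hourly per-host summary table in the
// background, so statistical queries read the small summary instead of the raw logs.
public class PreStatsJob {
    public void aggregateDailyTable(Connection conn, String dailyTable) throws Exception {
        String sql = "INSERT INTO flow_pre_stats (stat_hour, src_host, request_count, total_bytes) "
                   + "SELECT DATE_FORMAT(capture_time, '%Y-%m-%d %H:00:00'), src_host, COUNT(*), SUM(bytes) "
                   + "FROM " + dailyTable + " "
                   + "GROUP BY DATE_FORMAT(capture_time, '%Y-%m-%d %H:00:00'), src_host";
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(sql); // run once per daily table from a scheduled background task
        }
    }
}
```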

But the situation is different now: the pre-statistics table covers one month, and at the current scale of the flow log tables the pre-statistics table itself may exceed 40 million rows; it is hard to predict how the customer's network data will be distributed.

Finally, my colleagues and I analyzed the statistical model in detail. One colleague proposed performing a second round of pre-statistics on top of the existing pre-statistics table. By our estimate, by the time a user runs a query, the table we actually hit is very small, at most a few thousand records, so it is very fast.
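A minimal sketch of the secondary pre-statistics idea, assuming the first-level table is hourly and the second-level table is daily; all table and column names here are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

// Roll the (large) first-level pre-statistics table up into a much smaller per-day
// table, so user-facing statistical queries touch at most a few thousand rows.
public class SecondaryPreStatsJob {
    public void aggregateToDaily(Connection conn, Timestamp dayStart, Timestamp dayEnd) throws Exception {
        String sql = "INSERT INTO flow_pre_stats_daily (stat_day, src_host, request_count, total_bytes) "
                   + "SELECT DATE(stat_hour), src_host, SUM(request_count), SUM(total_bytes) "
                   + "FROM flow_pre_stats "
                   + "WHERE stat_hour >= ? AND stat_hour < ? "
                   + "GROUP BY DATE(stat_hour), src_host";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, dayStart);
            ps.setTimestamp(2, dayEnd);
            ps.executeUpdate();
        }
    }
}
```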

While solving the statistical query problem, I came to appreciate how analyzing the details of the business process and optimizing accordingly can sometimes solve a problem that query tuning alone cannot.

In general, we first applied the usual optimizations to the database queries themselves. When those do not achieve the desired result, it is sometimes better to stop trying to optimize the query itself and instead change how the data is used, checking whether the business process can be adjusted; the problem may then be easy to solve.

In addition, the supervisor should mobilize the whole development team to test, analyze, propose and verify solutions together, settle on a feasible plan, and then implement it without compromise.

Source: liangbing's blog
