First, some background on the project. This is an online examination and practice platform. The database is MySQL, and the relevant tables are:
question is the table that stores questions; it holds about 30 thousand rows. answerresult is the table that stores users' answer results; after the table was split, each individual table holds about 3 to 4 million rows.
Requirement: generate a practice set based on the user's answer results. The question priority is: not yet answered > only answered wrong > answered both wrong and right > only answered right.
Within the "wrong and right" group, the weight is calculated from the ratio of wrong answers to correct answers. For example: (a) answered wrong 10 times and right 100 times; (b) answered wrong 10 times and right 20 times. In this case, b should have the higher probability of being selected for practice.
NOTE: If the answerresult table has no record for a questionid, the user has not answered that question.
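The priority order above can be sketched in a few lines of Python. This is only an illustration of the rule, not the platform's code: the `tier` function, the tier constants, and the sample records are hypothetical names and data introduced here.

```python
from typing import Optional, Tuple

# Priority tiers from the requirement, highest priority first
# (a smaller number means the question is served earlier).
NOT_ANSWERED, ONLY_WRONG, WRONG_AND_RIGHT, ONLY_RIGHT = 0, 1, 2, 3


def tier(record: Optional[Tuple[int, int]]) -> int:
    """record is (correct_count, error_count), or None when answerresult has no row."""
    if record is None:
        return NOT_ANSWERED
    correct, error = record
    if correct == 0:
        return ONLY_WRONG
    if error > 0:
        return WRONG_AND_RIGHT
    return ONLY_RIGHT


# Hypothetical sample: question id -> answer record for one user.
records = {"q1": (3, 0), "q2": None, "q3": (20, 10), "q4": (0, 2)}
ordered = sorted(records, key=lambda q: tier(records[q]))
print(ordered)  # ['q2', 'q4', 'q3', 'q1']
```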
The method used previously:
select question.questionid, ifnull((0 - correctcount) / (correctcount + errorcount), 1) as weight from question
left join answerresult on answerresult.questionid = question.questionid
-- the user filter sits in the on clause so that unanswered questions keep their null row
and answerresult.userid = {userid}
Note: ifnull((0 - correctcount) / (correctcount + errorcount), 1) has two parts.
The formula (0 - correctcount) / (correctcount + errorcount) gives the weight of a question. Its range is [-1, 0]: 0 means the question has only been answered wrong, and -1 means it has only been answered right. ifnull(value, 1) sets the weight of a question that has never been answered to 1. The questions are then listed in order of weight.
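As a quick check of the formula, here is a minimal Python sketch of the same calculation. The function name `weight` and the sample counts are assumptions made for illustration. A row with zero attempts is treated like an unanswered question here, which is a choice made for this sketch.

```python
from typing import Optional


def weight(correct: Optional[int], error: Optional[int]) -> float:
    """Mirror ifnull((0 - correct) / (correct + error), 1) for one question."""
    # No answerresult row (both counts are None): the question is unanswered.
    if correct is None or error is None:
        return 1.0
    total = correct + error
    # Assumption for this sketch: zero attempts counts as unanswered.
    if total == 0:
        return 1.0
    return (0 - correct) / total


print(weight(None, None))          # 1.0  -> never answered, highest weight
print(weight(0, 5))                # 0.0  -> only answered wrong
print(round(weight(20, 10), 3))    # -0.667
print(round(weight(100, 10), 3))   # -0.909
print(weight(5, 0))                # -1.0 -> only answered right
```

Sorting by this value in descending order puts unanswered questions first and only-right questions last, which matches the stated priority.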
Because each answerresult table holds 3 to 4 million rows, the Cartesian product produced by the left join is too large. answerresult is also read and written frequently, so this SQL statement easily becomes a slow query.
Once performance was put on the agenda, this SQL statement became an optimization target. Three points stood out:
1. The ifnull() computation can be replaced with a redundant (precomputed) column.
2. The Cartesian product of the left join is too large. It can be avoided with redundant data, or the left join can be replaced with an inner join to speed up the query.
3. The question-selection strategy can be split by case: run a different SQL statement for each situation instead of handling everything in a single statement.
The solution was adjusted based on these three points. Although the question table contains 30 thousand rows, questions are always selected per knowledge point, and a single knowledge point has at most about 1000 questions. Therefore, unanswered questions can be fetched with not in. The SQL statement is as follows:
A: select questionid from question where knowledgepoint = {knowledgepointcode} and questionid not in (
select answerresult.questionid from answerresult inner join question on question.questionid = answerresult.questionid and question.knowledgepoint = {knowledgepointcode}
where answerresult.userid = {userid}
)
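To make statement A concrete, here is a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it shows the logic only and says nothing about MySQL performance. The column names (questionid, knowledgepoint, userid, correctcount, errorcount) and the sample rows are assumptions; the order by is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (questionid INTEGER PRIMARY KEY, knowledgepoint TEXT);
CREATE TABLE answerresult (userid INTEGER, questionid INTEGER,
                           correctcount INTEGER, errorcount INTEGER);
-- Hypothetical data: three questions in kp1, one in kp2.
INSERT INTO question VALUES (1, 'kp1'), (2, 'kp1'), (3, 'kp1'), (4, 'kp2');
-- User 7 has answered questions 1 and 4; user 8 has answered question 2.
INSERT INTO answerresult VALUES (7, 1, 0, 2), (7, 4, 1, 0), (8, 2, 3, 1);
""")

# Statement A: questions in the knowledge point that the user has no record for.
UNANSWERED = """
SELECT questionid FROM question
WHERE knowledgepoint = :kp
  AND questionid NOT IN (
    SELECT ar.questionid
    FROM answerresult ar
    INNER JOIN question q ON q.questionid = ar.questionid
                         AND q.knowledgepoint = :kp
    WHERE ar.userid = :uid)
ORDER BY questionid
"""

rows = conn.execute(UNANSWERED, {"kp": "kp1", "uid": 7}).fetchall()
unanswered = [r[0] for r in rows]
print(unanswered)  # [2, 3]
```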
Selecting questions that have only been answered wrong is easy (correctcount = 0 means the question has only been answered wrong). The SQL statement is as follows:
B: select answerresult.questionid from answerresult inner join question on question.questionid = answerresult.questionid and question.knowledgepoint = {knowledgepointcode}
where answerresult.userid = {userid} and correctcount = 0 order by errorcount desc
For questions answered both wrong and right, or only right, the SQL looks like this (weight is now stored redundantly, weight = ifnull((0 - correctcount) / (correctcount + errorcount), 1)):
C: select answerresult.questionid from answerresult inner join question on question.questionid = answerresult.questionid and question.knowledgepoint = {knowledgepointcode}
where answerresult.userid = {userid} and correctcount > 0 order by weight desc
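Statements B and C can be tried the same way. The sketch below again uses sqlite3 in place of MySQL, with assumed column names and made-up rows, and stores the weight redundantly at insert time. Concatenating the two result lists is one possible way to assemble the answered part of a practice set; it is an illustration, not necessarily how the platform does it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (questionid INTEGER PRIMARY KEY, knowledgepoint TEXT);
CREATE TABLE answerresult (userid INTEGER, questionid INTEGER,
                           correctcount INTEGER, errorcount INTEGER, weight REAL);
INSERT INTO question VALUES (1,'kp1'), (2,'kp1'), (3,'kp1'), (4,'kp1'), (5,'kp1');
""")

# Hypothetical answer records: (userid, questionid, correctcount, errorcount).
records = [(7, 1, 0, 2), (7, 2, 0, 5), (7, 3, 100, 10), (7, 4, 20, 10), (7, 5, 3, 0)]
for uid, qid, correct, error in records:
    # The redundant weight is computed once, when the row is written.
    w = (0 - correct) / (correct + error)
    conn.execute("INSERT INTO answerresult VALUES (?, ?, ?, ?, ?)",
                 (uid, qid, correct, error, w))

BASE = """
SELECT ar.questionid
FROM answerresult ar
INNER JOIN question q ON q.questionid = ar.questionid AND q.knowledgepoint = ?
WHERE ar.userid = ? AND """
ONLY_WRONG = BASE + "ar.correctcount = 0 ORDER BY ar.errorcount DESC"    # statement B
ANSWERED_RIGHT = BASE + "ar.correctcount > 0 ORDER BY ar.weight DESC"    # statement C

only_wrong = [r[0] for r in conn.execute(ONLY_WRONG, ("kp1", 7))]
answered_right = [r[0] for r in conn.execute(ANSWERED_RIGHT, ("kp1", 7))]

# B first, then C: only-wrong questions, then wrong-and-right, then only-right.
practice = only_wrong + answered_right
print(practice)  # [2, 1, 4, 3, 5]
```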
Shortcoming: SQL statement A is still slow. Although the not in result set has been reduced, there is still room for optimization here. Can fellow members of the community offer some suggestions?
Some people say that join is the performance killer of SQL. I think it mainly depends on how join is used. Index optimization in MySQL is very important. If a join becomes a performance bottleneck, use explain to check whether the indexes are properly created, and try to keep the Cartesian product as small as possible. Redundant data can be used to avoid joins, but keeping redundant data up to date is a headache when that data is spread across partitions. High concurrency on massive data really is a headache.
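As an illustration of checking index usage, the sketch below creates a composite index and inspects the query plan. It runs on SQLite through Python's sqlite3 module, where the command is EXPLAIN QUERY PLAN; on MySQL the corresponding check would be an EXPLAIN on the real statement, and its output looks different. The index name idx_ar_user_question and the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE answerresult (userid INTEGER, questionid INTEGER,
                           correctcount INTEGER, errorcount INTEGER);
-- Hypothetical composite index: filter by user, then read questionid from the index.
CREATE INDEX idx_ar_user_question ON answerresult (userid, questionid);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT questionid FROM answerresult WHERE userid = ?",
    (7,),
).fetchall()

# The last column of each plan row is a human-readable description of the step.
details = [row[-1] for row in plan]
uses_index = any("idx_ar_user_question" in d for d in details)
for d in details:
    print(d)
```

If the index name does not appear in the plan, the query is scanning the table, which is the situation worth investigating first.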
If you have experience in this area, I would appreciate your advice. Thank you.