Let's start with a brief introduction to the project background. This is an online test practice platform, database using MySQL, table structure as shown in the figure:
The question is the table that stores the problem, the amount of data is about 30,000. The Answerresult table is a table that stores the results of users ' replies, and a single table after the table is about 3 million-4 million.
Requirements: According to the user's answer results of the practice volume, the priority of the topic for: not done topics > only wrong topic > do wrong and do the right topic > only to do the right topic.
In the "Do wrong and do the right", will be the number of errors and the proportion of the correct number of weights calculated, such as: A, done 10 times, do 100 times, B, do a wrong 10 times, do to 20 times. The probability of B being selected for the user to practice is great.
Note: The QuestionID record does not exist in the Answerresult table, which means that the problem has not been done.
Previously used methods:
SELECT question. Topic identification, Ifnull ((0-correct number of times)/(correct times + errors), 1) as weight from question
Left JOIN Answerresult on answerresult. Topic identification = question. Topic identification
WHERE User ID ={userid}
Description:ifnull ((0-the correct number)/(correct times + error times), 1) This function is divided into 2 parts,
Formula: (0-correct number of times)/(correct number + error times) Get the weight of the topic, this interval is [0,-1],0 said only the wrong topic,-1 means to do the right topic. Ifnull (value,1) will be the weight of the title set to 1, according to the weight of the list of topics listed.
Since the Answerresult table is a table of up to 300, 400 million, the Cartesian product is too large and the answerresult is frequently read-write, so it can easily cause this SQL to become a slow query by leaving the left join.
After the performance problem is put on the schedule, the SQL statement becomes the optimization point.
1, ifnull () This function calculation can be adjusted into redundant fields.
2. The Cartesian product of the left join is too large to be adjusted to redundancy or use inner join to improve query speed.
3, according to the needs, can actually adjust the strategy, different situations to execute different SQL, and do not need to be implemented in the same SQL.
The solution is adjusted for the above three points. Although the question table has 30,000 data, but the scene is in fact for the topic of knowledge points, a single point of knowledge is only about 1000 questions, so get not done the topic, you can use not in Go index to complete. SQL statements such as:
The A:select topic identifies the from question WHERE knowledge point ={knowledgepointcode} and the topic identifies not in (
The SELECT topic identifies the from Answerresult INNER JOIN question and question. Knowledge Point ={knowledgepointcode}
WHERE answerresult. User ID = {USERID}
)
It's easy to do a question on a topic that is just wrong (the correct number = 0 means to do the wrong thing), SQL such as:
The B:select topic identifies the from Answerresult INNER JOIN question and question. Knowledge Point ={knowledgepointcode}
WHERE answerresult. User ID = {USERID} and correct number = 0 ORDER by error number DESC
To do the wrong thing, do the right thing or just do the right question, SQL is this (has been redundant = Ifnull (0-the correct number of times)/(correct times + errors) (1)):
The C:select topic identifies the from Answerresult INNER JOIN question and question. Knowledge Point ={knowledgepointcode}
WHERE answerresult. User ID = {UserId} and correct times > 0 ORDER by weight DESC
Insufficient: SQL statement A's query speed is still slow, although there is a result set to reduce not in, but here is the optimization point. Can friends in the garden not give me some advice?
Some people say that join is the SQL performance killer, I think the main still how to use Join,mysql Index optimization is very important, if join becomes a performance bottleneck, you can explain see if the index is not built, and try to make the Cartesian product as small as possible. Using redundant data to avoid joins, updating these redundant data is a big headache when the potentially changing redundant data is divided into tables. Massive data high concurrency, is really a very headache.
Hope that the garden has the experience of friends with the generous enlighten. Thank you.