Today, a colleague told me that an SQL statement cannot be executed for a long time. How long does it take? She said it was about a day. This is a staggering SQL statement. So I came to SQL and looked at the execution plan.
Is the execution plan:
25G cost and 75 t bytes are really unbearable. This SQL statement is like this:
The Select part performs many sum operations and distinct operations. In short, the group by part is the preceding dimension. The largest tables are table3 and table4. The two tables need to query more than 3 GB of data, with each table having about 30 million data.
At first, I thought this execution plan was not feasible because of the large data volume. However, after I compress the corresponding data of table3 and table4, the data volume is reduced to about 1 GB, however, the execution plan is basically not changed. This is not the effect I want, so I noticed the section in the red box of the execution plan. Is this a problem? So I started to check the SQL and found a problem: table4 actually only has 201302 of the data, but why do we still need to write the month ID during the left join? This is unreasonable, based on my past experience, when I connect left or right, if the and condition is too many, the execution plan is often affected, resulting in SQL failure to get results for a long time. So I did a very simple task, that is to remove the month_id part of table4. Then I took another step and removed the month part of table3. This is a partition table, so I used this method:
Left join table3 partition (part_02). This reduces the and condition and does not affect data accuracy. After optimization, the execution plan becomes like this:
We can see that the execution plan has undergone earth-shaking changes. What we can see is that the nested loops outer is missing and replaced by the hash join outer of part 1 and part 2 in the figure and the hash join outer of table 1, this is what I like to see. I like to see simple execution plans. Although the cost is still very large, I really don't want to move it. The data size is too huge. The difficulty of optimization cannot make me think about it, study and think carefully, and then get the cost to a few thousand or a smaller degree. That's it. It's raining, Niang wants to marry, and let him go.
I have seen that I have been writing on the forum or in books. If your table is more than 1 GB, I 'd better partition it. Now I have a deep understanding. If there is no partition table, there is no way to optimize this SQL statement (or I cannot optimize it), because there is no way to remove the month condition unless the data of a specific month is obtained to create an intermediate table, however, it seems a little troublesome.Code.