Optimize hundreds of millions of data queries in 10 minutes and query hundreds of millions of data records
A user developed QQ to contact me a few days ago, as shown below:
Free dolphins. 16:12:01
Island Lord, I cannot find the result of one of my SQL statements. Can you check it for me?
Orchid island master 16:12:10
How long will it take?
Free dolphin 16:12:17
No results are returned for a long time.
Orchid island master 16:12:26
Haha, good.
Orchid island master 16:12:39
Send SQL statements and execution plans.
Free dolphin 16:12:55
Select n. c1, n. c2, n. c3, n. c4, n. c5
From (select count (t. c1), t. c1, t. c2, t. c3, t. c4, t. c5
From tab1 t
Where t. c2 not in ('val1', 'val2', 'val3', 'val4', 'val5 ')
Group by t. c1, t. c2, t. c3, t. c4, t. c5) n
Where not exists
(Select * from (
Select count (s. c2), s. c1, s. c2
From (select m. c1, m. c2, m. c3, m. c4, m. c5
From tab1 m
Where exists (select c1
From tab2 n
Where c2> sysdate-14
And m. c1 = n. c1)
And m. c1 is not null
And m. c2 not in ('val1', 'val2', 'val3', 'val4', 'val5') s
Group by s. c1, s. c2) t1 where t1.c2 = n. c2)
And n. c1 is not null;
Orchid island master 16:13:12
Are these two tables big?
Free dolphin 16:13:16
Tab1 is small, with more than tabps and tens of millions of data in two weeks.
Lanhua island master 16:13:22
OK.
Orchid island master 16:16:29
So change the SQL statement:
With t1 (
Select count (t. c1), t. c1, t. c2, t. c3, t. c4, t. c5
From tab1 t
Where t. c2 not in ('val1', 'val2', 'val3', 'val4', 'val5 ')
And c1 is not null
Group by t. c1, t. c2, t. c3, t. c4, t. c5)
Select t1.c1, t1.c2, t1.c3, t1.c4, t1.c5
From t1
Where not exists (
Select/* + use_hash (m, n) */m. c1, m. c2, m. c3, m. c4, m. c5
From t1 m, tab2 n
Where n. c2> sysdate-14
And m. c1 = n. c1
And t1.c2 = m. c2 );
Orchid island master 16:16:43
Remove the execution plan.
Free dolphin 16:16:57
Okay.
Free dolphin 16:17:25
Orchid island master 16:17:57
Okay. Try it.
Free dolphin 16:19:28
The result is 37 s.
Orchid island master 17:20:21
Well, good.
Orchid island master 17:20:34
Can this happen?
Free dolphin 17:20:47
Yes.
Orchid island master 17:21:11
Well, well, let's do it first and stop calling it.
Free dolphin 17:21:30
Thank you, island master.
Orchid island master 17:21:53
You're welcome. You have to contact me.
Free dolphin 17:22:18
Well, you are busy...
At this point, the optimization of the user's SQL has ended. In fact, there should be room for optimization of this statement, but the user can say it is okay, because the optimization is endless and, further Optimization may require further information and sometimes requires greater changes. In view of various factors, we have processed the statements and plans and recorded them here!
Query and optimization of hundreds of millions of data.
We recommend that you use an oracle database to partition tables according to regular rules. The conditions allow you to store different tables on different disks and increase the read/write speed.
How can I design and optimize a mysql table that has reached 0.1 billion levels?
Single Table 0.1 billion? Or full database 0.1 billion?
1. First, you can consider business optimization, that is, vertical table sharding.
Vertical table sharding refers to dividing a table with a large amount of data into multiple tables based on the attributes of a field or the frequency of use.
If there are multiple business types, different tables are input for each business type, such as table1, table2, and table3.
If you do not need to use all the data in your daily business, you can split the table by time, such as the monthly table. Each table has only one month of records.
2. Architecture Optimization, that is, horizontal table sharding.
A horizontal table shard stores data rows in multiple independent tables based on the values of one or more columns of data. This table does not have any business significance.
For example, if the table is sharded by id, the data with 0-9 at the end is inserted into 10 tables respectively.
Maybe you have to ask, it looks like it is no different from the vertical table sharding just mentioned. However, whether the difference is of service significance is to split tables by field values.
In fact, the most popular Implementation of horizontal table sharding is achieved through horizontal database sharding. That is to say, the 10 tables just mentioned are distributed in 10 mysql databases. In this way, you can integrate multiple low-configuration hosts to achieve high performance.
The most common solution is cobar. This post provides a comprehensive introduction.
Blog.csdn.net/shagoo/article/details/8191346
The logical hierarchy of cobar:
However, this database sharding method also has some limitations and requires the application to cooperate accordingly. For example, in the case of database sharding, cross-database queries can be implemented, but related group by calculations cannot be performed.
In addition, we can also use table partitions to implement horizontal table sharding.
There are many ways to optimize mysql. The primary consideration is the actual situation of the individual. If the code is uncontrollable, it is not suitable for table sharding by field attribute, this may cause a lot of reconstruction and many unpredictable risks.
While architecture optimization is transparent to applications, it has many limitations on SQL writing, such as the inability to use aggregate functions and the need for adequate hardware resources, it makes no sense to have only one server.
In comparison, the lowest cost is to split tables or partitions by time. The two methods are transparent to applications.
Only one local data migration operation is required for a partition.
The only cost of separating the current network data from historical data through table sharding is regular data maintenance.
Generally, if the table contains 0.1 billion million data records, the indexing problem should be common sense. I will not talk about this.
In addition, if you think that the answer is good, give more points.