On Zhihu, I came across a question like this:
MySQL query: how can SELECT * FROM table WHERE id IN (hundreds or thousands of IDs) be made more efficient? While reworking an e-commerce site, there is a product attribute table with several hundred thousand records, about 80 MB, with an index only on the primary key id. How can this kind of query be sped up?
SELECT * FROM table WHERE id IN (hundreds or thousands of IDs)
These IDs are not sequential; they are scattered all over the place....
Many of the answers felt unreliable to me, but talk is cheap, so I wrote a few test queries on my own machine. I used PostgreSQL 9.4, but I expect MySQL behaves similarly. First, create a simple table with just 3 columns. Many people under the question said you need to look at the size of the table, but in fact the problem has nothing to do with the table size, only with the size of the index; since the index is built on an int column, only the number of records matters.
Table" Public. T9 "Column |Type|Modifiers--------+----------------+-----------C1| integer |C2| character( -)|C3| character( $)|Indexes: "I1"UNIQUE, Btree (C1)
INSERT INTO t9 VALUES (generate_series(1000, 500000, 1), repeat('a', n), repeat('b', 180));
Then generate some random numbers, using jot on the Mac or shuf on Linux:
for ((i=0; i<100000; i++)); do jot -r 1 1 600000 >> rand.file; done
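On Linux, a roughly equivalent shuf one-liner could be used instead of the loop; the range and count below are my assumptions, chosen to mirror the jot loop above (shuf without -r samples without repetition):

$ shuf -i 1-600000 -n 100000 > rand.file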
Then generate a query statement based on rand.file:
SELECT * FROM t9 WHERE c1 IN (494613, 575087, 363588, 527650, 251670, 343456, 426858, 202886, 254037, ...);
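For reference, a minimal sketch of how such a statement could be assembled from rand.file; the exact pipeline and file name are mine, not from the original post:

$ { printf 'SELECT * FROM t9 WHERE c1 IN ('; head -n 100 rand.file | paste -sd, -; printf ');\n'; } > test_100.sql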
Generate three SQL files, containing 100, 1,000, and 10,000 values in the IN list respectively, run them, and look at the timings:
$ psql study -f test_100.sql -o /dev/null
LOG: 2.879 ms
$ psql study -f test_1000.sql -o /dev/null
LOG: 11.974 ms
$ psql study -f test_10000.sql -o /dev/null
LOG: 355.689 ms
You can see that the execution time only changes significantly once there are 10,000 values in the IN list, and even then it finishes in a little over 300 ms.
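As an aside, LOG lines like these are server-side statement durations; a minimal way to get them, assuming a superuser role (log_duration can only be changed by superusers), is to enable log_duration for the session and then run the file, or just use \timing in psql for a client-side measurement:

$ psql study <<'SQL'
SET log_duration = on;   -- durations are then written to the server log as LOG entries
\i test_100.sql
SQL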
So what about the approach some of the answers suggest: create a temporary table, then use IN with a subquery, hoping the planner joins the two tables? For simplicity, I just join the two tables directly.
DROP TABLE t_tmp;
CREATE TABLE t_tmp (id int);
INSERT INTO t_tmp (id) VALUES (494613), (575087), (363588), (345980),
...
;
SELECT t9.* FROM t9, t_tmp WHERE t9.c1 = t_tmp.id;
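The test_create_*.sql files can be generated from the same rand.file; here is a minimal sketch for the 10,000-value variant (the sed/paste pipeline is my own, not from the original post):

{
  echo 'DROP TABLE t_tmp;'
  echo 'CREATE TABLE t_tmp (id int);'
  printf 'INSERT INTO t_tmp (id) VALUES '
  head -n 10000 rand.file | sed 's/.*/(&)/' | paste -sd, -
  echo ';'
  echo 'SELECT t9.* FROM t9, t_tmp WHERE t9.c1 = t_tmp.id;'
} > test_create_10000.sql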
How about the time?
$ psql study -f test_create_10000.sql -o /dev/null
LOG: 2.078 ms
LOG: 1.233 ms
LOG: 224.112 ms
LOG: 322.108 ms
Even leaving out the DROP and CREATE time, it still takes quite a while, and that is on my SSD, where writing the log is already fast. Why is it so slow? Look at the EXPLAIN output: with this much data, the planner goes straight to a merge join.
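The plan itself is easy to check with a plain EXPLAIN on the same join, for example:

$ psql study -c 'EXPLAIN SELECT t9.* FROM t9, t_tmp WHERE t9.c1 = t_tmp.id;'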
What about the efficiency with 1,000 rows of data?
$ psql study -f test_create_1000.sql -o exp.out
LOG: 2.476 ms
LOG: 0.967 ms
LOG: 2.391 ms
LOG: 8.780 ms
With 100 rows of data, the results are as follows:
$ psql study -f test_create_100.sql -o /dev/null
LOG: 2.020 ms
LOG: 1.028 ms
LOG: 1.074 ms
LOG: 1.912 ms
You can see that with 100 or 1,000 values, creating a table is not noticeably better than writing all the values directly in the IN list; EXPLAIN shows a nested loop join. With a larger amount of data (and in the original question the number of IDs is not predictable), the efficiency is actually lower, and on top of that you get extra table maintenance costs and redundant SQL statements, which no DBA will like. So trust the database, and feel free to put the values directly in the IN list.