Efficiency problems and solutions for random queries with RAND() in MySQL
Efficiency is a constant concern in development, especially for operations on large amounts of data. This article looks at selecting rows at random. The first thing that comes to mind is the simple ORDER BY RAND() approach, but its performance is poor.
I recently looked into how random selection can be implemented in MySQL. For example, to pick one random record from the table tablename, the usual statement is: SELECT * FROM tablename ORDER BY RAND() LIMIT 1.
As a starting point, here are two ways to obtain a random integer in a fixed range (-5 to 5 in this example).
1. Create a table that holds the numbers from -5 to 5, then use ORDER BY RAND() to pick one of them.
# Create a table holding the allowed values
# Author: Xiaoqiang (fortune manager)
CREATE TABLE randnumber
SELECT -1 AS number
UNION SELECT -2
UNION SELECT -3
UNION SELECT -4
UNION SELECT -5
UNION SELECT 0
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5;
# Obtain a random number
# Author: Xiaoqiang (fortune manager)
# Date: 2008-03-31
SELECT number FROM randnumber ORDER BY RAND() LIMIT 1;
Advantage: the candidate values can be any chosen subset and do not need to be consecutive.
Disadvantage: building the table becomes impractical when the range of values is wide.
2. Use MySQL's ROUND() and RAND() functions.
# A single SQL statement
# Author: Xiaoqiang (fortune manager)
# Date: 2008-03-31
SELECT ROUND((0.5 - RAND()) * 2 * 5);
# Notes
# 0.5 - RAND() gives a random number between -0.5 and +0.5
# (0.5 - RAND()) * 2 gives a random number between -1 and +1
# (0.5 - RAND()) * 2 * 5 gives a random number between -5 and +5
# ROUND((0.5 - RAND()) * 2 * 5) gives a random integer between -5 and +5
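The scaling steps in the notes can be checked outside the database. The following is a minimal Python sketch, not part of the original statement: the names `scaled_round` and `histogram` are made up for illustration, and evenly spaced values stand in for RAND(). It mirrors the arithmetic of ROUND((0.5 - RAND()) * 2 * 5) and counts how often each integer comes out.

```python
import math
from collections import Counter


def scaled_round(r: float) -> int:
    """Mirror ROUND((0.5 - RAND()) * 2 * 5) for a given r in [0, 1)."""
    x = (0.5 - r) * 2 * 5        # shift to (-0.5, 0.5], then scale to (-5, 5]
    return math.floor(x + 0.5)   # round to the nearest integer


def histogram(samples: int = 10000) -> Counter:
    """Count results for evenly spaced r values (a stand-in for RAND())."""
    return Counter(scaled_round((i + 0.5) / samples) for i in range(samples))


counts = histogram()
for value in sorted(counts):
    print(value, counts[value])
```

One thing the counts make visible: the endpoints -5 and +5 each cover only half a unit of the scaled range, so they come up about half as often as the integers in between. If every integer should be equally likely, the table-based method from point 1 does not have this skew.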
I then checked the official MySQL manual. Its note on RAND() says that a column of RAND() values cannot be used in an ORDER BY clause, because ORDER BY would evaluate the column multiple times. It adds that, as of MySQL 3.23, ORDER BY RAND() can still be used to return rows in random order.
In testing, however, this turned out to be very slow. Fetching 5 records from a table with more than 150,000 rows took over 8 seconds. According to the manual, RAND() is evaluated multiple times in the ORDER BY clause, so poor performance is to be expected.
A Google search shows that most solutions found online select random rows by computing MAX(id) * RAND().
SELECT * FROM `table` AS t1 JOIN (SELECT ROUND(RAND() * (SELECT MAX(id) FROM `table`)) AS id) AS t2 WHERE t1.id >= t2.id ORDER BY t1.id ASC LIMIT 5;
However, this returns five consecutive records. The fix is to fetch only one row per query and run the query five times. Even so, it is worth it, because a single query against the 150,000-row table takes less than 0.01 seconds.
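The "one row per query, five queries" idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory table instead of MySQL, the table name `items` and the function `pick_random_rows` are hypothetical, and the random threshold is drawn in Python rather than with RAND() inside the SQL. Retrying when the same row comes up twice is also an addition that the text above does not mention.

```python
import random
import sqlite3


def pick_random_rows(conn, n, rng=random):
    """Pick n distinct rows with one 'id >= threshold' lookup per row."""
    lo, hi, total = conn.execute(
        "SELECT MIN(id), MAX(id), COUNT(*) FROM items").fetchone()
    n = min(n, total)  # cannot return more distinct rows than exist
    picked = {}
    while len(picked) < n:
        threshold = rng.randint(lo, hi)  # fresh random id for every lookup
        row = conn.execute(
            "SELECT id, name FROM items WHERE id >= ? ORDER BY id LIMIT 1",
            (threshold,)).fetchone()
        picked[row[0]] = row  # a repeated id simply overwrites itself
    return list(picked.values())


# Hypothetical demo table with gaps in the id sequence (1, 4, 7, ...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 200, 3)])
rows = pick_random_rows(conn, 5)
print(rows)
```

Because each lookup draws its own threshold, the five rows are no longer neighbours in id order, which is the point of querying one row at a time.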
The MAX(id) * RAND() statement shown earlier uses JOIN. On the MySQL forum, someone uses a subquery in the WHERE clause instead:
SELECT * FROM `table` WHERE id >= (SELECT FLOOR(MAX(id) * RAND()) FROM `table`) ORDER BY id LIMIT 1;
I tested it: the query took 0.5 seconds. That is fast, but still far slower than the JOIN statement, and something about it did not seem right.
So I rewrote the statement.
SELECT * FROM `table` WHERE id >= (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM `table`))) ORDER BY id LIMIT 1;
The query efficiency is improved, and the query time is only 0.01 seconds.
Finally, complete the statement by adding a MIN(id) offset. In my first tests I left MIN(id) out, and about half of the time the query returned one of the first few rows in the table.
The complete query statements are:
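Why the MIN(id) offset matters can be seen with a small simulation. This is a sketch under assumptions, not a measurement of the real table: the ids are assumed to run from 1001 to 2000, evenly spaced values stand in for RAND(), and `first_row_share` is a made-up helper that reproduces "WHERE id >= threshold ORDER BY id LIMIT 1" with a binary search.

```python
import math
from bisect import bisect_left


def first_row_share(ids, use_min, steps=1000):
    """Fraction of evenly spaced RAND() stand-ins that select the first row."""
    lo, hi = ids[0], ids[-1]
    hits = 0
    for i in range(steps):
        r = (i + 0.5) / steps  # stand-in for RAND()
        if use_min:
            threshold = math.floor(r * (hi - lo) + lo)
        else:
            threshold = math.floor(r * hi)
        # Equivalent of: WHERE id >= threshold ORDER BY id LIMIT 1
        if bisect_left(ids, threshold) == 0:
            hits += 1
    return hits / steps


ids = list(range(1001, 2001))  # hypothetical table whose ids start at 1001
without_min = first_row_share(ids, use_min=False)
with_min = first_row_share(ids, use_min=True)
print(without_min, with_min)
```

Without the offset, every threshold below the smallest id lands on the first row, so in this setup roughly half of the draws return it. With the offset, thresholds are spread over the actual id range and the first row is picked only rarely.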
SELECT * FROM `table`
WHERE id >= (SELECT FLOOR(RAND() * ((SELECT MAX(id) FROM `table`) - (SELECT MIN(id) FROM `table`)) + (SELECT MIN(id) FROM `table`)))
ORDER BY id LIMIT 1;

SELECT * FROM `table` AS t1
JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `table`) - (SELECT MIN(id) FROM `table`)) + (SELECT MIN(id) FROM `table`)) AS id) AS t2
WHERE t1.id >= t2.id
ORDER BY t1.id LIMIT 1;
Finally, I ran each of these two statements 10 times from PHP:
The former takes 0.147433 seconds.
The latter takes 0.015130 seconds.
It seems that the JOIN syntax is much more efficient than calling functions directly in the WHERE clause, and repeated tests gave the same result. If you have a better approach, please share it for discussion.
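For anyone who wants to repeat the 10-run comparison, a timing helper could look like the sketch below. The original measurement was done in PHP; this Python version is only an assumption about how such a test might be structured, and `time_runs` and the stand-in callable are hypothetical. In a real test, the callable would execute one of the two statements through a database driver.

```python
import time


def time_runs(run_query, repeats=10):
    """Return the total wall-clock seconds for `repeats` calls of run_query."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_query()
    return time.perf_counter() - start


# Stand-in for a real query call, used here only to show the helper running.
calls = []
elapsed = time_runs(lambda: calls.append(1))
print(f"{len(calls)} runs took {elapsed:.6f} seconds")
```

Timing both statements with the same helper, against the same table and in the same session, keeps the comparison fair.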