Improved MySQL Order by Rand () inefficiency

Source: Internet
Author: User
Tags mysql manual php programming rand time 0 mysql slow query log

Author:flymorn Source: Easy to float
categories:php Programming Posttime:2011-1-14 15:35:07 Text:
Recently, due to the need to study the MySQL random extraction implementation method. For example, to randomly extract a record from the TableName table, the general notation is: SELECT * from Content ORDER by RAND () LIMIT 1.
【Float Note: 30,000 records the query takes 0.3745 seconds (the same as below); from MySQL slow query log, you see that "ORDER by RAND ()" is scanned 2 times by the full table! 】

Then I checked the official MySQL manual, and the Hints for rand () probably mean that the rand () function cannot be used in an ORDER BY clause because it causes the data column to be scanned multiple times. However, in MySQL version 3.23, the Order by RAND () can still be implemented randomly.

But the real test is that it's very inefficient. A library of more than 150,000, query 5 data, incredibly more than 8 seconds. Check out the official manual, which also says Rand () is executed several times in the ORDER BY clause, naturally efficient and very low.

Search Google, use join, query Max (ID) * RAND () to get the data randomly.
SELECT *
From ' content ' as T1 joins (select ROUND (RAND () * (select MAX (id) from ' content ')) as ID) as T2
WHERE t1.id >= t2.id
ORDER by T1.id ASC LIMIT 1;
【the query takes 0.0008 seconds, float easily think can recommend using this statement!! 】

However, this will produce 5 consecutive records. The solution can only be one query at a time, query 5 times. Even so it is worth it, because 150,000 of the tables, the query only need 0.01 seconds less than.

There is one way:
SELECT * from ' content ' as-a JOIN (select MAX (ID) as ID from ' content ') as-B on (a.id >= floor (b.id * RAND ())) LIMIT 5;
The above method guarantees a certain range of random, the query takes 0.4265 seconds, is not recommended.

The following statement is used by someone on the MySQL forum
SELECT *
From ' content '
WHERE ID >= (SELECT floor (MAX (ID) * RAND ()) from ' content ')
ORDER by ID LIMIT 1;
【query cost 1.2254 seconds, floating easily strongly not recommended! Since the measured, 30,000 rows of the table, this statement unexpectedly will scan 5 million lines!! 】

There is still a big gap with the above statement. There is something wrong with the total sleep. So I rewrote the sentence a bit.
SELECT * from ' content '
WHERE ID >= (SELECT Floor (RAND () * (select MAX (id) from ' content ')))
ORDER by ID LIMIT 1;
【query takes 0.0012 seconds】

This, the efficiency has improved, the query time only 0.01 seconds

Finally, the statement is refined, plus min (id) judgment. When I first tested it, it was because I didn't add the min (id) judgment, and half the time I always queried the previous rows in the table.
The complete query statement is:
SELECT * from ' content '
WHERE ID >= (SELECT Floor (RAND () * ((select MAX (id) from ' content ')-(select min (id) from ' content ') + (select min (id)) From ' content ')))
ORDER by ID LIMIT 1;
【query takes 0.0012 seconds】

SELECT *
From ' content ' as T1 JOIN (select ROUND (RAND () * ([select MAX (ID) from ' content ')-(select MIN (id) from ' content ') + (select MIN (ID) from ' content ')) as ID) as T2
WHERE t1.id >= t2.id
ORDER by T1.id LIMIT 1;
【query takes 0.0008 seconds】

Finally, the two statements were queried 10 times in PHP,
The former takes 0.147433 seconds
The latter takes time 0.015130 seconds
It seems that the syntax for join is much higher than the efficiency of using functions directly in the where. (VIA)

======================================
"Well, finally floating easy to summarize"
The first scenario, the original Order by Rand () method:
$sql = "SELECT * from Content ORDER by rand () LIMIT 12";
$result =mysql_query ($sql, $conn);
$n = 1;
$rnds = ";
while ($row =mysql_fetch_array ($result)) {
$rnds = $rnds. $n. ". <a href= ' show '. $row [' id ']. " -". Strtolower (Trim ($row [' title '])." > ". $row [' title ']." </a><br/>\n ";
$n + +;
}
30,000 data for 12 random records, it takes 0.125 seconds, with the increase of data volume, the efficiency is getting lower.

The second scenario, the improved JOIN method:
for ($n =1; $n <=12; $n + +) {
$sql = "SELECT * from ' content ' as T1
JOIN (select ROUND (RAND () * (select MAX (id) from ' content ')) as ID) as T2
WHERE t1.id >= t2.id ORDER by t1.id ASC LIMIT 1 ";
$result =mysql_query ($sql, $conn);
$yi =mysql_fetch_array ($result);
$rnds = $rnds. $n. ". <a href= ' show '. $yi [' id ']. " -". Strtolower (Trim ($yi [' title '])." > ". $yi [' title ']." </a><br/>\n ";
}
30,000 data for 12 random records, 0.004 seconds, a significant increase in efficiency, up to about 30 times times higher than the first option. Cons: Multiple select queries, high IO overhead.

The third scenario, the SQL statement first random good ID sequence, with in query (floating easily recommend this usage, IO cost small, fastest):
$sql = "SELECT MAX (ID), MIN (ID) from content";
$result =mysql_query ($sql, $conn);
$yi =mysql_fetch_array ($result);
$idmax = $yi [0];
$idmin = $yi [1];
$idlist = ";
for ($i =1; $i <=20; $i + +) {
if ($i ==1) {$idlist =mt_rand ($idmin, $idmax);}
else{$idlist = $idlist. ', '. Mt_rand ($idmin, $idmax);}
}
$idlist 2= "id,". $idlist;
$sql = "SELECT * from the content where ID in ($idlist) Order By field ($idlist 2) LIMIT 0,12";
$result =mysql_query ($sql, $conn);
$n = 1;
$rnds = ";
while ($row =mysql_fetch_array ($result)) {
$rnds = $rnds. $n. ". <a href= ' show '. $row [' id ']. " -". Strtolower (Trim ($row [' title '])." > ". $row [' title ']." </a><br/>\n ";
$n + +;
}
30,000 data for 12 random records, which takes 0.001 seconds, the efficiency is 4 times times higher than the second method, and 120 times times higher than the first one. Note that the Order by field ($idlist 2) is used here to not sort, otherwise in is automatically sorted. Disadvantage: It is possible to encounter the case where the ID is deleted, so you need to select several more IDs.

Test method:
$t = Microtime (true);
EXECUTE statement
Echo Microtime (True)-$t;

Improved MySQL Order by Rand () inefficiency

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.