How to improve performance select Top N * from [tablename] order by newid ()

Source: Internet
Author: User

How to improve performance select Top N * from [tablename] order by newid ()

To obtain a random record from the database
I checked this method online.
However, tables with poor performance are more than 20 seconds old.
How to improve performance
Select top 1 * from [tablename] order by newid ()

Select * from [tablename]
Where id = (select count (*) + 1 from tablename) * rand ()

If the ID is continuous, this is the fastest method.

Select top 1 * from [tablename]
Where ID> = (select max (*) from tablename) * rand ()

Id discontinuous ~~

Try to accept it like this:
SQL code
Select top 10 * from TB tablesample (10000 rows)

E...

Efficiency seems to be good, only 08 is supported.

Another method is to add a computed column to the table with the value newid (),

When the SELECT statement is used, the like Rand () column is calculated to arrange several letters, which can be random to a certain extent without scanning the entire table.

The test results of the above two methods are as follows. Here, the rest of the temp table is the primary key, and the temp table has 20000 rows of records:

Method 1 -- average 22 seconds:
SQL code
Declare @ I int;
Declare @ r int;
Set @ I = 1;
While @ I <2000
Begin
Set @ r = (select top 1 rest from temp order by newid ());
Set @ I = @ I + 1;
End

Method 2 -- average 24 seconds:
SQL code
Declare @ I int;
Declare @ r int;
Set @ I = 1;
While @ I <2000
Begin
Set @ r = (select rest from temp where rest in (select top 1 rest from temp order by newid ()));
Set @ I = @ I + 1;
End

Method 3, that is, the method described at the beginning, takes an average of 10 seconds:
SQL code
Declare @ I int;
Declare @ r int;
Set @ I = 1;
While @ I <2000
Begin
Set @ r = (select top 1 rest from temp where rest not in
(Select top (
Select cast (select count (*) from temp) * rand () as INT)
) Temp. Rest from temp ));
Set @ I = @ I + 1;
End

This method should be the fastest. Only for tables with primary keys (or indexes.

Use the following method:
Select * from [tablename] Where pk in (select top 1 PK from [tablename] order by newid ())

PK is the primary key-hopefully with a clustered index. If it is the case, th ......

Although PK is a clustered index, newid () is used to generate a column value for each row, so it is required to perform clustered index scan.
So
Select * from [tablename] Where pk in (select top 1 PK from [tablename] order by newid ())
It should be comparable
Select top 10 * from [tablename] order by newid ()
More Io consumption.

However, if you create another non-clustered index for the primary key, it will be different.
Create index ix_pk on [tablename] (PK)

In this case, the number of pages of a non-clustered Index containing the pkvalue is much smaller than that of the whole clustered index page.

Therefore, to improve performance, you need to create a non-clustered index for the primary key.

The method on the 6th floor is correct, and this is also the most common usage for such problems.

Method 1
Select * from [tablename] Where pk in (select top 1 PK from [tablename] order by newid ())
Method 2
Select top 1 * from [tablename] order by newid ()

The two methods seem to be similar in efficiency, but they are not. The subquery in method 1 gives the query optimizer more options.
Here, PK is clustering. What if there are other non-clustered indexes on this table?
It is certain that method 1 is more efficient than method 2. subqueries can be processed on non-clustered indexes.

If this table only has a primary key and no other non-clustered indexes, the query in method 1 is equivalent:
Method 3
Select * from [tablename] Where pk in (select top 1 PK from [tablename] With (index (1) Order by newid ())

Method 3 is more efficient than method 2. First, it is clear that method 3 and method 2 involve the same IO.
The difference lies in the difference between the two queries on CPU consumption. method 2 is compared to method 3 in the operation of sorting top1,
The CPU consumption is more than method 3, because operations except clustered index scanning are performed. method 2 is based on the whole row of records,
Method 3 is based on the primary key field.

The landlord's requirement: to obtain a random record from the database
Condition: tens of millions of tables
Database: SQL Server 2005 +

Solution:
Select top 1 * From tbname tablesample (xxx rows)

Xxx should be an integer,AlgorithmAs follows:
Calculate the approximate number of rows on a data page based on the row size of your table. XXX is the number of rows per page at least. We recommend that you set this parameter to 2-3 times the number of rows per page.

If you do not want to calculate this number, you can also write it as follows:
Select top 1 * From tbname tablesample (0.1 percent)

Because the tablesample option only reads data on the entire page, even a large table can read several pages quickly.
With the tablesample option, the database randomly retrieves one or several data pages, returns results, does not scan the index, does not scan the entire table, or sorts the results.
At this time, do not be misled by the execution plan. The execution plan is written in Table scan.
Set statistics Io on at this time. You can see that there are only a few logical and physical reads.

analyze the preceding Methods:
1. select * from [tablename] Where pk in (select top 1 PK from [tablename] order by newid ()
2. select top 1 * from [tablename] order by newid ()
both methods need to be sorted, so io logic reads more, in addition, the CPU usage is much higher.
method 1 is better than method 2. Because method 1 performs subqueries, less content is sorted.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.