How to design a database (5)

Source: Internet
Author: User

I 've been so busy recently that I haven't written a blog for three months.

This requirement was met in the use of Mongo, but I think it is easier to put this requirement into traditional RDBMS. The requirement is as follows: assume that your database uses Sqlserver and there is a table with data records. You need to select a random data entry in the table.

Assume that the data structure clustered index discussed in this article is on the Pk and the UserName is on the non-clustered index ):

Your first response is probably: Oh, Mom, it's a coincidence that the primary key uses Int auto-incrementing. I only need to generate a random number and then find the primary key corresponding to the random number.

The implementation steps are as follows:

① Return the maximum IdMax value of the ID in the database

② Generate a random number int Random = new random (). Next (1, IdMax) between 1 and IdMax );

③ Use UserID = random as the condition Query

④ If no data is found, a random number is generated again and the data of a UserID may be deleted)

This method is simple and violent, but there is a fatal problem: When I create a table, to illustrate this method, the primary key uses Int, however, in most production environments I know, Guid is used. This fatal problem will directly cause the method above to be unavailable.

As for why most production environments I know use guids instead of Int, I will make a comparison in the next article.

Since Int cannot use Guid as the primary key, we should use Row_Number. Sqlserver must support Row_Number. It seems that similar concepts in Oracle and MySql are uncertain. I asked my colleagues for a positive answer and did not go into details ).

The implementation steps are as follows:

① Return the total number of data records in the database count

② Generate the random number int Random = new random (). Next (1, count) between 1 and count );

③ Query the data of Row_Number = random.

However, Row_Number has an extremely bad condition, that is, the slower the data that comes after the query, the more resources the query consumes. However, databases that store data in sequence have this problem. For example, the following two statements:

 
 
  1. select * from 
  2. (SELECT  UserID,UserName,Password,Sex,City,ROW_NUMBER()OVER(ORDER BY CURRENT_TIMESTAMP) as Number  
  3.   FROM [User_db].[dbo].[Users] ) as query  
  4.   where query.Number = 20  
  5.     
  6.     
  7. select * from 
  8. (SELECT  UserID,UserName,Password,Sex,City,ROW_NUMBER()OVER(ORDER BY CURRENT_TIMESTAMP) as Number  
  9.   FROM [User_db].[dbo].[Users] ) as query  
  10.   where query.Number = 5000000 

The first row to query Row_Number = 20, logical reads 5. elapsed time = 58 ms.

The second row queries data with Row_Number = 5000000. logical reads 90208. elapsed time = 900 ms.

It can be seen that the latter has too many logical reads, and the running speed is much slower. If this function is frequently used, for example, it is recommended to users randomly, this will become a performance bottleneck.

Some netizens say this sentence:

 
 
  1. SELECT TOP 1 * FROM Users ORDER BY NEWID()  

The running results are correct, but the efficiency is greatly reduced. For example, I found 1,336,793rd pieces of data, logical reads 90208, elapsed time = 3026 MS

View the execution plan and find that Sort occupies 98% of the total usage:

Is there a better way than Row_Number?

The answer is to add a Random column to the table to change the data structure to the following:

When adding data, a random number is generated and inserted. In this example, a random number between 0 and 0.1 billion can be generated at the beginning. Note: you must add an index to the Random.

The implementation steps are as follows:

① Add a random value when inserting data

② Generate a Random number and query the select top (1) * from Users where Random> Random Number

③ The query results may have multiple but not many results). You can filter multiple data records randomly. You can easily implement the query using the Linq method. I will not repeat it here)

Now, I have finished talking about it. Please allow me to sell cute words at the end: Smart readers, start your brains. Do you have any better solutions? If yes, leave a message.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.