Query optimization and paging algorithm for massive databases 1/2 1th/2 Page

Query optimization and paging algorithm for massive databases 1/2 1th/2 Page _ Database Other

Last Update:2017-01-18 Source: Internet

Author: User

Tags data structures

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

With the construction of "Golden Shield project" and the rapid development of public security informationization, the computer application system of public security is widely used in various duties and departments. At the same time, the core of application system system, the storage of system data-database also with the actual application and rapid expansion, some large-scale systems, such as the population system more than 10 million data, can be described as massive. Then, how to quickly extract data from these super large capacity database (query), analysis, statistics and data paging has become a problem for local system administrators and database administrators.
In the following article, I will take the "office automation" system as an example to explore how to achieve rapid data extraction and data paging in an MS SQL Server database with 10 million data. The following code illustrates some of the data structures in the "red-headed" Table of the database in our instance:
CREATE TABLE [dbo]. [Tgongwen] (--tgongwen is the red-headed table name
[Gid] [INT] IDENTITY (1, 1) not NULL,
--The ID number of this table, also the primary key
[Title] [varchar] (COLLATE) Chinese_prc_ci_as NULL,
--Red-headed's title
[Fariqi] [DateTime] Null
--Release date
[Neibuyonghu] [varchar] (COLLATE) Chinese_prc_ci_as NULL,
--Publishing users
[Reader] [varchar] (900) COLLATE chinese_prc_ci_as NULL,
--users who need to be browsed. Separated by separator "," in each user's middle
) on [PRIMARY] textimage_on [PRIMARY]
Go
Below, we add 10 million data to the database:
DECLARE @i int
Set @i=1
While @i<=250000
Begin
Insert into Tgongwen (fariqi,neibuyonghu,reader,title) VALUES (' 2004-2-5 ', ' Communications section ', ' Communications Division, Office, director Wang, director Liu, director Zhang, admin, Criminal investigation Detachment, Secret Service detachment, to patrol detachment, through the investigation detachment, the family branch, Security Detachment, Foreign Affairs Division ', ' This is the first 250,000 records ')
Set @i=@i+1
End
Go
DECLARE @i int
Set @i=1
While @i<=250000
Begin
Insert into Tgongwen (fariqi,neibuyonghu,reader,title) VALUES (' 2004-9-16 ', ' Office ', ' Office, Communications section, director Wang, director Liu, director Zhang, admin, Criminal investigation Detachment, Secret Service detachment, to patrol detachment, through the investigation detachment, the Family Branch, Foreign Affairs Division ', ' This is the middle of 250,000 records '
Set @i=@i+1
End
Go
DECLARE @h int
Set @h=1
While @h<=100
Begin
DECLARE @i int
Set @i=2002
While @i<=2003
Begin
DECLARE @j int
Set @j=0
While @j<50
Begin
DECLARE @k int
Set @k=0
While @k<50
Begin
Insert into Tgongwen (Fariqi,neibuyonghu,reader,title) VALUES (CAST (@i as varchar (4)) + ' -8-15 3: ' +cast (@j as varchar (2)) + ': ' +cast (@j as varchar (2)), ' Communications section ', ' Office, Communications branch, director Wang, Secretary Liu, director Zhang, admin, Criminal Investigation Detachment, special Service Detachment, Patrol detachment, through the investigation detachment, the Family Administration Branch, Foreign Affairs Division ', ' this is the last 500,000 records '
Set @k=@k+1
End
Set @j=@j+1
End
Set @i=@i+1
End
Set @h=@h+1
End
Go
DECLARE @i int
Set @i=1
While @i<=9000000
Begin
Insert into Tgongwen (fariqi,neibuyonghu,reader,title) VALUES (' 2004-5-5 ', ' Communications section ', ' Communications Division, Office, director Wang, director Liu, director Zhang, admin, Criminal investigation Detachment, Secret Service detachment, to patrol detachment, through the investigation detachment, the family branch, Security Detachment, Foreign Affairs Section ', ' This is the last added 9 million records '
Set @i=@i+1000000
End
Go
Through the above statements, we have created 250,000 records issued by the Communications section on February 5, 2004, 250,000 records issued by the Office on September 6, 2004, 100 2,500 identical dates in 2002 and 2003, and different minutes of records issued by the Communications section (total 500,000), There are also 9 million records issued by the Communications section on May 5, 2004, totaling 10 million.
First, the establishment of an "appropriate" index due to the situation
Establishing "appropriate" index is the first prerequisite to realize query optimization.
Index is another important, user-defined data structure stored on physical media other than tables. When you search for data based on the value of an index code, the index provides fast access to the data. In fact, without an index, the database can also successfully retrieve the results from the SELECT statement, but as the table becomes larger, the use of an "appropriate" index is becoming more apparent. Note that in this sentence, we use the word "appropriate" because, if the index is used without careful consideration of its implementation process, the index can both improve and damage the performance of the database.
(i) Understanding the index structure in simple and simple sense
In fact, you can interpret the index as a special kind of directory. Microsoft's SQL Server provides two indexes: a clustered index (clustered index, also known as a clustered, clustered index) and a nonclustered index (nonclustered index, also known as a nonclustered and nonclustered index). For example, let's take a look at the difference between a clustered index and a nonclustered index:
In fact, the text of our Chinese dictionary is itself a clustered index. For example, if we look at the word "Ann", we will naturally open the first few pages of the dictionary, because the pinyin of "an" is "an", and the dictionary of Chinese characters according to Pinyin is the beginning of the English letter "a" and ending with "Z", then the word "Ann" is naturally ranked in the front of the dictionary. If you end up with all the "a" parts you still can't find the word, so you don't have the word in your dictionary; Similarly, if you look at the word "Zhang", you will also turn your dictionary to the last part, because "Zhang" Pinyin is "Zhang". That is, the body part of the dictionary itself is a directory, and you don't need to look up other catalogs to find what you need to find.
We refer to the content of the body as a "clustered index", which is arranged according to certain rules.
If you know a word, you can quickly find the word from automatic. But you may also encounter words you don't know, do not know its pronunciation, at this time, you can not follow the way you have just found the word you want to search, and need to go according to the "radical" to find the word you are looking for, and then according to the number of the word after the page directly to find the word you are looking for. But the sort of word you find in conjunction with the "Radical catalog" and "CJK ideographs table" is not really the sort of method of body text, for example, you look up the "Zhang" word, we can see in the CJK ideographs table after the radical, "Zhang" page number is 672 pages, CJK ideographs Table "Zhang" above is "Chi" word, but the page number is 63 pages, "Zhang" below is "crossbow "Word, the page is 390 pages. Obviously, these words are not really in the "Zhang" word of the upper and lower side, now you see the continuous "Chi, Zhang, crossbow" three words is actually their sorting in the nonclustered index, is the dictionary body of words in the nonclustered index mapping. We can find the word you need in this way, but it takes two processes to find the result in the TOC and then turn to the page number you want.
We refer to this catalogue as purely a directory, and the text is simply a sort of text that is called a nonclustered index.
From the above example, we can understand what is "clustered index" and "nonclustered index".
Further, we can easily understand that each table can have only one clustered index, because the catalog can only be sorted in one way.
(ii) When to use clustered or nonclustered indexes
The following table summarizes when to use clustered or nonclustered indexes (very important).
Action Description
Using Clustered Indexes
Using Nonclustered indexes
Columns are often sorted in groups
Should
Should
Returns data in a range
Should
should not be
One or very few different values
should not be
should not be
A small number of different values
Should
should not be
A large number of different values
should not be
Should
frequently updated columns
should not be
Should

Current 1/2 page 12 Next read the full text

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More