Database optimization and fault tracking and troubleshooting for large websites (Part 1)

Source: Internet
Author: User

[Topic]

On a large website, there is a table that stores news records, with a database volume of about 50 thousand (not big in fact). News on the website pages are dynamically produced from this table, and there are also 80 ~ News from 90 websites are also dynamically produced from this table. As a result, the access volume to this table is very large, especially when the website crashes. In response to this situation, we will optimize the website and describe the situations that are found during the optimization or may lead to an endless loop.

Statement]
This article has been published on the blog site, but a problem occurred when modifying the questions raised by netizens. Follow the previous and next sections for release. Dudu. If my behavior is inappropriate, delete this article. Thank you!


Website framework] IIS6.0 + ms SQL 2000 + ASP3.0 + win 2003


Ideas]

1. Use the windows Task Manager to view CPU usage of the process. If the data library (sqlservr.exe) occupies a high cup, it is generally necessary to optimize the database (not to mention the optimization tool here). If the IIS (w3pw.exe) occupies a high cup (high), it is a little outrageous, or even very high in an instant) if so, we should look at the code, which is highly suspected of endless loops.

 

2. Database optimization mainly involves indexing, query statements, stored procedures, and ASP code.

 

3. You can create an application pool for IIS to achieve optimization.

 

4. This article does not discuss server hardware upgrades.

 

Practice]

 

1. Open the task manager of the server
Hypothesis:Sqlservr.exe occupies a very high cup, and the server cup reaches 100%.

 

Database optimization strategy:

A. Create an index from the database itself. To create an index, you can write a long article.

(1) index structure:

You can think of indexes as a special directory. Microsoft's SQL SERVER provides two types of indexes: clustered index (also called clustered index and clustered index) and non-clustered index (nonclustered index, also known as non-clustering index and non-cluster index ).

I saw an article on the Internet about clustering indexes and non-clustering indexes. It is very popular: the text of our Chinese dictionary is a clustered index. For example, if we want to check the word "an", we will naturally open the first few pages of the dictionary, because the Pinyin of "an" is "", the Dictionary of Chinese characters sorted by pinyin starts with the English letter "a" and ends with "z". Then the word "an" is naturally placed at the front of the dictionary. If you still cannot find the word in all the parts starting with "a", it means that you do not have this word in your dictionary. Similarly, if you query the word "Zhang, then you will turn your dictionary to the last part, because the Pinyin of "zhang" is "zhang ". That is to say, the body of the dictionary itself is a directory, and you do not need to query other directories to find the content you need.

We refer to this text content as a directory arranged according to certain rules as "clustered Index ".

If you know a word, you can quickly find it automatically. However, you may also encounter a word you do not know and do not know its pronunciation. At this time, you cannot find the word you want to query according to the method just now, you need to find the word you are looking for based on the "radicals", and then directly go to a page based on the page number after the word to find the word you are looking. However, the words you find in combination with the "radical directory" and "word checking table" are not really the sorting method of the text. For example, you can query the word "Zhang, we can see that the page number of the "Zhang" in the word checking table after the department head is 672, and the "Zhang" in the word checking table is "Chi", but the page number is 63, under "Zhang" is the word "", and the page is 390 pages. Obviously, these words are not really in the upper and lower sides of the word "Zhang, the continuous "Chi, Zhang, and "words you see are actually their sorting in the non-clustered index, which is the ing of words in the dictionary body in the non-clustered index. We can find the words you need in this way, but it requires two steps: first find the results in the directory, and then flip to the page number you need.

We refer to this directory as a directory, and the text as a non-clustered index ".

 

(2) General rules for clustering and non-clustering indexes:

Action Description

Use clustered Index

Use non-clustered Index

Columns are sorted by group.

Ying

Ying

Returns data within a certain range.

Ying

Should not

One or few different values

Should not

Should not

Different decimal values

Ying

Should not

Different values of large numbers

Should not

Ying

Frequently updated Columns

Should not

Ying

Foreign key column

Ying

Ying

Primary Key Column

Ying

Ying

Frequently modify index Columns

Should not

Ying

(3) According to the actual situation, do not think that the primary key should use clustering index (ms SQL set the primary key as the default clustering index ). Generally, we create an ID column in each table to distinguish each data entry. The ID column is automatically increased and the step size is generally 1. At this time, if we set this column as the primary key, SQL SERVER will set this column as a clustered index by default. In this way, your data can be physically sorted by ID in the database, but in practice, because the ID is automatically generated, we do not know the ID number of each record, therefore, it is difficult for us to use the ID number for queries in practice. This makes the primary key of the ID number a waste of resources as a clustered index.
 
(4) SQL statement optimization example:

Definition of SARG: An operation used to restrict search, because it usually refers to a specific match, a matching within a worthy range or an AND connection between two or more conditions. The format is as follows:
Column name operator <constant or variable> or <constant or variable> operator column name

1) if you create an index on the name segment of the table, perform fuzzy search using like.
Please use name like 'sheet %'
Do not use: name like '% sheets %'
The index cannot be used if the character string '%' is used.

2) The cost of the or statement is caused by full table scan.
Name = 'zhang san' and age> 20
Name = 'zhang san' or age> 20
The second statement will scan the entire table. Please pay attention to the usage and avoid it as much as possible.

3) You do not need to write statements such as select * from table.
If you only need the name and age, you need to develop the habit of using select name, age from table.

4) use nested queries with caution
For statements such as select name from table where age in (select age from table), the entire table is scanned and the index is meaningless.
 
5) use top
 
Typical paging statement:
 
SELECT TOP PAGESIZE NEWSTITLE
FORM NEWSINFO WHERE NEWSID NOT IN
(Select top (PAGE-1) * PAGESIZE NEWSID FROM NEWSINFO
WHERE Auditing = 1 and NEWSBREED = 'enterprise news 'order by newsid desc)
AND Auditing = 1 and NEWSBREED = 'enterprise news 'order by NEWSID DESC

PAGE indicates the current PAGE number, and PAGESIZE indicates the PAGE size. not in is used here, but it is better than reading all records at a time.

There is also a better solution for my instance: Because the NEWSID field is an auto-incrementing field, the not in is transformed as follows, without affecting the result. But the speed has improved a lot.
SELECT TOP PAGESIZE NEWSTITLE
Form newsinfo where newsid>
(Select max (NEWSID) FROM (select top (PAGE-1) * pagesize newsid from newsinfo where Auditing = 1 and NEWSBREED = 'enterprise News' order by NEWSID) as tb) AND Auditing = 1 and NEWSBREED = 'enterprise news 'order by NEWSID

6) The leading columns of the composite index are the most frequently used in the query conditions.
 
For example, a composite index is created on the PUTDT and AUTHORNAME columns, where PUTDT is the leading column.
For the following three statements:
Select putdt, age from USER_NEWS where putdt> '2017-1-16'
Select putdt, age from USER_NEWS where putdt> '2017-1-16 'and AUTHORNAME = 'David'
Select putdt, age from USER_NEWS where authorname = 'David'

Note:
The first statement is the fastest, followed by the second statement and the third statement is the slowest.
The index in article 3 is invalid. Therefore, pay attention to the details when creating a composite index.
The order of condition statements in article 2 does not affect performance. The "query optimizer" is used for optimization.

7) If COUTN (*) is only used to obtain the number of rows, you can use rowset count.

8) Check SQL statement Performance

A. Open "query analyzer", open the "query" menu, click "show query plan", and execute the following statement.
Select title, price from titles where title_id in
(Select title_id from sales where qty> 30)


Select title, price from titles where exists
(Select * from sales where sales. title_id = titles. title_id and qty> 30)

View query plan:

The two statements have the same performance and verify that IN and EXISTS are equivalent.

 

B. Add: declare @ d datetime set @ d = getdate () before each select statement and add: select [statement execution time (MS)] After select statement. = datediff (MS, @ d, getdate ())

The number of milliseconds required for SQL statement execution.

BAfter using the above optimization, it is found that the CPU usage of the database process has declined, but it is still high.

Use the ms SQL event probe to track ms SQL requests.

Open"Event Probe", New"Tracking"

1) if there are many RPC Events in the "event probe", and execute sp_cursoropen sp_cursorfetch

Sp_cursorclose indicates that the cursor service is not suitable when ASP dataset objects are used.

Use "client cursor". Code: RS. CursorLocation = 3 where RS is the DataSet object and 3 indicates the client cursor. Do not useAdUseClient,Sometimes there are problems.

 

2) Notes for operations on DataSet objects

RS. Open is generally recommended as follows:

Rs. open SQL, conn, 0, 1 sequential traversal, no need to locate the jump, no need to add Delete update operation, the fastest speed

Rs. open SQL, conn, 1, 3 traversal, can be updated, but cannot be positioned to jump

Rs. open SQL, conn, 2, 3 can perform all operations, you can jump

 

Description: The third parameter indicates the cursor type, and the fourth parameter indicates the lock type.

See http://www.cnblogs.com/David-weihw/archive/2007/01/10/616936.html

 

 

After the above optimization, the CPU usage of the ms SQL process is usually high. If not, it will be serious. re-design the database storage structure.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.