Database Optimization in Practice: Indexes

After unremitting effort, you and your team finally launched the website. At first there were few registered users and performance was good, but as the number of registered users grew, access became slower. Some users began sending emails to complain, and things kept getting worse. To retain your users, you start investigating why access is so slow.

After a tense investigation, you find that the problem lies in the database: when the application reads or updates data, the database responds very slowly. Digging further, you find that the tables have grown very large, some to tens of millions of rows. The test team then runs tests against the production database and finds that submitting an order takes 5 minutes, whereas in testing before the site went live it took only 2 to 3 seconds.

This kind of story plays out every day in every corner of the world. Almost every developer runs into it at some point in their career, and I have been through it many times myself, so I'd like to share my experience in solving this problem.

If you are in the middle of such a project, running away is not an option; you have to face the problem head-on. To start with, I will assume the data access code in your application has not been optimized, and in this series I'll explain how to write efficient data access code and how to optimize existing data access code.

Scope

Before we begin, I should make the scope of this series clear. I will discuss optimizing data access performance in transactional (OLTP) SQL Server databases, although the techniques described can also be applied on other database platforms.

The techniques I introduce are aimed primarily at application developers. DBAs also play a major role in optimizing databases, but the optimization methods DBAs use are outside the scope of this discussion.

When a database-backed application runs slowly, there is probably a 90% chance that the problem lies in the data access code: either it has not been optimized, or it was not written in the best way. So you need to review and optimize your data access and processing code.

I'll cover 10 steps for optimizing data access code, starting with the most basic one: indexes.

Step 1: Apply the right indexes

I start with indexes because applying the right indexes improves the performance of a production system. Another reason is that creating or modifying an index is done in the database and requires no changes to the application, so you can see results immediately.

Let's review the basics of indexes. I believe you already know what an index is, but I have seen that many people are not very clear about it, so let me tell a story first.

A long time ago, the great library of an ancient city held thousands of books, but the books were not shelved in any order. Whenever someone asked for a book, the librarian had to search shelf by shelf, and it took a long time every time.

[This is like a table with no primary key: the database engine must scan all the data in the table to find a row, which is very inefficient.]

Worse, the library kept acquiring more and more books, and the librarian's work became extremely painful. One day a clever lad came by, saw the librarian's struggle, and suggested a solution: number each book and shelve the books in number order. If someone asked for a book by number, the librarian would quickly be able to find its location.

[Numbering the books is like creating a primary key on a table. In SQL Server, creating a primary key creates a clustered index tree by default, and all rows in the table are stored in primary key order. When a query looks for a row, the database first uses the clustered index tree to find the data page that holds it (like finding the right shelf), and then finds the target row on that page by its primary key value (like finding the book on the shelf).]
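The two-step lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how SQL Server is implemented: the page size, the product rows, and the names `pages`, `first_keys`, and `find_by_primary_key` are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical "data pages": rows are kept in primary key order,
# a few rows per page, the way a clustered index keeps them.
PAGE_SIZE = 3
rows = [(pid, f"Product {pid}") for pid in range(1, 11)]  # sorted by key
pages = [rows[i:i + PAGE_SIZE] for i in range(0, len(rows), PAGE_SIZE)]

# A one-level "index": the first key stored on each page.
first_keys = [page[0][0] for page in pages]


def find_by_primary_key(pid):
    """Find the page through the index, then the row inside that page."""
    page_no = bisect_right(first_keys, pid) - 1  # step 1: find the shelf
    if page_no < 0:
        return None  # key is smaller than anything stored
    for key, name in pages[page_no]:  # step 2: find the book on the shelf
        if key == pid:
            return (key, name)
    return None


print(find_by_primary_key(5))
print(find_by_primary_key(11))
```

Only one page is read for each lookup, no matter how many pages the table has; that is the saving the librarian gets from shelving books by number.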

So the librarian began numbering the books and shelving them in number order. It took him a whole day, but when he tested the result he found that looking up a book had become far more efficient.

[A table can have only one clustered index, just as the books can be shelved according to only one rule.]

But the problem was not completely solved, because many people could not remember a book's number, only its title. The librarian had no choice but to scan through all the numbered books. This now took him 20 minutes, compared with 2 to 3 hours before the books were numbered, but it was still far too long compared with finding a book by number, so he turned to the clever lad for help again.

[This is as if you have added a primary key, ProductID, to the Products table but created no other indexes. When you search by product name, the database engine has to perform a full table scan, checking rows one at a time.]
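One way to see this difference for yourself is to ask a database for its query plan. The sketch below uses Python's built-in sqlite3 module, with SQLite standing in for SQL Server purely so the example is self-contained; the table, column names, and data are hypothetical, and SQL Server's own plan output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# product_id is the primary key; there is no index on product_name yet.
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(i, f"Product {i}") for i in range(1, 1001)],
)


def plan(sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


by_id_plan = plan("SELECT * FROM products WHERE product_id = ?", (500,))
by_name_plan = plan(
    "SELECT * FROM products WHERE product_name = ?", ("Product 500",)
)

print(by_id_plan)    # a SEARCH that uses the primary key
print(by_name_plan)  # a SCAN of the whole table
```

The exact wording of the plan text varies between SQLite versions, but the lookup by primary key is reported as a search, while the lookup by name is reported as a scan of the table.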

The clever lad told the librarian: you have already numbered the books, so now you just need to create an index, or catalog, that stores each book's title together with its number, sorted by title. If someone is looking for a book called "Database Management System", all you have to do is jump to the section of the catalog that starts with "D", find the title, and then find the book by its number.

So the librarian excitedly spent several hours creating a "book title" catalog. After testing, he found that the time to find a book had dropped to 1 minute (30 seconds to look up the number in the title catalog and 30 seconds to find the book by its number).

The librarian then started thinking: readers might also look for books by other attributes, such as the author. He used the same method to create a catalog by author. Now any book could be found within 1 minute by number, title, or author. The librarian's work became much easier, and the story ends here.

By now I trust you understand what an index really is. Suppose we have a Products table with a clustered index (created automatically from the table's primary key), and we also need to create a nonclustered index on the ProductName column. When we create the nonclustered index, the database engine automatically builds an index tree for it (like the "book title" catalog in the story), and the product names are stored in the index pages. Each index page holds a range of product names together with their corresponding primary key values. When a query searches by product name, the database engine first looks up the product name in the nonclustered index tree to find the primary key value, and then uses that primary key value to look up the clustered index tree and find the actual row.
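The effect of adding such an index can be sketched as follows, again using Python's built-in sqlite3 module as a stand-in for SQL Server. The table, data, and the index name `ix_products_name` are hypothetical. In SQL Server the corresponding statement would be along the lines of `CREATE NONCLUSTERED INDEX IX_Products_ProductName ON Products (ProductName)`, where the index name is likewise just an example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    " product_id INTEGER PRIMARY KEY,"
    " product_name TEXT,"
    " price REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(i, f"Product {i}", i * 1.5) for i in range(1, 1001)],
)

QUERY = "SELECT * FROM products WHERE product_name = ?"


def query_plan(sql, params):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(QUERY, ("Product 500",))

# The "book title catalog": an index sorted by name that points back to the key.
conn.execute("CREATE INDEX ix_products_name ON products (product_name)")

plan_after = query_plan(QUERY, ("Product 500",))
row = conn.execute(QUERY, ("Product 500",)).fetchone()

print(plan_before)  # scan of the whole table
print(plan_after)   # search using ix_products_name
print(row)
```

The query text is identical before and after; only the database changed, which is why indexing needs no changes to the application.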

The following figure shows the structure of an index tree

Figure 1 Index tree structure

This structure is called a B+ tree (a kind of balanced tree). The intermediate nodes contain ranges of values that direct the SQL engine where to look for a specific index value. If it is a clustered index tree, the leaf nodes are the physical data pages; if it is a nonclustered index tree, the leaf nodes contain the index values together with the clustered index key (which the database engine uses to find the corresponding row in the clustered index tree).
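To make the tree structure concrete, here is a minimal, hand-built sketch in Python of a two-level tree with intermediate and leaf nodes, used once as a nonclustered index and once as a clustered index. It is a simplification under stated assumptions: the class names, the sample products, and the tiny node sizes are all hypothetical, and a real engine also handles inserts, page splits, and many more levels.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Leaf:
    keys: list    # index values, in sorted order
    values: list  # nonclustered: the clustered key; clustered: the row itself


@dataclass
class Internal:
    separators: list  # separators[i] is the smallest key under children[i + 1]
    children: list


def search(node, key):
    """Descend from the root to a leaf, then look for the key in that leaf."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.separators, key)]
    for k, v in zip(node.keys, node.values):
        if k == key:
            return v
    return None


# Nonclustered index on product name: leaves hold name -> product id.
name_index = Internal(
    separators=["Hammer", "Saw"],
    children=[
        Leaf(["Axe", "Drill"], [4, 2]),
        Leaf(["Hammer", "Pliers"], [1, 5]),
        Leaf(["Saw", "Wrench"], [3, 6]),
    ],
)

# Clustered index on product id: leaves hold the full rows.
clustered_index = Internal(
    separators=[3, 5],
    children=[
        Leaf([1, 2], [(1, "Hammer", 12.5), (2, "Drill", 80.0)]),
        Leaf([3, 4], [(3, "Saw", 15.0), (4, "Axe", 30.0)]),
        Leaf([5, 6], [(5, "Pliers", 9.0), (6, "Wrench", 11.0)]),
    ],
)


def find_by_name(name):
    """Name -> clustered key via one tree, then key -> row via the other."""
    product_id = search(name_index, name)
    if product_id is None:
        return None
    return search(clustered_index, product_id)


print(find_by_name("Saw"))
```

Each lookup touches one node per level of each tree rather than every row, which is where the speedup comes from.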

Typically, finding the target value in the index tree and then jumping to the actual row takes very little time, so indexes generally improve the speed of data retrieval. The following steps will help you apply indexes correctly.
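A rough back-of-the-envelope calculation shows why the tree lookup is cheap even for very large tables. The sketch below assumes, purely for illustration, that every index page points to 100 entries on the level below; the real figure depends on key size and page size, so treat the fan-out of 100 as a hypothetical number.

```python
def levels_needed(row_count, fanout):
    """Number of tree levels needed so the tree can address row_count entries."""
    levels, capacity = 1, fanout
    while capacity < row_count:
        capacity *= fanout
        levels += 1
    return levels


ROWS = 10_000_000  # a table with ten million rows
FANOUT = 100       # assumed entries per index page (hypothetical)

print(levels_needed(ROWS, FANOUT))  # pages read by an index lookup
print(ROWS)                         # rows a full scan may have to examine
```

Under this assumption, a lookup reads about 4 pages, while a full scan may have to examine all 10,000,000 rows.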
