Software System Performance Optimization Strategies: SQL Optimization

Last Update:2018-12-06 Source: Internet

Author: User


Thank you very much for your support for the "C# Delegates and Events Explained in Plain Language" series. This time I will explain performance optimization strategies for software systems. The tutorial focuses on SQL optimization, IIS optimization, code optimization (B/S architecture), database access optimization, and cache optimization.

This article starts with SQL optimization. Why SQL first? Because I consider it a basic skill that every programmer must master.
A SQL statement may do its job functionally, but have you considered whether it will slow down badly when it meets massive data or very heavy access? Many programmers, me included, will say "The server is useless, it's so slow!" or "Why is the network so bad!", but few complain about their own code. These details are exactly what I want to cover this time. I hope you will look further ahead and think more deeply when writing SQL statements.

I. Create appropriate indexes
I do not want to give a long lecture on indexing technology here. I only hope that you will pay attention to indexes and understand them better, instead of stopping at the superficial idea that "the primary key is the index".

First, let's get a general idea of what an index is. Search Baidu for "what is a database index" and you will find this explanation in Baidu Encyclopedia:

A database index is like the table of contents at the front of a book: it speeds up database queries.
For example, take select * from Table1 where id = 44. Without an index, the database must traverse the entire table until it finds the row whose id equals 44. With an index (which must be an index created on the id column), it looks up 44 in the index and gets the location of the row directly. In other words, an index is used for locating rows.
Indexes are classified into clustered and non-clustered indexes. A clustered index stores the data in the physical order of the key, while a non-clustered index does not. Clustered indexes speed up multi-row retrieval, while non-clustered indexes retrieve individual rows quickly.
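The lookup described above can be tried out on a small scale. The following is only a sketch: it uses SQLite through Python's built-in sqlite3 module as a stand-in, not SQL Server, and the table contents and the index name idx_table1_id are made up for illustration. It asks the database for its query plan before and after the index exists.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Hypothetical table: id is deliberately not a primary key, so it starts unindexed.
conn.execute("CREATE TABLE Table1 (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(1000)])

sql = "SELECT * FROM Table1 WHERE id = 44"

# No index yet: the only option is to traverse the whole table.
plan_without_index = query_plan(conn, sql)

# With an index on id, the row can be located through the index.
conn.execute("CREATE INDEX idx_table1_id ON Table1 (id)")
plan_with_index = query_plan(conn, sql)

print(plan_without_index)
print(plan_with_index)
```

In SQLite the first plan is reported as a SCAN and the second as a SEARCH that names the index. SQL Server shows the same idea in its own terms (table scan versus index seek) in its execution plan.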

After reading Baidu's explanation we should have a preliminary understanding, but clustered and non-clustered indexes may still be confusing. In fact, when you ask many veteran programmers about the difference, their answer is "go look it up on Baidu yourself". You do not need to go very deep; the two can be understood as follows.

1. Clustered index: think of looking up a word by pinyin in a dictionary. To find the character for "gold" (jin), we go straight to the letter J, because the whole dictionary is sorted from A to Z. A clustered index in a database works on the same principle: setting an index as clustered means the rows are physically arranged in the order of that column, which is why a table can have only one clustered index. It suits columns with a large volume of data in which many rows share the same value, such as the transaction time column in systems at securities companies and banks. Such systems may handle thousands of records a day; with a clustered index on the transaction time, a query for the transaction records of a given period lets the database locate those records immediately without traversing the whole table.
Clustered indexes suit columns that are frequently used in queries that:
A. Use operators (such as BETWEEN, >, >=, <, and <=) to return a range of values.
B. Return a large result set.
C. Use a join clause. Generally, a foreign key column is used for this clause.
D. Use the order by or group by clause.
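A rough illustration of a range query over a table clustered on transaction time follows. It is a sketch under clear assumptions: SQLite (via Python's sqlite3) stands in for SQL Server, a WITHOUT ROWID table stored in primary-key order plays the role of the clustered index, and the trades table with its sample rows is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: in SQLite a WITHOUT ROWID table is stored in primary-key order,
# which is the closest equivalent to a clustered index on (trade_time, trade_id).
conn.execute("""
    CREATE TABLE trades (
        trade_time TEXT NOT NULL,      -- ISO-8601 text sorts chronologically
        trade_id   INTEGER NOT NULL,
        amount     REAL,
        PRIMARY KEY (trade_time, trade_id)
    ) WITHOUT ROWID
""")
# Made-up sample data: three trades per day for ten days.
rows = [(f"2018-12-{day:02d} 09:30:00", day * 10 + n, 100.0 * n)
        for day in range(1, 11) for n in range(3)]
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", rows)

# Case A from the list above: a BETWEEN range on the clustering column.
range_sql = ("SELECT * FROM trades "
             "WHERE trade_time BETWEEN '2018-12-03' AND '2018-12-05 23:59:59'")
range_plan = [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + range_sql)]
range_rows = conn.execute(range_sql).fetchall()

print(range_plan)
print(len(range_rows))
```

Because the rows for the requested days sit next to each other in storage, the database can jump to the start of the range and read forward instead of visiting every row.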

2. Non-clustered index: think of looking up a word by strokes or radicals in a dictionary. When we do not know how a character is pronounced, we first go to the radical index, find the page number of the character there, and then turn to that page. A non-clustered index on a column means that every value in the column has an index entry pointing to the address of its row, so a query for a single record can locate it directly.

A question to consider:
When we set a primary key on a column in SQL Server 2005, the primary key becomes the clustered index by default. This has an advantage: the data is physically sorted by ID. However, I do not think it is very reasonable, and it can waste the clustered index. The advantages of a clustered index are obvious, and the rule that each table can have only one makes it all the more precious. From the description above, the biggest benefit of a clustered index is that it quickly narrows the query range and avoids a full table scan. In practice, the primary key ID is generated automatically and we rarely know the ID of a given record, so we seldom query by ID. Using the clustered index for the ID primary key is therefore a waste of a scarce resource.

So I think we should be careful when defining indexes for massive data tables! You can also give your own opinions on this issue.

Finally, let's go back to the starting point. Why create indexes at all? To make queries return their results faster. Some people will say that their system is fast even without indexes. That only means the amount of data being queried has not yet reached a certain scale. Once it reaches the millions, every unreasonable SQL statement you have written will bring you unexpected trouble.

Note: I first realized the importance of indexing when I was working at Coca-Cola and was assigned to STM (the cold cabinet quick serving system). At that time, Phase One of the system was running in production at seven factories in China. As data accumulated day by day, one KPI report became particularly slow. KPI reports are read by senior managers of the company to evaluate the efficiency of employees, and the senior executives had strong opinions about our system at the time. You must know that once a senior manager at a large company is angry with you, many of your future activities will be restricted.

I first read the stored procedure behind the report and checked the data volume in the database. The report output was only about 3,000 records and the database itself held only a modest number of records, which could not justify a one-month report taking more than six minutes. Then I examined the KPI stored procedure and found that the tables joined inside the nested SELECT statements had no indexes on their primary and foreign keys. As a result, the nested SQL did Cartesian-product-level work: every nested statement was a full traversal. When we first looked at the stored procedure we did not think about indexes. We came up with many solutions, but they all required too many changes to the original version, and in the end they were all abandoned. Finally, by studying the SQL execution plan, we found that the nested queries consumed an extremely large share of the resources.

In the end we found that the joined tables had no indexes on the join columns. After the indexes were created, the report time dropped from 6 minutes to 6 seconds, which far exceeded our expectations and showed how many full traversals the database had been performing before the indexes existed.
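The original stored procedure is not shown here, so the following is only a sketch of the same kind of problem. It uses SQLite through Python's sqlite3 module as a stand-in, with two hypothetical tables (cabinet and visit) joined on a column that has no index at first. SQLite's automatic temporary indexes are switched off so that the effect of the missing index stays visible.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for the statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Keep SQLite from quietly building a temporary index for the join.
conn.execute("PRAGMA automatic_index = OFF")

# Hypothetical tables: neither join column is indexed at first.
conn.execute("CREATE TABLE cabinet (cabinet_code TEXT, factory TEXT)")
conn.execute("CREATE TABLE visit (cabinet_code TEXT, visited_on TEXT)")

join_sql = ("SELECT c.factory, v.visited_on FROM cabinet AS c "
            "JOIN visit AS v ON v.cabinet_code = c.cabinet_code")

# Without indexes: a full scan of one table for every row of the other.
plan_before = query_plan(conn, join_sql)

# Index the join column of the inner table and ask again.
conn.execute("CREATE INDEX idx_visit_cabinet ON visit (cabinet_code)")
plan_after = query_plan(conn, join_sql)

print(plan_before)
print(plan_after)
```

Before the index, both tables appear as SCAN steps, so the work grows with the product of the two row counts. Afterwards one side becomes a SEARCH through the index, which is the kind of change that can turn minutes into seconds.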

Now that we know the importance of indexes, we should take care not to defeat them when writing everyday SQL. Below I list several kinds of SQL statements that can prevent an index from being used.
1. Using OR or IN on an indexed column
Example: select * from Table1 where id in (2, 3), or select * from Table1 where id = 2 or id = 3
These statements are often said to invalidate the index on id and cause a full table scan. While the data volume is small you can write them this way as long as the function works, but once the volume becomes massive, a detail like this can decide success or failure.
Solution 1:
select * from Table1 where id = 2 union select * from Table1 where id = 3
Of course, you can also use other methods to achieve the same result.

Note: the conclusion above is not reliable. Whether OR or IN prevents index use depends on the database and its optimizer, so check the execution plan before rewriting a query.
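To see why the claim about OR and IN should be checked rather than assumed, here is a small sketch. It runs against SQLite through Python's sqlite3 module, not SQL Server, so it only shows that one real optimizer keeps using the index. The index name and sample rows are made up.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for the statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER, name TEXT)")
conn.execute("CREATE INDEX idx_table1_id ON Table1 (id)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(i, f"n{i}") for i in range(100)])

in_sql = "SELECT * FROM Table1 WHERE id IN (2, 3)"
or_sql = "SELECT * FROM Table1 WHERE id = 2 OR id = 3"
union_sql = ("SELECT * FROM Table1 WHERE id = 2 "
             "UNION SELECT * FROM Table1 WHERE id = 3")

# Plans for the two forms that were claimed to defeat the index.
in_plan = query_plan(conn, in_sql)
or_plan = query_plan(conn, or_sql)

# All three forms return the same rows.
in_rows = sorted(conn.execute(in_sql).fetchall())
or_rows = sorted(conn.execute(or_sql).fetchall())
union_rows = sorted(conn.execute(union_sql).fetchall())

print(in_plan)
print(or_plan)
```

In this setup both plans report a SEARCH through idx_table1_id, and the UNION rewrite returns the same rows. The practical rule is to read the execution plan of your own database first and rewrite only if it really shows a scan.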

2. A leading wildcard % in a LIKE pattern makes the index unusable
Example: select * from Table1 where name like '%Zhang'
If the index is on the name column, this pattern prevents the index from being used.
Note: select * from Table1 where name like 'Zhang%' does not cause the index to fail.
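The difference between the two patterns can be checked with a query plan. This is a sketch on SQLite via Python's sqlite3 module, used as a stand-in for SQL Server. One SQLite-specific assumption is needed: its default LIKE is case-insensitive, so the column is declared COLLATE NOCASE to let the index serve prefix patterns.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for the statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# SQLite-specific: NOCASE collation lets the index match case-insensitive LIKE.
conn.execute("CREATE TABLE Table1 (id INTEGER, name TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_table1_name ON Table1 (name)")

# Leading wildcard: the index order gives no starting point, so every row is checked.
suffix_plan = query_plan(conn, "SELECT * FROM Table1 WHERE name LIKE '%Zhang'")

# Fixed prefix: the database can jump to the first 'Zhang...' entry in the index.
prefix_plan = query_plan(conn, "SELECT * FROM Table1 WHERE name LIKE 'Zhang%'")

print(suffix_plan)
print(prefix_plan)
```

The reason is the same in any database with ordered indexes: a fixed prefix defines a contiguous range of index entries, while a pattern that starts with % can match anywhere.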

3. Negation operators can make the index unusable
Example: NOT, !=, <>, !<, !>, NOT EXISTS, NOT IN, NOT LIKE, and so on can prevent the index from being used.
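One common way around a negated condition is to state it positively when the set of possible values is known. The sketch below again uses SQLite through Python's sqlite3 module as a stand-in. The status column and its three values (open, pending, closed) are hypothetical, and the rewrite is only equivalent if those are the only values that occur.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for the statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER, status TEXT)")
conn.execute("CREATE INDEX idx_table1_status ON Table1 (status)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(i, ("open", "pending", "closed")[i % 3]) for i in range(30)])

negated_sql = "SELECT * FROM Table1 WHERE status <> 'closed'"
positive_sql = "SELECT * FROM Table1 WHERE status IN ('open', 'pending')"

# "Everything except one value" gives the index no single place to start.
negated_plan = query_plan(conn, negated_sql)
# A positive list of values can be looked up in the index one by one.
positive_plan = query_plan(conn, positive_sql)

negated_rows = sorted(conn.execute(negated_sql).fetchall())
positive_rows = sorted(conn.execute(positive_sql).fetchall())

print(negated_plan)
print(positive_plan)
```

Both queries return the same rows for this data, but only the positive form is answered through the index in SQLite. Other databases may plan negations differently, so the execution plan remains the final judge.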

II. Restrict physical I/O operations
In fact, when querying and extracting very large data sets, the biggest factor in database response time is not the search itself but physical I/O. There are two things to watch for when writing SQL statements.
1. Avoid "select *"
When writing SQL statements we often use "select * from table" for convenience. This is a very bad habit. Once the data volume reaches the millions, every extra column can add seconds to the query.
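Besides moving less data, naming only the columns you need can let an index answer the query by itself. The following sketch shows this on SQLite via Python's sqlite3 module, as a stand-in for SQL Server. The notes column and the composite index idx_table1_id_name are invented for the example.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for the statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Hypothetical table with a wide 'notes' column that most queries do not need.
conn.execute("CREATE TABLE Table1 (id INTEGER, name TEXT, notes TEXT)")
conn.execute("CREATE INDEX idx_table1_id_name ON Table1 (id, name)")

# select * needs 'notes', so after the index lookup the table row is read too.
star_plan = query_plan(conn, "SELECT * FROM Table1 WHERE id = 44")

# Only indexed columns are requested, so the index alone can answer the query.
narrow_plan = query_plan(conn, "SELECT id, name FROM Table1 WHERE id = 44")

print(star_plan)
print(narrow_plan)
```

SQLite marks the second plan as using a COVERING INDEX, meaning the table itself is never touched. That is one concrete way in which a narrower column list reduces physical I/O.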

2. Use TOP for data extraction
This applies to queries over massive data. If a query matches 10 million rows, we need real paging, where only the current page is fetched from the database. The principle behind it is TOP.
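A minimal sketch of fetching one page at a time follows. The article does not say which paging scheme it has in mind, so this is an assumption: it pages by remembering the last id seen and asking for the next few rows. It runs on SQLite via Python's sqlite3 module, where LIMIT plays the role that TOP plays in SQL Server, and fetch_page is a hypothetical helper.

```python
import sqlite3

def fetch_page(conn, last_seen_id, page_size):
    """Return the next page_size rows whose id is greater than last_seen_id."""
    return conn.execute(
        # LIMIT is SQLite's counterpart of TOP: stop after page_size rows.
        "SELECT id, name FROM Table1 WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(1, 101)])

# Page 1 starts before the smallest id; page 2 continues after the last row of page 1.
first_page = fetch_page(conn, 0, 10)
second_page = fetch_page(conn, first_page[-1][0], 10)

print([row[0] for row in first_page])
print([row[0] for row in second_page])
```

Only ten rows leave the database per request, no matter how many rows match in total, which is the point of real paging.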

Note: it is best to create a table with a million rows and verify these conclusions yourself. The concepts become much clearer once you have tried them.

When we build statistics or reports, about 80% of efficiency problems can be solved with reasonable indexes and SQL optimization, and the effect is very visible. That is all for SQL optimization this time. There are many other issues worth noting, which we need to work out carefully in our daily programming.
