Notes on table creation. Everyone is familiar with creating tables, but the following points are worth noting:
1. It is best to split large data fields out into a separate table, so that they do not affect the performance of the main table.
2. Use varchar instead of char: varchar allocates storage dynamically, whereas a char(20) column still occupies a length of 20 even if it only stores the single character "1" (see the sketch after this list).
3. Create a primary key for every table. Many tables have no primary key, which hurts both querying and index definition.
4. Avoid letting table fields be null. If you do not know what value to store, set a default value instead, especially for int columns; with a default such as 0, the improvement in index query efficiency is obvious.
5. Create indexes with care. A clustered index determines the physical storage order of the data, so it is best created on a unique, non-null field. Other indexes are not "the more the better": indexes give a clear advantage in queries, but a clustered index created on a field whose data is frequently updated has serious consequences, and inserts and updates become very expensive.
6. Whether to build a composite index or single-column indexes should be decided from the actual, concrete query patterns.
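A minimal sketch of points 2 to 4 in T-SQL style (the student table and its columns are invented for illustration, and exact syntax varies between databases):

-- Hypothetical table, for illustration only
CREATE TABLE student (
    studentid VARCHAR(20) NOT NULL PRIMARY KEY,  -- point 3: every table gets a primary key
    name      VARCHAR(50) NOT NULL DEFAULT '',   -- point 2: varchar instead of char
    score     INT         NOT NULL DEFAULT 0     -- point 4: default 0 instead of NULL for int columns
);
-- point 1: keep big fields (long text, binary data) in a separate table keyed by studentid
-- points 5/6: add further indexes only as required by the actual query patterns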
Notes on SQL statements. Anyone familiar with SQL can write a statement that runs, and below 100,000 rows there is little visible difference in efficiency, but there are still some basics that should be learned from the start:
1. When there are multiple query conditions, the SQL statement is evaluated from right to left; that is, the condition written last is executed first, so the condition that filters out the largest amount of data should be written last to achieve the best performance.
2. Join statements. If Table A has 10 million records and Table B has 30, the query should be written as A join B; SQL execution will then drive the join from B, and the performance difference is significant.
3. Table variables and temporary tables. This may be unfamiliar to some people: a table variable is declared with something like DECLARE @dd TABLE, while a temporary table uses the more familiar #tt form; that is the syntactic difference between the two. Table variables have limited functionality and are released automatically after use. Temporary tables have most of the capabilities of a regular data table but need to be dropped after use, and the tempdb database also records the corresponding operations. Which to use should be weighed case by case.
4. Pay attention to functions and to operators such as is null and <> in the where clause (see the sketch after this list). We may be used to writing them, but on large data volumes the existing index is no longer used and the query degenerates into a full table scan, which is very slow. For example, with where Len(studentid) = 0 the index on studentid is abandoned in favour of a full table scan.
5. Use exists instead of not in. Everyone knows this, but it still deserves attention.
6. Write select aa, bb instead of select *. This is a common habit among programmers and seems harmless, but during execution select * still has to resolve the concrete field names before reading them. Even if you really need every column, it is better to list all the fields than to use *.
7. Top N and order by: I always assumed that order by simply sorts the rows returned by the where condition; in fact that is wrong. When Top N and order by are used together, the order by can turn into a full table scan, which you can verify with the query plan.
8. Try to test the code and the unit tests against a database with a large amount of data, preferably real data.
9. Do not frequently update the data of tables that have triggers.
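Continuing with the hypothetical student table above, a small sketch of points 4 and 6: a function applied to the indexed studentid column forces a full table scan, an equivalent condition on the bare column can use the index, and listing the needed columns avoids the extra work of select *.

-- Bad: LEN() on the indexed column discards the primary-key index and scans the whole table
SELECT studentid, name FROM student WHERE LEN(studentid) = 0;

-- Better: compare the bare column (roughly equivalent here), so the index can be used
SELECT studentid, name FROM student WHERE studentid = '';

-- Bad: SELECT * still has to resolve every field name before reading
-- SELECT * FROM student WHERE name = 'Tom';
-- Better: list only the columns that are actually needed
SELECT studentid, name FROM student WHERE name = 'Tom';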
Other considerations:
1. Use indexes reasonably
An index is an important data structure in a database. Its fundamental purpose is to improve query efficiency. Currently, most database products adopt the ISAM index structure first proposed by IBM. Indexes should be used properly; the usage principles are as follows:
● Create indexes on columns that are frequently used in joins but are not specified as foreign keys; for columns that are rarely joined, the optimizer generates indexes automatically.
● Index columns that are frequently sorted or grouped (that is, columns involved in group by or order by operations).
● Create indexes on columns with many distinct values that are frequently used in conditional expressions; do not create an index on columns with few distinct values. For example, the "gender" column of an employee table has only two values, "male" and "female", so there is no need to index it; an index there would not improve query efficiency but would greatly reduce the update speed.
● If there are multiple columns to be sorted, you can create a compound index on these columns (see the sketch after this list).
● Use system tools. For example, the Informix database provides a tbcheck tool that can check suspicious indexes. On some database servers, frequent operations may invalidate an index or reduce its read efficiency. If an index-based query slows down, you can use the tbcheck tool to check the index integrity and repair it if necessary. In addition, after a table has had a large amount of data updated, dropping and rebuilding the indexes can increase the query speed.
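As a rough illustration of these principles (the employee table, its columns, and the index names are assumptions, not part of any schema mentioned above):

-- Assumed example schema, for illustration only:
-- employee(emp_id, dept_id, gender, hire_date, name)

-- A column that is frequently joined on but is not a foreign key: index it explicitly
CREATE INDEX ix_employee_dept ON employee (dept_id);

-- Columns frequently sorted or grouped together: one compound index
CREATE INDEX ix_employee_hire_name ON employee (hire_date, name);

-- "gender" has only two values ("male"/"female"), so no index is created on it:
-- it would not speed up queries but would slow down updates.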
2. Avoid or simplify sorting
Repeated sorting of large tables should be simplified or avoided. When an index can be used to produce output in the required order, the optimizer can skip the sorting step. The following factors prevent the sort from being avoided:
● The index does not include one or more of the columns to be sorted;
● The order of the columns in the group by or order by clause differs from that of the index;
● The sorted columns come from different tables.
To avoid unnecessary sorting, we need to add indexes correctly and merge database tables reasonably (although this may sometimes affect table normalization, it is worthwhile for the efficiency gain). If sorting is unavoidable, try to simplify it, for example by narrowing the range of columns being sorted.
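A small sketch of the idea (the orders table is reused from the examples below; the order_date column and the index name are assumptions): when a composite index already lists the sort columns in the same order, the rows can be read in index order and the explicit sort step can be skipped; if the column order differs, the sort comes back.

-- Assumed composite index that matches the sort order
CREATE INDEX ix_orders_cust_date ON orders (customer_num, order_date);

-- Can be satisfied by reading the index in order, so no separate sort is needed
SELECT customer_num, order_date, order_num
FROM orders
ORDER BY customer_num, order_date;

-- The column order no longer matches the index, so a sort is required again
SELECT customer_num, order_date, order_num
FROM orders
ORDER BY order_date, customer_num;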
3. Eliminate sequential access to data in large table rows
In nested queries, sequential access to a table may have a fatal impact on query efficiency. For example, with a sequential access strategy, a query nested three levels deep that scans 1,000 rows at each level ends up scanning one billion rows. The primary way to avoid this is to index the join columns. For example, consider two tables: a student table (student ID, name, age, ...) and a course-selection table (student ID, course number, score). If you want to join the two tables, you should create an index on the join field "student ID".
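A minimal sketch of that advice (table and column names are translated from the example above and are therefore assumptions):

-- student(student_id, name, age, ...)
-- course_selection(student_id, course_no, score)

-- Index the join column so the inner table is not scanned sequentially for every student row
CREATE INDEX ix_course_selection_student ON course_selection (student_id);

SELECT s.student_id, s.name, c.course_no, c.score
FROM student s, course_selection c
WHERE s.student_id = c.student_id;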
Union can also be used to avoid sequential access. Even when indexes exist on all of the columns being checked, some forms of the where clause force the optimizer to use sequential access. The following query forces a sequential scan of the orders table:
SELECT * FROM orders WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008
Although indexes are created on both customer_num and order_num, the optimizer still scans the entire table along a sequential access path for the statement above, because the statement retrieves a dispersed set of rows. It should therefore be changed to the following statement:
SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001
UNION
SELECT * FROM orders WHERE order_num = 1008
In this way, you can use the index path to process queries.
4. Avoid correlated subqueries
In a correlated subquery, a column label appears both in the main query and in the subquery of the where clause, so it is very likely that every time the column value in the main query changes, the subquery must be re-executed. The more nesting levels there are, the lower the efficiency, so such subqueries should be avoided as far as possible. If a subquery is unavoidable, filter out as many rows as possible inside it.
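A hedged sketch (reusing the orders table from the examples below; the rewrite is only one common option): the correlated form re-evaluates the subquery for each outer row, while grouping first and then joining does the filtering inside the subquery and avoids the per-row re-execution.

-- Correlated: the subquery references the outer row and may run once per order
SELECT o.order_num, o.customer_num
FROM orders o
WHERE o.order_num = (SELECT MAX(o2.order_num)
                     FROM orders o2
                     WHERE o2.customer_num = o.customer_num);

-- One possible rewrite: aggregate once, then join against the result
SELECT o.order_num, o.customer_num
FROM orders o,
     (SELECT customer_num, MAX(order_num) AS max_order
      FROM orders
      GROUP BY customer_num) m
WHERE o.customer_num = m.customer_num
  AND o.order_num = m.max_order;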
5. Avoid difficult regular expressions
The matches and like keywords support wildcard matching, technically known as regular expressions, but this kind of matching is especially time-consuming. Example: SELECT * FROM customer WHERE zipcode LIKE "98___"
Even if an index is created on the zipcode field, sequential scanning is still used in this case. If the statement is changed to SELECT * FROM customer WHERE zipcode > "98000", the query will be executed using the index, which obviously increases the speed.
In addition, avoid substrings that do not start at the beginning of the field. For example, the statement SELECT * FROM customer WHERE zipcode[2,3] > "80" uses a non-initial substring in the where clause, so this statement does not use the index either.
6. Use temporary tables to accelerate queries
Sorting a subset of a table and creating a temporary table sometimes accelerates queries. It helps avoid multiple sorting operations and otherwise simplifies the optimizer's work. For example:
SELECT cust.name, rcvbles.balance, ... other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
  AND rcvbles.balance > 0
  AND cust.postcode > "98000"
ORDER BY cust.name
If this query is to be executed more than once, you can find all the customers with unpaid balances in a temporary table and sort them by customer name:
SELECT cust.name, rcvbles.balance, ... other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
  AND rcvbles.balance > 0
ORDER BY cust.name
INTO TEMP cust_with_balance
Then, query the temporary table in the following way:
SELECT * FROM cust_with_balance WHERE postcode > "98000"
The temporary table has fewer rows than the main table, and its physical order is the required order, so disk I/O is reduced and the query workload is greatly decreased.
Note: after the temporary table is created, it does not reflect subsequent modifications to the main table. When data in the main table is modified frequently, be careful not to lose data.
7. Use sorting to replace non-sequential access
Non-sequential disk access is the slowest operation, visible as the back-and-forth movement of the disk access arm. SQL hides this from us, which makes it easy, when writing an application, to write queries that access a large number of non-sequential pages.
In some cases, the database's sorting capability can be used instead of non-sequential access to improve the query.
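One way to read this advice, as a rough sketch (reusing the rcvbles table from the earlier example): rather than fetching rows one at a time in arbitrary key order, let the database return the whole set sorted by the key, so pages are visited in a much more sequential pattern.

-- Row-at-a-time, in arbitrary key order: the disk arm jumps between pages
-- SELECT balance FROM rcvbles WHERE customer_id = ?;   -- issued once per customer

-- Set-based and sorted: the whole result is produced in key order
SELECT customer_id, balance
FROM rcvbles
WHERE balance > 0
ORDER BY customer_id;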
Summary:
It can be seen from the above which situations to avoid: where clauses that do not use an index and cannot be optimized, i.e. that result in a table scan or extra overhead.
1. Any operation applied to a column will cause a table scan; this includes database functions and calculation expressions. When querying, move the operation to the right of the equal sign whenever possible (see the sketch after this list).
2. in and or clauses often use work tables, which invalidates the index. If there is not a large number of duplicate values, consider splitting the clause; the split clauses should be able to use indexes.
3. Be good at using stored procedures; they make SQL more flexible and efficient.
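A hedged sketch of points 1 and 2 (the orders table is reused from the earlier examples; the order_amount column and the constants are made up):

-- Point 1: an operation on the column defeats any index on order_amount (an assumed column)
-- SELECT * FROM orders WHERE order_amount * 2 > 4000;
-- Better: move the calculation to the right of the comparison
SELECT * FROM orders WHERE order_amount > 2000;

-- Point 2: an or across different columns often falls back to a work table or scan
-- SELECT * FROM orders WHERE customer_num = 104 OR order_num = 1008;
-- Better: split it so that each branch can use its own index
SELECT * FROM orders WHERE customer_num = 104
UNION
SELECT * FROM orders WHERE order_num = 1008;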