How can we improve the query efficiency in a ten-million-level database query ?, Search Efficiency

Source: Internet
Author: User
Tags domain server

How can we improve the query efficiency in a ten-million-level database query ?, Search Efficiency

How to improve query efficiency in a ten-million database query?

1) Database Design:
A. to optimize the query, try to avoid full table scanning. First, consider creating an index on the columns involved in where and order.
B. try to avoid null value determination on the field in the where clause. Otherwise, the engine will discard the index and perform full table scanning, for example: select id from t where num is null you can set the default value 0 on num to ensure that the num column in the table has no null value, and then query: select id from t where num = 0

C. not all indexes are valid for queries. SQL queries are optimized based on the data in the table. When there is a large number of duplicate data in the index column, the query may not use the index, for example, if a table contains sex fields, male and female are almost half of each other, indexing sex does not play a role in query efficiency.

D. the more indexes, the better. Although the index can improve the efficiency of the select statement, it also reduces the efficiency of insert and update, because the insert or update statements may recreate the index, therefore, you need to carefully consider how to create an index, depending on the actual situation. It is recommended that the number of indexes in a table be no more than 6. If there are too many indexes, consider whether the indexes on some columns that are not frequently used are necessary.

E. update of index data columns should be avoided whenever possible, because the order of index data columns is the physical storage order of table records. Once the column value changes, the order of the entire table record will be adjusted, it will consume a considerable amount of resources. If the application system needs to frequently update the index data column, you need to consider whether to create the index as an index.

F. use numeric fields whenever possible. If fields containing only numerical information are not designed as numeric fields, this will reduce query and connection performance and increase storage overhead. This is because the engine compares each character in the string one by one during query and connection processing, and only one comparison is required for the number type.

G. try to use varchar/nvarchar instead of char/nchar, because the first step is to reduce the storage space of the variable-length field, which can save storage space. Secondly, for queries, searching in a relatively small field is obviously more efficient.

H. Try to use table variables instead of temporary tables. If the table variable contains a large amount of data, note that the index is very limited (only the primary key index ).

I. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

J. Temporary tables are not unavailable. Use them appropriately to make some routines more effective. For example, when you need to reference a large table or a data set in a common table repeatedly. However, it is best to use the export table for one-time events.

K. when creating a temporary table, if a large amount of data is inserted at one time, you can use select into instead of create table to avoid creating a large number of logs to increase the speed. If the data volume is small, to ease system table resources, create table first and then insert.

L. if a temporary table is used, you must explicitly delete all temporary tables at the end of the stored procedure. First truncate the table and then drop the table, so that the system table can be locked for a long time.

2) SQL statements:

A. Try to avoid using it in the where clause! = Or <> operator. Otherwise, the engine will discard the index for full table scanning.

B. try to avoid using or in the where clause to connect to the condition. Otherwise, the engine will discard the index and perform a full table scan, for example: select id from t where num = 10 or num = 20 can be queried as follows: select id from t where num = 10 union all select id from t where num = 20

C. in and not in should also be used with caution, otherwise it will cause full table scanning, for example: select id from t where num in (, 3) for continuous values, if you can use between, do not use in: select id from t where num between 1 and 3

D. The following query will also cause a full table scan: select id from t where name like '% abc %'

E. If parameters are used in the where clause, full table scan is also performed. Because SQL parses local variables only at runtime, the optimizer cannot postpone the selection of the access plan to runtime; it must be selected at compilation. However, if an access plan is created during compilation, the value of the variable is still unknown and thus cannot be used as an input for index selection. For example, the following statement will scan the entire table: select id from t where num = @ num can be changed to force query using index: select id from t with (index name )) where num = @ num

F. Avoid performing expression operations on fields in the where clause as much as possible, which will cause the engine to discard the use of indexes for full table scanning. For example, select id from t where num/2 = 100 should be changed to: select id from t where num = 100*2

G. Avoid performing function operations on fields in the where clause as much as possible, which will cause the engine to stop using the index for full table scanning. For example, select id from t where substring (name, 1, 3) = 'abc'-name id that starts with abc select id from t where datediff (day, createdate, '2017-11-30 ') = 0-'2017-11-30' the generated id should be changed: select id from t where name like 'abc % 'select id from t where createdate> = '2017-11-30' and createdate <'2017-12-1 ′

H. do not perform functions, arithmetic operations, or other expression operations on the left side of "=" in the where clause. Otherwise, the system may not be able to correctly use indexes.

I. do not write meaningless queries. If you need to generate an empty table structure: select col1, col2 into # t from t where 1 = 0, this type of code will not return any result set, but it will consume system resources. You should change it to the following: create table # t (...)

J. in many cases, replacing in with exists is a good choice: select num from a where num in (select num from B) with the following statement: select num from a where exists (select 1 from B where num =. num)

K. Do not use select * from t anywhere, replace "*" with a specific field list, and do not return any fields that are not used.

L. Avoid using a cursor whenever possible because the efficiency of the cursor is poor. If the cursor operation has more than 10 thousand rows of data, you should consider rewriting.

M. Avoid returning a large amount of data to the client as much as possible. If the data volume is too large, consider whether the corresponding requirements are reasonable.

N. Avoid large transaction operations as much as possible to improve the system concurrency capability.

3) java: Key Content

A. Try to create as few objects as possible.

B. properly align the system design. A large amount of data operations must be separated from a small amount of data operations. A large amount of data operations are certainly not implemented by the ORM framework .,

C. Use jDBC to connect to a database for Data Operations

D. Control the memory and let the data stream get up, instead of reading all the memory for processing, but reading and processing;

E. make proper use of memory and cache some data

How can we optimize databases and improve database performance?

Answer:

1) disk and network throughput are the most likely impact of hardware performance adjustment. The solution is to expand the virtual memory and ensure sufficient space for expansion; shut down unnecessary services on the database server, separate the database server from the primary domain server, increase the throughput of the SQL database server to the maximum, and run SQL on a machine with more than one processor.

2) Adjust the database

If the query frequency for this table is relatively high, an index is created. When an index is created, you need to create an index based on the where selection conditions for all query operations on the table, try to create an index for an integer key with only one cluster set. The data is physically ordered on the data page to shorten the search range, to create a non-clustered index for all columns that are frequently used in queries, the query can be overwritten. However, the index cannot be too many, execute the update delete insert statement to maintain a sharp increase in the sales volume of these indexes; avoid having too many index keys in the index; avoid using column indexes of large data types; make sure that each index key value has a few rows.

3) use stored procedures

During the implementation of applications, database operations that can be implemented through stored procedures should be implemented through stored procedures as much as possible, because the stored procedure is a one-time design, encoding, testing, and re-use stored on the database server, the application that needs to execute this task can simply execute the stored procedure, in addition, only the result set or value is returned, which not only modularize the program, but also increases the response speed, reduces network traffic, and receives input through input parameters, this ensures logical consistency in applications.

4) application structure and Algorithms

Creating a query Condition Index is only a prerequisite for improving the speed, and improving the response speed depends on the use of indexes. Because people

When using SQL statements, you may often get into a misunderstanding, that is, if you are too concerned about the correct results, especially when you operate a database with a small amount of data, whether to create an index or use an index does not respond quickly to the program. Therefore, when writing a program, the programmer ignores the possible performance differences between different implementation methods, this performance difference is particularly evident in large data volumes or large or complex database environments (such as online transaction processing (OLTP) or decision support system DSS. In practice, it is found that bad SQL statements often come from inappropriate index design, unfilled join conditions, and unoptimized where clauses. After they are properly optimized, their running speed is significantly improved!

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.