Question of the day: Morgan Stanley | Database

Source: Internet
Author: User

1. Difference between clustered and nonclustered indexes
Http://space.exue.com/i10691.html
In SQL Server, what is the difference between a clustered index and a nonclustered index?

A clustered index determines the physical order of data in a table.
For a nonclustered index, the data is stored in one place and the index in another, with pointers to the storage location of the data.
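
The distinction can be sketched with a small in-memory model. This is only an illustration, not how SQL Server stores pages: the row data, the names `clustered_rows`, `heap_rows`, and `name_index`, and the use of sorted Python lists in place of B-trees are all assumptions made for the example.

```python
import bisect

# Clustered: the rows themselves are kept sorted by the key (id),
# so the "index" and the data are the same structure.
clustered_rows = [(1, "Ann"), (3, "Bob"), (7, "Cid"), (9, "Dee")]
clustered_keys = [row[0] for row in clustered_rows]


def clustered_range(lo, hi):
    """Rows with lo <= id <= hi are physically adjacent: one slice."""
    start = bisect.bisect_left(clustered_keys, lo)
    end = bisect.bisect_right(clustered_keys, hi)
    return clustered_rows[start:end]


# Nonclustered: rows sit in arbitrary order; a separate sorted index
# holds (key, position) pairs that point back at the rows.
heap_rows = [(7, "Cid"), (1, "Ann"), (9, "Dee"), (3, "Bob")]
name_index = sorted((row[1], pos) for pos, row in enumerate(heap_rows))
name_keys = [key for key, _ in name_index]


def nonclustered_lookup(name):
    """Find the key in the index, then follow the pointer to the row."""
    i = bisect.bisect_left(name_keys, name)
    if i < len(name_index) and name_index[i][0] == name:
        return heap_rows[name_index[i][1]]
    return None
```

A range query on the clustered key reads one contiguous slice, while the nonclustered lookup needs an extra hop from the index entry to the row.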

Consider using a clustered index for:

Columns that contain a large number of distinct values.
Queries that return a range of values using operators such as BETWEEN, >, >=, <, and <=.
Columns that are accessed sequentially.
Queries that return large result sets.
Columns that are frequently accessed by queries involving JOIN or GROUP BY clauses; typically these are foreign key columns. An index on the column(s) specified in the ORDER BY or GROUP BY clause eliminates the need for SQL Server to sort the data because the rows are already sorted. This improves query performance.
OLTP-type applications where very fast single-row lookup is required, typically by means of the primary key. Create a clustered index on the primary key.

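Clustered indexes are not a good choice for:

Columns that undergo frequent changes. Changing the key value forces the whole row to move, because SQL Server must keep the rows in key order.
Wide keys. The clustered key is used as the lookup key by every nonclustered index on the table, so a wide key is stored in each of their leaf entries.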

Consider using nonclustered indexes for:

Columns that contain a large number of distinct values, such as a combination of last name and first name (if a clustered index is used for other columns). If there are very few distinct values, such as only 1 and 0, most queries will not use the index because a table scan is usually more efficient.
Queries that do not return large result sets.
Columns frequently involved in search conditions of a query (WHERE clause) that return exact matches.
Decision-support-system applications for which joins and grouping are frequently required. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
Covering all columns from one table in a given query. This eliminates accessing the table or clustered index altogether.
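
The covering case in the last item can be observed with SQLite's query planner, used here only as a convenient stand-in for SQL Server. The `people` table, its columns, and the index name `idx_name` are hypothetical, and the plan wording is SQLite's, not Showplan's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    " id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, city TEXT)"
)
# Secondary (nonclustered-style) index on two columns.
conn.execute("CREATE INDEX idx_name ON people (last_name, first_name)")


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)


# Every column the query needs is in the index: the table is never touched.
covered = plan("SELECT first_name FROM people WHERE last_name = 'Smith'")

# city is not in the index: each index hit needs an extra lookup in the table.
not_covered = plan("SELECT city FROM people WHERE last_name = 'Smith'")
```

In this sketch `covered` reports a covering-index search, while `not_covered` reports an ordinary index search followed by row lookups.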


2. Difference between HAVING and WHERE
WHERE filters individual rows before grouping; HAVING filters groups and follows GROUP BY:
SELECT column_name
FROM table_name
WHERE condition
GROUP BY column_name
HAVING condition
 

Http://book.csdn.net/bookfiles/235/10023510864.shtml

Http://dev.yesky.com/230/2669730.shtml

Before introducing the GROUP BY and HAVING clauses, we must first discuss a special kind of function in SQL: the aggregate functions, such as SUM, COUNT, MAX, and AVG. The fundamental difference between these and other functions is that they generally operate on multiple records.

SELECT SUM(population) FROM bbc

Here SUM is applied to the population field of all returned records, so the query returns a single result: the total population of all countries.

With the GROUP BY clause, SUM, COUNT, and the other aggregate functions operate on each group of data. When you specify GROUP BY region, each region yields exactly one row, so every field other than region must be reduced to a single value by an aggregate function such as SUM or COUNT.

The HAVING clause lets us filter the groups after grouping. The WHERE clause filters records before aggregation, that is, before the GROUP BY and HAVING clauses take effect; the HAVING clause filters group records after aggregation.

Let's look at the GROUP BY and HAVING clauses through concrete examples, using the BBC table introduced in section 3 of the source tutorial.

SQL examples:

1. Display the total population and total area of each region:

SELECT region, SUM(population), SUM(area)
FROM bbc
GROUP BY region

First, the returned records are divided into groups by region, which is the literal meaning of GROUP BY. After grouping, the aggregate functions are applied to the fields of each group (one or more records per group).

2. Display the total population and total area of each region, but only for regions whose total area exceeds 1000000 square meters:

SELECT region, SUM(population), SUM(area)
FROM bbc
GROUP BY region
HAVING SUM(area) > 1000000

Here we cannot use WHERE to select the regions whose total area exceeds 1000000, because no such record exists in the table: the per-region total only exists after aggregation.
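
The two queries can be tried on a tiny in-memory SQLite database. The four rows below are made-up sample data, not the real BBC table; only the column names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bbc (name TEXT, region TEXT, area INTEGER, population INTEGER)"
)
# Hypothetical sample rows: no single country exceeds 1000000 in area.
conn.executemany(
    "INSERT INTO bbc VALUES (?, ?, ?, ?)",
    [
        ("A", "North", 600000, 10),
        ("B", "North", 700000, 20),
        ("C", "South", 300000, 5),
        ("D", "South", 200000, 15),
    ],
)

# Example 1: one row per region.
totals = conn.execute(
    "SELECT region, SUM(population), SUM(area) "
    "FROM bbc GROUP BY region ORDER BY region"
).fetchall()

# Example 2: keep only the groups whose total area is large enough.
large = conn.execute(
    "SELECT region, SUM(population), SUM(area) "
    "FROM bbc GROUP BY region HAVING SUM(area) > 1000000"
).fetchall()

# A row-level WHERE filter finds nothing, because the total is not stored in any row.
rows_over_limit = conn.execute(
    "SELECT COUNT(*) FROM bbc WHERE area > 1000000"
).fetchone()[0]
```

With this data only the North group passes the HAVING filter, even though no individual row has an area above the limit.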

ON, WHERE, and HAVING can all add conditions. ON is evaluated first, WHERE second, and HAVING last. When the order does not affect the intermediate results, the final result is the same whichever clause carries the condition. However, because ON filters out non-matching records before any aggregation, it reduces the data that the intermediate steps must process, so in principle it is the fastest.

By the same reasoning, WHERE should be faster than HAVING, because the aggregation (for example SUM) runs only on the rows that survive the filter; HAVING is the slowest. That does not make HAVING useless: when the value to filter on is not known until after aggregation, HAVING is required.

The ON clause is used only when tables are joined, so for a single-table query the comparison is between WHERE and HAVING. If the filter condition does not involve a computed (aggregated) field, both produce the same result, but WHERE can benefit from optimizations that HAVING cannot, so HAVING is generally the slower of the two.

If a computed field is involved, its value is not determined until the computation has run. As described above, WHERE takes effect before the computation and HAVING after it, so in this case the two produce different results.
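
A small experiment makes both cases concrete. It uses SQLite and hypothetical sample rows; the threshold 18 is chosen only so that the two filters disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO bbc VALUES (?, ?, ?)",
    [("A", "North", 10), ("B", "North", 20), ("C", "South", 5), ("D", "South", 15)],
)


def run(sql):
    return conn.execute(sql).fetchall()


# Condition on a plain column: WHERE and HAVING give the same answer.
where_region = run(
    "SELECT region, SUM(population) FROM bbc "
    "WHERE region = 'North' GROUP BY region"
)
having_region = run(
    "SELECT region, SUM(population) FROM bbc "
    "GROUP BY region HAVING region = 'North'"
)

# Condition involving the computed value: the answers differ.
# WHERE drops rows before summing; HAVING drops groups after summing.
where_rows = run(
    "SELECT region, SUM(population) FROM bbc "
    "WHERE population > 18 GROUP BY region ORDER BY region"
)
having_groups = run(
    "SELECT region, SUM(population) FROM bbc "
    "GROUP BY region HAVING SUM(population) > 18 ORDER BY region"
)
```

Here `where_rows` sums only the single row above 18, while `having_groups` keeps every region whose full total is above 18.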
 
In multi-table join queries, ON takes effect earlier than WHERE. Conceptually, the system first combines the tables into a temporary table according to the join conditions, then filters it with WHERE, then performs the computation, and finally filters the computed result with HAVING. For a filter condition to work correctly, you must first understand when it should take effect and then decide which clause to put it in.
As an experiment shows, a WHERE filter cannot contain computed (aggregate) fields.
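
The whole pipeline, and the experiment mentioned above, can be sketched with SQLite. The `customers` and `orders` tables and their rows are hypothetical, and the exact error text is SQLite's, so the sketch only records that an error occurred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 60), (1, 50), (1, -10), (2, 40), (2, 30);
    """
)

# Logical order: ON joins the tables, WHERE drops the negative row,
# GROUP BY + SUM aggregate, and HAVING filters on the computed total.
big_spenders = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.amount > 0
    GROUP BY c.name
    HAVING SUM(o.amount) >= 100
    ORDER BY c.name
    """
).fetchall()

# The experiment: an aggregate inside WHERE is rejected outright.
where_error = None
try:
    conn.execute(
        "SELECT customer_id FROM orders "
        "WHERE SUM(amount) >= 100 GROUP BY customer_id"
    )
except sqlite3.Error as exc:
    where_error = str(exc)
```

Ann's total is computed from the two positive orders only, and the second statement fails before returning any rows.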

3. What is query optimization?

Http://blog.chinaunix.net/u/10080/showart.php?Id=170107

Query Optimization
Because all the data is in memory, query execution in FastDB is very fast compared with a traditional RDBMS. FastDB nevertheless applies several optimizations to speed up queries further: indexes, reverse references, and query parallelization. The following sections describe these optimizations in detail.
Using indexes in queries
Indexes are the traditional way to improve RDBMS performance. FastDB uses two types of index: the extensible hash table and the T-tree. The hash table gives the fastest access (usually constant time) to a record with a specified key value. The T-tree is a mixture of an AVL tree and an array; it plays the same role in a main-memory DBMS (MMDBMS) that the B-tree plays in a traditional RDBMS. It provides search, insert, and delete operations with logarithmic complexity (that is, the time to search, insert, or delete in a table with N records is C * log2(N), where C is a constant). The T-tree suits an MMDBMS better than the B-tree: a B-tree tries to minimize the number of pages that must be loaded (page loading is expensive in a disk-based database), whereas a T-tree tries to minimize the number of comparison and move operations. The T-tree is the best choice for range operations or when the order of records matters.
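
The division of labor between the two index types can be sketched in a few lines. This is not FastDB code: a Python dict stands in for the hash table, a sorted list searched with `bisect` stands in for the T-tree, and the `records` data is made up.

```python
import bisect

# Hypothetical records: (name, weight).
records = [("bolt", 12.5), ("nut", 3.0), ("gear", 40.0), ("pin", 1.2)]

# Hash index stand-in: average constant-time equality lookup on name.
hash_index = {name: pos for pos, (name, _) in enumerate(records)}

# Ordered index stand-in (a sorted list, not a real T-tree):
# logarithmic search, and keys come back in order.
ordered_index = sorted((weight, pos) for pos, (_, weight) in enumerate(records))
ordered_keys = [weight for weight, _ in ordered_index]


def find_by_name(name):
    """Equality lookup through the hash index."""
    pos = hash_index.get(name)
    return None if pos is None else records[pos]


def find_weight_between(lo, hi):
    """Range lookup through the ordered index; results are sorted by weight."""
    start = bisect.bisect_left(ordered_keys, lo)
    end = bisect.bisect_right(ordered_keys, hi)
    return [records[pos] for _, pos in ordered_index[start:end]]
```

The hash structure answers only exact-match questions, while the ordered structure also answers range questions and returns rows already sorted.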
FastDB applies indexes according to simple rules, so a programmer can predict when an index will be used and which one. Index applicability is checked on every query execution, so the decision can depend on the values of the operands. The following rules describe FastDB's index-selection algorithm:
A compiled condition expression is always inspected from left to right.
If the topmost expression is AND, FastDB tries to apply an index to the left half of the expression and uses the right half as a filter.
If the topmost expression is OR and an index can be applied to the left half, FastDB applies it and then tests whether an index can also be used for the right half.
Otherwise, an index is applicable to the expression when all of the following hold:
    The topmost expression is a relational operation (= < > <= >= between like).
    The operand type is boolean, numeric, string, or reference.
    The right operand of the expression is a literal constant or a C++ variable.
    The left operand is an indexed field of the record.
    The index is compatible with the relational operation.
We now need to define what "the index is compatible with the operation" means and which type of index is used in each case. A hash table can be used when:
    The comparison is an equality (=) comparison.
    The operation is between and the two bounding operands are equal.
    The operation is like and the pattern string contains no special characters ('%' or '_') and no escape characters (specified in the escape part).
When the hash table is not applicable, a T-tree can be used when:
    The operation is a comparison (= < > <= >= between).
    The operation is like and the pattern string has a non-empty prefix (that is, the first character of the pattern is neither '%' nor '_').
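
The compatibility rules above can be written down as a small decision function. This is a reading of the rules, not FastDB's source: the function names and the `hashed`/`indexed` flags are invented for the sketch, escape characters are ignored, and treating between as a T-tree range operation is an assumption.

```python
WILDCARDS = ("%", "_")


def hash_applicable(op, operands, hashed):
    """Hash table: equality-style lookups only."""
    if not hashed:
        return False
    if op == "=":
        return True
    if op == "between":
        return operands[0] == operands[1]  # both bounds equal
    if op == "like":
        return not any(ch in operands[0] for ch in WILDCARDS)
    return False


def ttree_applicable(op, operands, indexed):
    """T-tree: comparisons, and like with a literal prefix."""
    if not indexed:
        return False
    if op in ("=", "<", ">", "<=", ">=", "between"):  # between: assumed
        return True
    if op == "like":
        pattern = operands[0]
        return bool(pattern) and pattern[0] not in WILDCARDS
    return False


def choose_index(op, operands, hashed=False, indexed=False):
    """Prefer the hash table; fall back to the T-tree; else no index."""
    if hash_applicable(op, operands, hashed):
        return "hash"
    if ttree_applicable(op, operands, indexed):
        return "t-tree"
    return None
```

For example, `choose_index("like", ("ab%",), hashed=True, indexed=True)` falls through to the T-tree because the pattern contains a wildcard but starts with a literal prefix.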
If an index is used to search for the prefix of a like expression and the rest of the pattern is not just the '%' character, the index search can return more records than actually match the pattern. In that case the index search results must be filtered by applying the pattern match.
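
A minimal sketch of this two-step search, assuming a sorted list of keys in place of the T-tree and a regular expression in place of FastDB's own pattern matcher; the city names are made up.

```python
import bisect
import re


def like_to_regex(pattern):
    """Translate LIKE wildcards: % is any run, _ is any single character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def literal_prefix(pattern):
    """The part of the pattern before the first wildcard."""
    for i, ch in enumerate(pattern):
        if ch in "%_":
            return pattern[:i]
    return pattern


def like_search(sorted_keys, pattern):
    """Return (index candidates, true matches) for a LIKE pattern."""
    prefix = literal_prefix(pattern)
    # Index step: keys sharing the prefix form one contiguous run.
    start = bisect.bisect_left(sorted_keys, prefix)
    candidates = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        candidates.append(key)
    # Filter step: apply the full pattern to the candidates.
    regex = like_to_regex(pattern)
    matches = [key for key in candidates if regex.fullmatch(key)]
    return candidates, matches


cities = sorted(["san diego", "san jose", "santa fe", "seattle", "salem"])
candidates, matches = like_search(cities, "san%e")
```

The prefix search returns three candidates for `san%e`, and the filter step discards the one that does not end in `e`.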
If the search condition is a disjunction of several subexpressions (alternatives connected by the or operator), the query can use several indexes. To avoid duplicate records in that case, a bitmap in the cursor marks the records that have already been selected.
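
The duplicate-avoidance idea can be sketched as follows. The records, the dict-based indexes, and the list of booleans used as the bitmap are all stand-ins chosen for the example, not FastDB structures.

```python
# Hypothetical records, addressed by position.
records = [
    {"name": "bolt", "color": "red"},
    {"name": "nut", "color": "blue"},
    {"name": "gear", "color": "red"},
    {"name": "bolt", "color": "blue"},
]


def build_index(field):
    """Map each value of the field to the positions of matching records."""
    index = {}
    for pos, rec in enumerate(records):
        index.setdefault(rec[field], []).append(pos)
    return index


name_index = build_index("name")
color_index = build_index("color")


def select_or(name, color):
    """Evaluate name = ? OR color = ? with one index probe per alternative."""
    selected = [False] * len(records)  # the bitmap: one flag per record
    result = []
    for index, key in ((name_index, name), (color_index, color)):
        for pos in index.get(key, []):
            if not selected[pos]:  # skip records already returned
                selected[pos] = True
                result.append(pos)
    return result
```

Record 0 satisfies both alternatives in `select_or("bolt", "red")`, but the bitmap ensures it appears only once in the result.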
If the search condition requires a sequential scan of the table, a T-tree index can still be used when the order by clause contains a single record field on which a T-tree index is defined. Because sorting is a very expensive operation, using the index instead of sorting significantly reduces query execution time.
If FastDB is compiled with the -DDEBUG=DEBUG_TRACE option, you can check which indexes are used during query execution and how many probes are made during index searches. In this mode FastDB dumps trace information about database operations, including information about indexes.
Reverse references
Reverse references provide an efficient and reliable way to establish relationships between tables. FastDB uses reverse-reference information when records are inserted, updated, or deleted, and also for query optimization. A relationship between records can be one-to-one, one-to-many, or many-to-many.
A one-to-one relationship is represented by a reference field in both the record itself and the target record.
A one-to-many relationship is represented by a reference field in the record itself and an array-of-references field in the record of the target table.
A many-to-one relationship is represented by an array-of-references field in the record itself and a reference field in the record of the target table.
A many-to-many relationship is represented by array-of-references fields in both the record itself and the target record.
When a record with a declared relationship is inserted into a table, the reverse references in all related records, in all tables, are updated to point to the new record. When a record is updated and a field describing one of its relationships changes, the reverse references are rebuilt automatically: references to the updated record are removed from records that are no longer related to it, and reverse references to it are set in the records newly included in the relationship. When a record is deleted, all references to it are removed from the reverse-reference fields.
For efficiency reasons, FastDB does not guarantee the consistency of all references. If you delete a record from a table, references to it may still exist in the database, and accessing them can lead to unpredictable application behavior and even database crashes. Reverse references eliminate this problem, because all references are updated automatically and referential consistency is preserved.
The following tables are used as an example:

class Contract;

class Detail {
  public:
    char const* name;
    char const* material;
    char const* color;
    real4       weight;

    // Reverse reference: all contracts that refer to this detail.
    dbArray< dbReference<Contract> > contracts;

    TYPE_DESCRIPTOR((KEY(name, INDEXED|HASHED),
                     KEY(material, HASHED),
                     KEY(color, HASHED),
                     KEY(weight, INDEXED),
                     RELATION(contracts, detail)));
};

class Supplier {
  public:
    char const* company;
    char const* location;
    bool        foreign;

    // Reverse reference: all contracts that refer to this supplier.
    dbArray< dbReference<Contract> > contracts;

    TYPE_DESCRIPTOR((KEY(company, INDEXED|HASHED),
                     KEY(location, HASHED),
                     FIELD(foreign),
                     RELATION(contracts, supplier)));
};


class Contract {
  public:
    dbDateTime            delivery;
    int4                  quantity;
    int8                  price;
    dbReference<Detail>   detail;
    dbReference<Supplier> supplier;

    TYPE_DESCRIPTOR((KEY(delivery, HASHED|INDEXED),
                     KEY(quantity, INDEXED),
                     KEY(price, INDEXED),
                     RELATION(detail, contracts),
                     RELATION(supplier, contracts)));
};
In this example there are one-to-many relationships between the Detail and Contract tables and between the Supplier and Contract tables. When a Contract record is inserted into the database, you only need to set its detail and supplier references to the corresponding records of the Detail and Supplier tables; the contracts reverse references in those records are updated automatically. Likewise, when a Contract record is deleted, references to it are automatically removed from the contracts field of the referenced Detail and Supplier records.
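
The bookkeeping described here can be modeled in a few lines. This is a toy model, not FastDB's implementation: the `Database` class, its method names, and the use of dicts and sets in place of reference fields are assumptions for the sketch.

```python
class Database:
    """Toy model of contracts with automatically maintained reverse references."""

    def __init__(self):
        self.contracts = {}  # contract id -> (detail name, company)
        self.details = {}    # detail name -> ids of contracts referring to it
        self.suppliers = {}  # company -> ids of contracts referring to it

    def insert_contract(self, cid, detail, company):
        # Setting the forward references also updates both reverse references.
        self.contracts[cid] = (detail, company)
        self.details.setdefault(detail, set()).add(cid)
        self.suppliers.setdefault(company, set()).add(cid)

    def update_contract_detail(self, cid, new_detail):
        # Move the reverse reference from the old detail to the new one.
        old_detail, company = self.contracts[cid]
        self.details[old_detail].discard(cid)
        self.details.setdefault(new_detail, set()).add(cid)
        self.contracts[cid] = (new_detail, company)

    def delete_contract(self, cid):
        # Remove every reverse reference to the deleted contract.
        detail, company = self.contracts.pop(cid)
        self.details[detail].discard(cid)
        self.suppliers[company].discard(cid)


db = Database()
db.insert_contract(1, "bolt", "Acme")
db.insert_contract(2, "bolt", "Zenith")
db.update_contract_detail(2, "nut")
db.delete_contract(1)
```

After these calls no dangling reference to contract 1 remains in either reverse-reference set.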
In addition, reverse references allow a more efficient plan to be chosen for query execution. Consider the following query, which selects all details shipped by a given company:

q = "exists i:(contracts[i].supplier.company=", company, ")";
The most direct way to execute this query is to scan the Detail table and test each record against the condition. With reverse references there is another way: perform an index search in the Supplier table for records with the specified company name, and then follow the reverse references to locate the records in the Detail table that are transitively related to the selected supplier records. Duplicate records must of course be eliminated, since one company may ship many different details; this is done with a bitmap in the cursor object. Because an index search is significantly faster than a sequential search and access by reference is very fast, the total execution time is much shorter than with the direct method.
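
Both plans can be compared on a toy data set. Everything here is hypothetical: the dicts stand in for the tables, the company index, and the reference fields, and a Python set plays the role of the cursor bitmap.

```python
# Hypothetical data. Forward references:
contract_detail = {10: "bolt", 11: "nut", 12: "bolt", 13: "gear"}
contract_supplier = {10: 1, 11: 1, 12: 1, 13: 2}
supplier_company = {1: "Acme", 2: "Zenith"}

# Reverse references and the index on Supplier.company:
detail_contracts = {"bolt": [10, 12], "nut": [11], "gear": [13], "pin": []}
supplier_contracts = {1: [10, 11, 12], 2: [13]}
company_index = {"Acme": [1], "Zenith": [2]}


def details_by_scan(company):
    """Direct plan: test every detail record against the condition."""
    return [
        detail
        for detail, cids in detail_contracts.items()
        if any(supplier_company[contract_supplier[c]] == company for c in cids)
    ]


def details_by_reverse_refs(company):
    """Index search on the company, then follow references to the details."""
    seen = set()  # stands in for the cursor bitmap
    result = []
    for sid in company_index.get(company, []):
        for cid in supplier_contracts[sid]:
            detail = contract_detail[cid]
            if detail not in seen:  # a company may ship one detail many times
                seen.add(detail)
                result.append(detail)
    return result
```

Both functions return the same details, but the second one only touches the records reachable from the matching supplier.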
Starting from version 1.20, FastDB supports cascade deletion. If a field is declared with the OWNER macro, the record is treated as the owner of a hierarchical relationship. When the owner record is deleted, all members of the relationship (the records referenced from the owner) are deleted automatically. If a member record must hold a reference back to the owner record, that field should be declared with the RELATION macro.

4. What is the difference between clustered index scan and clustered index seek?

Both operators appear in the Showplan output.
1. SELECT au_id
FROM authors
WHERE au_id = '2014-56-7008' (clustered index seek)

2. SELECT city
FROM authors
WHERE city LIKE 'san%' (clustered index scan)
An index seek means that SQL Server traverses the index from the root down to the leaf level, comparing the values in the SARG (search argument) with the key values in the index rows to determine which page to look at next. Seeking through an index typically means a root-to-leaf traversal.

A scan, the alternative to an index seek, means that SQL Server stays at the leaf level and traverses just that one level of the index from the first page to the last. I like to think of a seek as a vertical traversal of an index and a scan as a horizontal traversal. Remember that for a clustered index, the leaf level of the index is the table data itself, so scanning the clustered index really means scanning the table. In fact, the only time you'll see Table Scan as an operator in the plan is for heaps.
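
The same contrast can be reproduced with SQLite as a rough analogue. The assumptions are that a WITHOUT ROWID table plays the part of a table clustered on its primary key, that SQLite's SEARCH and SCAN plan steps correspond loosely to seek and scan, and that the `authors` columns and the key literal are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID stores the rows inside the primary-key B-tree itself.
conn.execute(
    "CREATE TABLE authors (au_id TEXT PRIMARY KEY, city TEXT) WITHOUT ROWID"
)


def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# Equality on the clustering key: root-to-leaf search.
seek_plan = plan("SELECT au_id FROM authors WHERE au_id = '000-00-0000'")

# Predicate on a column with no index: every leaf row is visited.
scan_plan = plan("SELECT city FROM authors WHERE city LIKE 'san%'")
```

In this sketch the first plan starts with SEARCH and the second with SCAN, mirroring the vertical and horizontal traversals described above.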
