PostgreSQL learning Manual (6) Indexing

Source: Internet
Author: User

I. Index type:
PostgreSQL provides a variety of index types: B-tree, hash, GIST, and gin. Because they use different algorithms, each index type has a suitable Query type, saving time and resources, the create Index Command creates a B-tree index.

1. B-tree:
Create Table test1 (
Id integer,
Content varchar
);
Create IndexTest1_id_indexOnTest1 (ID );
B-tree indexes are mainly used for equals and range queries, especially when the index column contains the operator "<, <=, =,> =, and>" as the query condition, the query planner of postgresql considers the use of B-tree indexes. PostgreSQL can also use B-tree indexes for queries using between, in, is null, and is not null. However, for queries based on pattern matching operators, such as like, Ilike, and ,~ And ~ * Only when the mode has a constant and the constant is at the beginning of the mode string, for example, Col like 'foo % 'or Col ~ '^ Foo', the index will take effect. Otherwise, a full table scan will be performed, for example, Col like' % bar '.

2. Hash:
Create IndexNameOnTableUsing hash(Column );
Hash indexes can only process simple equals comparison. When an index column is compared with an equal operator, the query planner considers using a hash index.
It should be noted that PostgreSQL does not have a better performance than B-tree indexes, but its size and construction time are worse. In addition, as the hash index operation does not currently record Wal logs, once the database crashes, we will have to re-create the hash index with reindex.

3. gist:
GIST index is not a separate index type, but an architecture that can implement many different index policies. This allows gist indexes to use specific operator types based on different index policies.

4. Gin:
Gin indexes are reverse indexes that can process values that contain multiple keys (such as arrays ). Similar to gist, Gin also supports user-defined index policies, so that gin indexes can use specific operators based on different index policies. As an example, the standard release of PostgreSQL contains the gin operator types used for one-dimensional arrays, such as <@, @>, =, and.

Ii. Composite Index:
Indexes in PostgreSQL can be defined on multiple fields in a data table, for example:
Create Table Test2 (
Major int,
Minor int,
Name varchar
}
Create index test2_mm_idx on Test2 (Major, minor );
In the current version, only B-tree, GIST, and gin support compound indexes. Up to 32 fields can be declared.
1. B-tree composite indexes:
In B-tree composite indexes, any subset of the index field can be used for query conditions. However, only when the first index field in the composite index (leftmost) when it is included, the highest efficiency can be obtained.

2. Gist-type composite index:
In gist-type composite indexes, only when the first index field is included in the query condition can we determine how many index data will be scanned by the query, the condition on other index fields only limits the entries returned by the index. If most of the data in the first index field has the same key value, the application of GIST index will be inefficient.

3. Gin-type composite indexes:
Unlike B-tree and gist indexes, Gin composite indexes are not affected by which index field subsets are used in the query conditions. No matter which combination is used, the same efficiency is achieved.

Exercise caution when using composite indexes. In most cases, the index on a single field is enough, saving time and space. Unless the table usage mode is very fixed, the indexes of more than three fields are of little use.

3. Combining Multiple indexes:
PostgreSQL can combine multiple indexes during query (including multiple use of the same index) to handle situations where a single index scan cannot be implemented. At the same time, the system can form and or conditions between multiple index scans. For example, a query similar to where x = 42 or X = 47 or X = 53 or X = 99 can be divided into four independent X-field index-based scans, each scan uses a query clause. Then, these scan results are combined and the final results are generated. In another example, if we have independent indexes on X and Y, then a query similar to where x = 5 and Y = 6, the system scans the indexes based on the two fields, and then performs and on the results of the scans and generates the final result rows.
In order to combine multiple indexes, the system scans each required index and organizes a bitmap in the memory, which gives the physical location of the data scanned by the index in the data table. Then, based on the Query Needs, perform the and or operations on these bitmaps and obtain the final bitmap. Finally, retrieve the data table and return data rows. The data rows in the table are accessed in physical order, because this is the layout of bitmap, which means that the sorting of any original index will disappear. If the query contains an order by clause, there will be an additional sorting step. For this reason, and each additional index scan will add additional time, so that the planner sometimes chooses to use a simple index scan, that is, to make multiple indexes available.


Iv. Unique index:
Currently, only B-tree indexes can be declared as unique indexes.
Create unique indexNameOnTable (column [,...]);
If the index is declared as a unique index, multiple rows with the same index value are not allowed. NULL values are considered unequal.

5. Expression indexes:
Expression indexes are mainly used to compare the results of a function or expression based on a field with other values in a query condition, for example:
Select * From test1 where lower (col1) = 'value ';
At this point, if we only create an index on the col1 field, the query will not use this index during execution, but will directly perform a full table scan. If the data size of the table is large, it will take a long time to execute the query. The solution to this problem is very simple. Create an expression index based on the col1 field on the test1 table, for example:
Create index test1_lower_col1_idx on test1(Lower (col1 ));
If we declare this index as unique, it will prohibit the creation of rows with only case-insensitive col1 values and rows with exactly the same col1 values. Therefore, indexes on expressions can be used to force constraints that cannot be defined as simple and unique constraints. Now let's look at an example of applying expression indexes.
Select * from people where (first_name | ''| last_name) = 'John Smith ';
As in the preceding example, although we may create independent indexes for first_name and last_name, or composite indexes based on these two fields, when executing this query statement, these indexes are not used. Only the expression indexes created below can be used for this query.
Create index people_names on people(First_name | ''| last_name ));
The syntax of the create Index Command usually requires the circular arc around the index expression, as shown in the second example. If the expression is just a function call, it can be omitted, as shown in the first example.
From the perspective of index maintenance, the index expression is relatively inefficient, because when inserting data or updating data, it must calculate the expression result for this row, and store the result directly in the index. However, when querying, PostgreSQL regards them as where idxcol = 'constant', so the search speed is equivalent to the query based on simple indexes. In general, we should only use expression indexes when the search speed is more important than the insert and update speed.

Vi. Partial indexes:
Partial Index(Partial index)Is an index created on a subset of a table, which is defined by a conditional expression (called partial index predicates ). This index only contains the rows in the table that satisfy this predicate.
Because not all cases require updating indexes, some indexes increase the efficiency of data insertion and data update. However, because some indexes are smaller than normal indexes, it can improve the query efficiency of the index. See the following three examples:
1. The index field and the predicate condition field are consistent:
Create IndexAccess_log_client_ip_ixOnAccess_log (client_ip)
WhereNot (client_ip> Inet '192. 168.100.0 'and client_ip <Inet '192. 168.100.255 ');
This part of the index will be used in the following query:
Select * From access_log where url = '/index.html' and client_ip = Inet '212. 78.10.32 ';
The following query will not use this part of the index:
A query that cannot use this index can be:
Select * From access_log where client_ip = Inet '192. 168.100.23 ';

2. The index field and the predicate condition field are inconsistent:
PostgreSQL supports partial indexes with arbitrary predicates. The only constraint is that the fields of the predicates also come from the same data table. Note: If you want your query statement to use partial indexes, the condition part of the query statement must match the predicate of some indexes. To be accurate, this part of the index can be used only when PostgreSQL can identify the where condition of the query that covers the predicates of the index in mathematics.
Create IndexOrders_unbilled_indexOnOrders (order_nr)WhereBilled is not true;
This part of the index will be used in the following query:
Select * from orders where billed is not true and order_nr <10000;
What about the following query?
Select * from orders where billed is not true and amount> 5000.00;
This query is not as efficient as the preceding query. After all, index fields are not used in the query Condition Statement. However, the query condition "billed is not true" completely matches the predicates of some indexes, therefore, PostgreSQL will scan the entire index. In this way, the query can be more effective only when the index data is relatively small.
Some indexes are not used in the following query.
Select * from orders where order_nr = 3501;

3. uniqueness constraints of data table subsets:
Create Table Tests (
Subject text,
Target Text,
Success Boolean,
...
);
Create unique indexTests_success_constraintOnTests (subject, target)WhereSuccess;
This part of the index will only restrict the uniqueness of data with the success field value being true. In practical applications, this implementation method will be very efficient if there is less data to be successful and there is a large amount of data to be unsuccessful.

7. Check the index usage:
See the following four suggestions:
1. Always run analyze first.
This command collects statistics on the numerical distribution in the table. This information is required when estimating the number of rows returned by a query, and the scheduler needs this number of rows to assign a real overhead value to each possible query plan. If there is no real statistical information, some default values will be used, which is definitely inaccurate. Therefore, if you have not run analyze to check the usage of an index, it will be a failed check.
2. Experiment with real data.
If you fill a data table with test data, the index of the table will only evaluate how to use the index based on the test data, rather than all the data. For example, if you select 100000 rows from 1000 rows, the scheduler may consider using indexes. If you select one row from 100 rows, it is hard to say that indexes will be used. Because 100 rows of data may be stored on a disk page, no query plan is more efficient than sequential access to a disk page. At the same time, when simulating test data, you should also note that if the data is very similar data, completely random data, or data inserted in order, will make the statistical information deviate from the actual data should have characteristics.
3. If the index is not used, forcing it to be used in the test may be of some value. Some runtime parameters can disable various query plans.
4. The forced use of indexes may lead to two possible reasons: First, the system is correct and the use of indexes is not suitable. Second, the overhead calculation of the query plan does not reflect the actual situation. In this way, you should timing queries that use or do not use indexes. The explain analyze command is very useful at this time.

 

 

 

Reprinted from Stephen Liu for the purposes of learning to add to favorites only.

I. Index type:
PostgreSQL provides a variety of index types: B-tree, hash, GIST, and gin. Because they use different algorithms, each index type has a suitable Query type, saving time and resources, the create Index Command creates a B-tree index.

1. B-tree:
Create Table test1 (
Id integer,
Content varchar
);
Create IndexTest1_id_indexOnTest1 (ID );
B-tree indexes are mainly used for equals and range queries, especially when the index column contains the operator "<, <=, =,> =, and>" as the query condition, the query planner of postgresql considers the use of B-tree indexes. PostgreSQL can also use B-tree indexes for queries using between, in, is null, and is not null. However, for queries based on pattern matching operators, such as like, Ilike, and ,~ And ~ * Only when the mode has a constant and the constant is at the beginning of the mode string, for example, Col like 'foo % 'or Col ~ '^ Foo', the index will take effect. Otherwise, a full table scan will be performed, for example, Col like' % bar '.

2. Hash:
Create IndexNameOnTableUsing hash(Column );
Hash indexes can only process simple equals comparison. When an index column is compared with an equal operator, the query planner considers using a hash index.
It should be noted that PostgreSQL does not have a better performance than B-tree indexes, but its size and construction time are worse. In addition, as the hash index operation does not currently record Wal logs, once the database crashes, we will have to re-create the hash index with reindex.

3. gist:
GIST index is not a separate index type, but an architecture that can implement many different index policies. This allows gist indexes to use specific operator types based on different index policies.

4. Gin:
Gin indexes are reverse indexes that can process values that contain multiple keys (such as arrays ). Similar to gist, Gin also supports user-defined index policies, so that gin indexes can use specific operators based on different index policies. As an example, the standard release of PostgreSQL contains the gin operator types used for one-dimensional arrays, such as <@, @>, =, and.

Ii. Composite Index:
Indexes in PostgreSQL can be defined on multiple fields in a data table, for example:
Create Table Test2 (
Major int,
Minor int,
Name varchar
}
Create index test2_mm_idx on Test2 (Major, minor );
In the current version, only B-tree, GIST, and gin support compound indexes. Up to 32 fields can be declared.
1. B-tree composite indexes:
In B-tree composite indexes, any subset of the index field can be used for query conditions. However, only when the first index field in the composite index (leftmost) when it is included, the highest efficiency can be obtained.

2. Gist-type composite index:
In gist-type composite indexes, only when the first index field is included in the query condition can we determine how many index data will be scanned by the query, the condition on other index fields only limits the entries returned by the index. If most of the data in the first index field has the same key value, the application of GIST index will be inefficient.

3. Gin-type composite indexes:
Unlike B-tree and gist indexes, Gin composite indexes are not affected by which index field subsets are used in the query conditions. No matter which combination is used, the same efficiency is achieved.

Exercise caution when using composite indexes. In most cases, the index on a single field is enough, saving time and space. Unless the table usage mode is very fixed, the indexes of more than three fields are of little use.

3. Combining Multiple indexes:
PostgreSQL can combine multiple indexes during query (including multiple use of the same index) to handle situations where a single index scan cannot be implemented. At the same time, the system can form and or conditions between multiple index scans. For example, a query similar to where x = 42 or X = 47 or X = 53 or X = 99 can be divided into four independent X-field index-based scans, each scan uses a query clause. Then, these scan results are combined and the final results are generated. In another example, if we have independent indexes on X and Y, then a query similar to where x = 5 and Y = 6, the system scans the indexes based on the two fields, and then performs and on the results of the scans and generates the final result rows.
In order to combine multiple indexes, the system scans each required index and organizes a bitmap in the memory, which gives the physical location of the data scanned by the index in the data table. Then, based on the Query Needs, perform the and or operations on these bitmaps and obtain the final bitmap. Finally, retrieve the data table and return data rows. The data rows in the table are accessed in physical order, because this is the layout of bitmap, which means that the sorting of any original index will disappear. If the query contains an order by clause, there will be an additional sorting step. For this reason, and each additional index scan will add additional time, so that the planner sometimes chooses to use a simple index scan, that is, to make multiple indexes available.


Iv. Unique index:
Currently, only B-tree indexes can be declared as unique indexes.
Create unique indexNameOnTable (column [,...]);
If the index is declared as a unique index, multiple rows with the same index value are not allowed. NULL values are considered unequal.

5. Expression indexes:
Expression indexes are mainly used to compare the results of a function or expression based on a field with other values in a query condition, for example:
Select * From test1 where lower (col1) = 'value ';
At this point, if we only create an index on the col1 field, the query will not use this index during execution, but will directly perform a full table scan. If the data size of the table is large, it will take a long time to execute the query. The solution to this problem is very simple. Create an expression index based on the col1 field on the test1 table, for example:
Create index test1_lower_col1_idx on test1(Lower (col1 ));
If we declare this index as unique, it will prohibit the creation of rows with only case-insensitive col1 values and rows with exactly the same col1 values. Therefore, indexes on expressions can be used to force constraints that cannot be defined as simple and unique constraints. Now let's look at an example of applying expression indexes.
Select * from people where (first_name | ''| last_name) = 'John Smith ';
As in the preceding example, although we may create independent indexes for first_name and last_name, or composite indexes based on these two fields, when executing this query statement, these indexes are not used. Only the expression indexes created below can be used for this query.
Create index people_names on people(First_name | ''| last_name ));
The syntax of the create Index Command usually requires the circular arc around the index expression, as shown in the second example. If the expression is just a function call, it can be omitted, as shown in the first example.
From the perspective of index maintenance, the index expression is relatively inefficient, because when inserting data or updating data, it must calculate the expression result for this row, and store the result directly in the index. However, when querying, PostgreSQL regards them as where idxcol = 'constant', so the search speed is equivalent to the query based on simple indexes. In general, we should only use expression indexes when the search speed is more important than the insert and update speed.

Vi. Partial indexes:
Partial Index(Partial index)Is an index created on a subset of a table, which is defined by a conditional expression (called partial index predicates ). This index only contains the rows in the table that satisfy this predicate.
Because not all cases require updating indexes, some indexes increase the efficiency of data insertion and data update. However, because some indexes are smaller than normal indexes, it can improve the query efficiency of the index. See the following three examples:
1. The index field and the predicate condition field are consistent:
Create IndexAccess_log_client_ip_ixOnAccess_log (client_ip)
WhereNot (client_ip> Inet '192. 168.100.0 'and client_ip <Inet '192. 168.100.255 ');
This part of the index will be used in the following query:
Select * From access_log where url = '/index.html' and client_ip = Inet '212. 78.10.32 ';
The following query will not use this part of the index:
A query that cannot use this index can be:
Select * From access_log where client_ip = Inet '192. 168.100.23 ';

2. The index field and the predicate condition field are inconsistent:
PostgreSQL supports partial indexes with arbitrary predicates. The only constraint is that the fields of the predicates also come from the same data table. Note: If you want your query statement to use partial indexes, the condition part of the query statement must match the predicate of some indexes. To be accurate, this part of the index can be used only when PostgreSQL can identify the where condition of the query that covers the predicates of the index in mathematics.
Create IndexOrders_unbilled_indexOnOrders (order_nr)WhereBilled is not true;
This part of the index will be used in the following query:
Select * from orders where billed is not true and order_nr <10000;
What about the following query?
Select * from orders where billed is not true and amount> 5000.00;
This query is not as efficient as the preceding query. After all, index fields are not used in the query Condition Statement. However, the query condition "billed is not true" completely matches the predicates of some indexes, therefore, PostgreSQL will scan the entire index. In this way, the query can be more effective only when the index data is relatively small.
Some indexes are not used in the following query.
Select * from orders where order_nr = 3501;

3. uniqueness constraints of data table subsets:
Create Table Tests (
Subject text,
Target Text,
Success Boolean,
...
);
Create unique indexTests_success_constraintOnTests (subject, target)WhereSuccess;
This part of the index will only restrict the uniqueness of data with the success field value being true. In practical applications, this implementation method will be very efficient if there is less data to be successful and there is a large amount of data to be unsuccessful.

7. Check the index usage:
See the following four suggestions:
1. Always run analyze first.
This command collects statistics on the numerical distribution in the table. This information is required when estimating the number of rows returned by a query, and the scheduler needs this number of rows to assign a real overhead value to each possible query plan. If there is no real statistical information, some default values will be used, which is definitely inaccurate. Therefore, if you have not run analyze to check the usage of an index, it will be a failed check.
2. Experiment with real data.
If you fill a data table with test data, the index of the table will only evaluate how to use the index based on the test data, rather than all the data. For example, if you select 100000 rows from 1000 rows, the scheduler may consider using indexes. If you select one row from 100 rows, it is hard to say that indexes will be used. Because 100 rows of data may be stored on a disk page, no query plan is more efficient than sequential access to a disk page. At the same time, when simulating test data, you should also note that if the data is very similar data, completely random data, or data inserted in order, will make the statistical information deviate from the actual data should have characteristics.
3. If the index is not used, forcing it to be used in the test may be of some value. Some runtime parameters can disable various query plans.
4. The forced use of indexes may lead to two possible reasons: First, the system is correct and the use of indexes is not suitable. Second, the overhead calculation of the query plan does not reflect the actual situation. In this way, you should timing queries that use or do not use indexes. The explain analyze command is very useful at this time.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.