PostgreSQL Tutorial (Eight): Index detailed

PostgreSQL Tutorial (Eight): Index detailed _postgresql

Last Update:2017-01-18 Source: Internet

Author: User

Tags constant create index postgresql

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

I. Type of index:

PostgreSQL provides a variety of index types: b-tree, Hash, Gist, and gin, because they use different algorithms, so each index type has its appropriate query type, and by default the CREATE INDEX command creates a B-tree index.

1. B-tree:

code as follows:

CREATE TABLE Test1 (
ID Integer,
Content varchar
);
CREATE INDEX Test1_id_index on test1 (ID);

The B-tree index is used primarily for equals and range queries, especially when the index column contains the operator <, <=, =, >=, and > as the query criteria, and the PostgreSQL query planner considers using the B-tree index. PostgreSQL can also use B-tree indexes in queries that use between, in, are null, and are NOT null. However, for queries based on pattern-matching operators, like, ILIKE, ~, and ~*, the index takes effect only if there is a constant in the pattern, and the constant is at the beginning of the pattern string, such as Col like ' foo% ' or col ~ ' ^foo ', otherwise a full table scan will be performed, such as: col Like '%bar '.

2. Hash:

code as follows:

CREATE INDEX name on table USING hash (column);

Hash (hash) index can only handle simple equals comparison. The Query planner considers using a hash index when the index column is compared using the equals operator.
The extra note here is that the performance of the PostgreSQL hash index is not stronger than the B-tree index, but the size and construction time of the hash index is even worse. In addition, because the hash index operation does not currently record the Wal log, once a database crash occurs, we will have to rebuild the hash index with Reindex.

3. GiST:
The gist index is not a separate index type, but rather a schema that enables many different indexing strategies on that architecture. This allows the gist index to use a specific operator type based on different indexing policies.

4. GIN:
The gin index is an inverted index that can handle values (such as arrays) that contain multiple keys. Like Gist, Gin also supports user-defined indexing policies, allowing gin indexes to use specific operator types based on different indexing policies. As an example, PostgreSQL's standard publication includes the gin operator types for one-dimensional arrays, such as <@, @>, =, &&, and so on.

Second, composite index:

Indexes in PostgreSQL can be defined on multiple fields in a datasheet, such as:

code as follows:

CREATE TABLE Test2 (
Major int,
Minor int,
Name varchar
}
CREATE INDEX Test2_mm_idx on Test2 (major, minor);

In the current release, only B-tree, gist, and gin support a composite index, where you can declare up to 32 fields.
1. B-tree type of composite index:
In a composite index of type B-tree, any subset of the indexed field can be used for query criteria, but the maximum efficiency can be achieved only if the first index field (leftmost) in the composite index is included.

2. Gist type of composite index:
In a composite index of type gist, only when the first index field is included in the query criteria can you decide how much index data the query will scan, and the criteria on other indexed fields only limit the entries returned by the index. If most of the data on the first index field has the same key value, then applying the gist index will be less efficient.

3. Gin type of composite index:
Unlike the B-tree and gist indexes, gin Composite indexes are not affected by which subset of indexed fields are used in the query criteria, and the same efficiency is achieved regardless of the combination.

Use of composite indexes should be cautious. In most cases, the index on a single field is sufficient and saves time and space. Indexes with more than three fields are of little use unless the table's usage patterns are very fixed.

Three, combine multiple indexes:

PostgreSQL can combine multiple indexes (including multiple uses of the same index) at query time to handle situations where a single index scan is not possible. At the same time, the system can also form conditions for and and or between multiple index scans. For example, a query that resembles a where x = or x = or x = 99 can be decomposed into four separate scans based on X-field indexes, each using a single query clause, and then the scan results are OR together to produce the final result. Another example is that if we have separate indexes on X and y, then a query similar to where x = 5 and y = 6 is scanned based on the indexes of the two fields, followed by an and operation of the results of each scan, and the resulting row is generated.

To combine multiple indexes, the system scans each required index and then organizes a bitmap in memory, which gives the physical location of the data scanned by the index in the datasheet. Then, depending on the needs of the query, the bitmaps are manipulated with and or or, and the final bitmap are obtained. Finally, the data table is retrieved and the data row is returned. The data rows of the table are accessed in a physical order, because this is the layout of the bitmap, which means that any sort of the original index will disappear. If there is an ORDER BY clause in the query, there will also be an extra sort step. For this reason, and each additional index scan adds extra time, the planner sometimes chooses to use a simple index scan, even if multiple indexes are available.

Four, unique index:

Currently, only B-tree indexes can be declared as unique indexes.

code as follows:

CREATE UNIQUE INDEX name on table (column [, ...]);

If the index is declared as a unique index, then multiple rows with the same index values are not allowed. We believe that null values are not equal to each other.

v. Expression index:

An expression index is used primarily to compare the results of a function or expression based on a field in a query condition with other values, such as:

code as follows:

SELECT * from Test1 WHERE lower (col1) = ' value ';

At this point, if we just indexed on the Col1 field, the query would not use the index when it was executed, but it would simply scan the entire table. If the table has a large amount of data, it will take a long time to execute the query. The solution to this problem is very simple, to establish an expression index based on the Col1 field on the Test1 table, such as:
CREATE INDEX test1_lower_col1_idx on Test1 (Lower (col1));
If we declare the index to be unique, it will prevent the creation of data rows that have a col1 value except that the case is different, and that the col1 values are exactly the same. Therefore, an index on an expression can be used to enforce a constraint that cannot be defined as a simple, unique constraint. Now let's look at an example of an application that uses an expression index.

code as follows:

SELECT * from People WHERE (first_name | | ' ' || last_name) = ' John Smith ';

As with the example above, although we might create separate indexes for first_name and last_name, or a composite index based on the two fields, these indexes are not used when the query is executed, and the query can use only the index of the expression we created below.

code as follows:

CREATE INDEX people_names on people (First_Name | | ' ' || last_name));

The syntax of the CREATE Index command usually requires the writing of parentheses around an indexed expression, as we have shown in the second example. If the expression is just a function call, it can be omitted, as we have shown in the first example.

Indexed expressions are relatively inefficient from the point of view of index maintenance, because when you insert data or update data, you must evaluate the result of the expression for that row and store the result directly into the index. However, when querying, PostgreSQL sees them as where idxcol = ' constant ', so the speed of the search is equivalent to a query based on a simple index. In general, we should just use an expression index in a scenario where the retrieval speed is more important than the insertion and update speed.

vi. Partial index:

A partial index (partial index) is an index built on a subset of a table that is defined by a conditional expression (called a predicate with partial indexes). The index contains only those rows in the table that satisfy the predicate.
Because it is not necessary to update the index in all cases, some indexes improve the efficiency of data insertion and data updating. However, because some indexes are smaller than normal indexes, it is better to improve the efficiency of queries that really require an index portion. See the following three examples:
1. Index and predicate conditional fields are consistent:

code as follows:

CREATE INDEX Access_log_client_ip_ix on Access_log (CLIENT_IP)
WHERE not (client_ip > inet ' 192.168.100.0 ' and Client_ip < inet ' 192.168.100.255 ');

The following query will use this partial index:

code as follows:

SELECT * from access_log WHERE url = '/index.html ' and client_ip = inet ' 212.78.10.32 ';

The following query will not use this partial index:
A query that cannot use this index can be:

code as follows:

SELECT * from Access_log WHERE client_ip = inet ' 192.168.100.23 ';

2. Inconsistent index and predicate condition fields:
PostgreSQL supports partial indexes with arbitrary predicates, and the only constraint is that the fields of the predicate also come from the same data table. Note that if you want your query statement to be able to use a partial index, the conditional part of the query must exactly match the predicate of the partial index. To be exact, a partial index can be used for a query only if the PostgreSQL can recognize that the query's where condition is mathematically covering the predicate of that index.

code as follows:

CREATE INDEX orders_unbilled_index on orders (ORDER_NR) WHERE billed isn't true;

The following query is sure to use this partial index:

code as follows:

SELECT * from to Orders WHERE billed is not true and ORDER_NR < 10000;

What about the following query?

code as follows:

SELECT * from to Orders WHERE billed is not true and amount > 5000.00;

This query will not be as efficient as the above query, after all, the query criteria statement does not use the indexed field, but the query condition "billed is not true" does exactly match the partial indexed predicate, so PostgreSQL scans the entire index. This allows the query to be more efficient only if the index data is relatively small.

The following query will not use a partial index.

code as follows:

SELECT * FROM orders WHERE order_nr = 3501;

3. Uniqueness constraints for subsets of data tables:

code as follows:

CREATE TABLE Tests (
Subject text,
Target text,
Success Boolean,
...
);
CREATE UNIQUE INDEX Tests_success_constraint on tests (subject, target) WHERE success;

This partial index will only constrain data that is true for the success field value. In practical applications, this implementation will be very efficient if there are fewer successful data and less successful data.

Seven, check the use of the index:

See the following four recommendations:
1. Always run analyze first.
This command will collect statistics on the distribution of the values in the table. This information is required when estimating the number of rows returned by a query, and the planner requires this number of rows to give the actual cost value to each possible query plan. If you lack any real statistics, you'll use some default values, which is definitely inaccurate. Therefore, if you have not run analyze to check the usage of an index, it will be a failed check.
2. Use real data to do experiments.
Populating a data table with test data, the index of the table will only evaluate how to use the index based on the test data, rather than all of the data. For example, choose 1000 rows from 100000 rows, and the planner might consider using an index, so if you choose 1 rows from 100 rows, it's hard to use the index. Since 100 rows of data are most likely stored on a disk page, no query planning can be more efficient than sequential access to a disk page. At the same time, when simulating test data, it is important to note that if these data are very similar data, completely random data, or data inserted in the sort order, the statistics will deviate from the characteristics that the actual data should have.
3. If the index is not used, it may be of some value to force its use in the test. There are run-time parameters that can turn off a variety of query plans.
4. Mandatory use of indexing will result in two possibilities: First, the system selection is correct, the use of the index is not really appropriate, and the second is that the cost of the query plan does not reflect the actual situation. This way you should be able to use and not use the index of the query to timing, this time explain analyze command is very useful.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More