PostgreSQL tutorial (14): database maintenance, postgresql tutorial

Source: Internet
Author: User
Tags disk usage

PostgreSQL tutorial (14): database maintenance, postgresql tutorial

I. Disk Space Restoration:

In PostgreSQL, the data rows deleted or updated using the delete and update statements are not actually deleted, the physical address of the old data row sets the row status to deleted or expired. Therefore, when data in a data table changes frequently, the space occupied by the table will become large after a period of time, but the data volume may not change much. To solve this problem, you need to regularly perform VACUUM operations on data tables with frequent data changes.

The vacuum command has two forms: VACUUM and vacuum full. The differences between them are shown in the following table:

  No VACUUM VACUUM VACUUM FULL
After a large amount of data is deleted Only the deleted data is set to deleted, and the space cannot be retained. If the deleted record is at the end of the table, the space occupied by the deleted record will be physically released and returned to the operating system. If it is not the end data, this command will reset the space occupied by the deleted data in the specified table or index to the available state. When new data is inserted in the future, the new disk page will be used first until all reused space is used up. Regardless of whether the deleted data is at the end of the data table, the space occupied by the data will be physically released and returned to the operating system. When new data is inserted, a new disk page is allocated for use.
Execution efficiency   Because the status is set to operation, the efficiency is high. In the current version of PostgreSQL (v9.1), this command will re-generate a data file for the specified table or index, and import the available data in the original file to the new file, and then delete the original data file. Therefore, during the import process, more disk space is required for this operation. Therefore, the command execution efficiency is relatively low.
Whether the physical space occupied by the deleted data is rescheduled to the operating system. No No Yes
Whether other operations on the table can be performed concurrently when the vacuum command is executed.   Because this operation is a shared lock, it can be performed in parallel with other operations. Because this operation requires the exclusive lock to be applied to the specified table, during this operation, any operations based on this table will be suspended and the operation is completed.
Recommended usage During data clearing, you can use the truncate operation, because this operation will physically clear the data table and directly return the occupied space to the operating system. To ensure that the number of disk pages of a data table can be kept at a relatively stable value, you can perform this operation on a regular basis, such as the time period when data operations are relatively small on a daily or weekly basis. Considering the overhead of this operation and rejection of other errors, we recommend that you regularly monitor tables with large data volume changes. Only when you confirm that their disk page shares are close to the critical value, to perform this operation. Even so, you need to select a time period with few data operations as much as possible to complete the operation.
Efficiency of other operations after execution For queries, because there are a large number of disk page fragments, the efficiency will gradually decrease. It is more efficient than not performing any VACUUM operations, but the insertion efficiency is reduced. After this operation is completed, the efficiency of all operations based on this table will be greatly improved.

Ii. Update planner statistics:

When selecting the optimal path, the PostgreSQL query planner should use the statistical information of relevant data tables to generate the most reasonable plan for the query. These statistics are obtained through the ANALYZE command. You can call this command directly or use it as an optional step in the VACUUM command, such as vacuum anaylyze table_name, this command runs VACUUM and then ANALYZE. As with VACUUM, tables with frequent data updates are frequently analyzed, so that the statistical information of the table is always in a relatively new state, in this way, the query optimization based on the table is extremely advantageous. However, you do not need to perform this operation for data tables that are not updated frequently.

We can run the ANALYZE command for a specific table or even a specific field in the table, so that we can only perform the ANALYZE operation on frequently updated information according to the actual situation, this not only saves the space occupied by the statistical information, but also improves the execution efficiency of this ANALYZE operation. Here we need to note that ANALYZE is a very fast operation, even on tables with a large amount of data, because it uses a statistically random sampling method for row sampling, instead of reading and analyzing each row of data. Therefore, you can consider regularly executing this command on the entire database.

In fact, we can even use the following command to adjust the sampling rate of a specified field, for example:
 Copy codeThe Code is as follows:
Alter table testtable alter column test_col set statistics 200
 
Note: The value range is 0--1000. The lower the value, the lower the sampling ratio, and the lower the accuracy of the analysis results, but the faster the execution of the ANALYZE command. If this value is set to-1, the sampling ratio of this field will be restored to the current default sampling value. We can use the following command to obtain the default sampling value of the current system.
 Copy codeThe Code is as follows:
Postgres = # show default_statistics_target;
Default_statistics_target
---------------------------
100
(1 row)
 
From the above results, we can see that the default sampling value of the database is 100 (10% ).

Iii. Examples of VACUUM and ANALYZE:

Copy codeThe Code is as follows:
#1. Create a test data table.
S = # create table testtable (I integer );
CREATE TABLE
#2. Create an index for the test table.
S = # create index testtable_idx ON testtable (I );
CREATE INDEX
#3. Create a function to insert test data in batches.
S = # create or replace function test_insert () returns integer AS $
DECLARE
Min integer;
Max integer;
BEGIN
Select count (*) INTO min from testtable;
Max: = min + 10000;
FOR I IN min .. max LOOP
Insert into testtable VALUES (I );
End loop;
RETURN 0;
END;
$ LANGUAGE plpgsql;
CREATE FUNCTION
#4. insert data in batches to the test table (executed four times)
S = # SELECT test_insert ();
Test_insert
-------------
0
(1 row)
#5. confirm that all the four batch inserts are successful.
S = # select count (*) FROM testtable;
Count
-------
40004
(1 row)
#6. Analyze the test table to update the statistical information of the table to the PostgreSQL system table.
Postgres = # ANALYZE testtable;
ANALYZE
#7. view the number of pages currently occupied by the test table and index (usually 8 k for each page ).
Postgres = # SELECT relname, relfilenode, relpages FROM pg_class WHERE relname = 'testtable' or relname = 'testtable _ idx ';
Relname | relfilenode | relpages
--------------- + ------------- + ----------
Testtable | 17601 | 157
Testtable_idx| 17604 | 90
#8. Batch delete data.
S = # delete from testtable WHERE I <30000;
DELETE 30003
#9. Execute vacuum and analyze to update the system table and mark the table and index records with high water.
#10. Here, we need to note that the data deleted above is located in the front of the data table. If the last part is deleted,
# For example, where I> 10000, the data table is physically reduced when VACUUM ANALYZE is executed.
Postgres = # vacuum analyze testtable;
ANALYZE
#11. view the results after the test table and index are deleted, and then update the system statistics through vacuum analyze (unchanged ).
Postgres = # SELECT relname, relfilenode, relpages FROM pg_class WHERE relname = 'testtable' or relname = 'testtable _ idx ';
Relname | relfilenode | relpages
--------------- + ------------- + ----------
Testtable | 17601 | 157
Testtable_idx| 17604 | 90
(2 rows)
#12. re-insert the table twice, and then analyze the table to update its statistics.
S = # SELECT test_insert (); -- run twice.
Test_insert
-------------
0
(1 row)
Postgres = # ANALYZE testtable;
ANALYZE
#13. Now we can see that the number of pages in the data table is still the number of the previous high-water mark, and the number of index pages is increased.
# It is related to its internal implementation method, but the number of pages occupied by the index will not continue to increase in subsequent inserts.
Postgres = # SELECT relname, relfilenode, relpages FROM pg_class WHERE relname = 'testtable' or relname = 'testtable _ idx ';
Relname | relfilenode | relpages
--------------- + ------------- + ----------
Testtable | 17601 | 157
Testtable_idx | 17604 | 173
(2 rows)
S = # SELECT test_insert ();
Test_insert
-------------
0
(1 row)
Postgres = # ANALYZE testtable;
ANALYZE
#14. We can see that the number of index pages does not continue to increase.
Postgres = # SELECT relname, relfilenode, relpages FROM pg_class WHERE relname = 'testtable' or relname = 'testtable _ idx ';
Relname | relfilenode | relpages
--------------- + ------------- + ----------
Testtable | 17601 | 157
Testtable_idx | 17604 | 173
(2 rows)
#15. delete data in batches again.
S = # delete from testtable WHERE I <30000;
DELETE 19996
#16. The following query shows the number of pages used by the test table and index after the vacuum full command is executed.
# Actually reduced, indicating that the physical space occupied by them has been reduced.
Postgres = # vacuum full testtable;
VACUUM
Postgres = # SELECT relname, relfilenode, relpages FROM pg_class WHERE relname = 'testtable' or relname = 'testtable _ idx ';
Relname | relfilenode | relpages
--------------- + ------------- + ----------
Testtable | 17602 | 118
Testtable_idx| 17605 | 68
(2 rows)

4. Regular index reconstruction:

In PostgreSQL, it is necessary to regularly REINDEX data tables with frequent data updates. For B-Tree indexes, only the index pages that have been completely cleared will be reused, if most index key values on a page are deleted and only a small portion is left, the page will not be released and reused. In this extreme case, because the utilization of each index page is extremely low, once the data volume increases significantly, the index file will become extremely large, which not only reduces the query efficiency, there is also a risk that the entire disk space is fully filled.
There is another performance advantage for the re-built index, because in the new index, logically connected pages are physically connected, this increases the probability that the disk page is continuously read, and improves the IO efficiency of the entire operation. See the following example:
#1. At this time, about 60 thousand data entries have been inserted in the table. The following SQL statement queries the disk space occupied by the index.
 Copy codeThe Code is as follows:
Postgres = # SELECT relname, pg_relation_size (oid)/1024 | 'K' AS size FROM pg_class WHERE relkind = 'I' AND relname = 'testtable _ idx ';
Relname | size
---------------- + ------
Testtable_idx | 1240 K
(1 row)
#2. delete a large majority of data in a data table.
S = # delete from testtable WHERE I> 20000;
DELETE 50006
#3. Analyze the table so that subsequent SQL statements can continue to view the space occupied by the index.
Postgres = # ANALYZE testtable;
ANALYZE
#4. From the query results, we can see that the space occupied by the index is not reduced, but is exactly the same as before.
Postgres = # SELECT pg_relation_size ('testtable _ idx')/1024 | 'K' AS size;
Size
------
1240 K
(1 row)
#5. re-create the index.
Postgres = # reindex index testtable_idx;
REINDEX
#6. Check the space occupied by the re-built index. The result shows that the index size has been reduced.
Postgres = # SELECT pg_relation_size ('testtable _ idx')/1024 | 'K' AS size;
Size
------
368 K
(1 row)
#7. The last thing to remember is that data tables must be analyzed after index reconstruction.
Postgres = # ANALYZE testtable;
ANALYZE

5. Observe disk usage:

1. view the number of disk pages occupied by the data table.
 Copy codeThe Code is as follows:
# Relpages can only be updated by VACUUM, ANALYZE, and several DDL commands, such as create index. Generally, a page is 8 kb in length.
Postgres = # SELECT relfilenode, relpages FROM pg_class WHERE relname = 'testtable ';
Relfilenode | relpages
------------- + ----------
16412 | 79
(1 row)

2. view the index name of the specified data table and the number of disk pages occupied by the index.
 Copy codeThe Code is as follows:
Postgres = # SELECT c2.relname, c2.relpages FROM pg_class c, pg_class c2, pg_index I
WHERE c. relname = 'testtable' AND c. oid = I. indrelid AND c2.oid = I. indexrelid
Order by c2.relname;
Relname | relpages
--------------- + ----------
Testtable_idx | 46
(1 row)
 

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.