First, restore disk space:
In PostgreSQL, data rows that are deleted or updated using the Delete and UPDATE statements are not actually deleted, but only the state of the row is deleted or expired on the physical address of the old version of the data row. So when the data in the datasheet changes extremely frequently, the space that the table occupies will become very large after a period of time, but the amount of data may change little. To solve this problem, you need to periodically perform vacuum operations on data tables with frequent data changes.
The vacuum command exists in two forms, vacuum and vacuum full, and the difference between them is shown in the following table:
|
No vacuum |
VACUUM |
VACUUM full |
After you delete a large amount of data |
Only the state of the deleted data is deleted, and the space cannot be recorded for reuse. |
If the deleted record is at the end of the table, the space it occupies will be physically released and returned to the operating system. If it is not the end data, the command resets the space occupied by the deleted data in the specified table or index to the available state, so that when new data is inserted in the future, the space will be used preferentially until all reused space is exhausted and the new disk page is considered. |
Regardless of whether the deleted data is at the end of the datasheet, the space occupied by the data will be physically released and returned to the operating system. When new data is inserted, a new disk page is allocated for use. |
Execution efficiency |
|
Because only the state is placed as an operation, the efficiency is high. |
In the current version of PostgreSQL (v9.1), the command regenerates a data file for the specified table or index, imports the available data from the existing file into a new file, and then deletes the original data file. Therefore, during the import process, more space is required for the current disk to be available for this operation. This shows that the execution efficiency of the command is relatively low. |
Whether the physical space occupied by the deleted data is being redesigned to the operating system. |
No |
No |
Yes |
If you execute the vacuum command, you can perform additional operations on the table concurrently. |
|
Because the operation is a shared lock, it can be performed in parallel with other operations. |
Because this operation requires an exclusive lock to be applied on the specified table, any action based on that table will be suspended during the operation, knowing that the operation is complete. |
Recommended mode of Use |
When data is emptied, you can use the truncate operation because it will physically empty the datasheet and return the space it occupies directly to the operating system. |
To ensure that the number of disk pages in a datasheet is maintained at a relatively stable value, you can perform this action on a regular basis, such as a period of relatively small data operations on a daily or weekly basis. |
Given the cost of the operation and the exclusion of other errors, it is recommended that you periodically monitor the tables with large changes in data, and only consider doing this once if you confirm that their disk page occupancy is close to the critical value. Even so, you need to be aware that you can do this by choosing a period of less data manipulation. |
Efficiency of other operations after execution |
For queries, the efficiency is gradually reduced due to the large number of disk page fragmentation. |
It is more efficient than not performing any vacuum operations, but the insertion efficiency is reduced. |
After this operation, all operational efficiencies based on the table are greatly improved. |
II. Update Planner Statistics:
When selecting the optimal path, the PostgreSQL query planner needs to refer to the statistic information of the relevant data table to generate the most reasonable plan for the query. These statistics are obtained by the Analyze command, which you can call directly, or as an optional step in the vacuum command, such as vacuum anaylyze table_name, which will execute vacuum and execute analyze first. As with the Reclaim space (VACUUM), the table with frequent updates is kept in a certain frequency analyze, so that the statistics of the table are always in a relatively new state, which will be extremely advantageous for query optimization based on the table. However, you do not need to perform this action for a data table that is not updated frequently.
We can run the Analyze command for a particular table, or even a specific field in a table, so that we can perform analyze operations on some of the more frequently updated information, so that not only can we save the space occupied by the statistics, Also can improve the execution efficiency of this analyze operation. The extra note here is that analyze is a fairly fast operation, even on large data tables, because it uses a statistically random sampling method for row sampling, rather than reading every row of data and parsing it. Therefore, you can consider executing the command on a regular basis across the database.
In fact, we can even adjust the sample rate for a specified field by using the following command, such as:
Copy Code code as follows:
Alter TABLE TestTable ALTER COLUMN test_col SET STATISTICS 200
Note: The value range is 0--1000, where the lower the sample ratio, the lower the accuracy of the analysis results, but the analyze command executes faster. If the value is set to-1, the sample rate for the field will revert to the system's current default sampling value, and we can obtain the current system's default sample value by using the following command.
Copy Code code as follows:
postgres=# show Default_statistics_target;
Default_statistics_target
---------------------------
100
(1 row)
As you can see from the results above, the default sample value for the database is 100 (10%).
Iii. examples of vacuum and analyze:
Copy Code code as follows:
#1. Create a test data table.
postgres=# CREATE TABLE testtable (i integer);
CREATE TABLE
#2. Creates an index for a test table.
postgres=# CREATE INDEX testtable_idx on testtable (i);
CREATE INDEX
#3. Create a function to bulk INSERT test data.
postgres=# CREATE OR REPLACE FUNCTION test_insert () returns integer as $$
DECLARE
Min Integer;
Max Integer;
BEGIN
SELECT COUNT (*) into the min from testtable;
Max: = min + 10000;
For i in Min.. Max LOOP
INSERT into TestTable VALUES (i);
End LOOP;
return 0;
End;
$$ LANGUAGE Plpgsql;
CREATE FUNCTION
#4. Bulk INSERT data to test table (execute four times)
postgres=# SELECT Test_insert ();
Test_insert
-------------
0
(1 row)
#5. Confirm that the batch insert was successful four times.
postgres=# SELECT COUNT (*) from testtable;
Count
-------
40004
(1 row)
#6. Analyze the test table so that the statistics about the table are updated to the PostgreSQL system tables.
postgres=# ANALYZE testtable;
ANALYZE
#7. View the number of pages currently occupied by the test table and index (typically 8k per page).
postgres=# SELECT Relname,relfilenode, relpages from pg_class WHERE relname = ' testtable ' or relname = ' testtable_idx ';
Relname | Relfilenode | Relpages
---------------+-------------+----------
TestTable | 17601 | 157
Testtable_idx | 17604 | 90
#8. Bulk delete data.
postgres=# DELETE from TestTable WHERE i < 30000;
DELETE 30003
#9. Perform vacuum and analyze to update system tables and record high water markers for that table and index.
#10. What you need to note here is that the data deleted above is in the front of the datasheet, and if you delete the end part,
# as Where I > 10000, the data table will be physically shrunk when the vacuum analyze is executed.
postgres=# VACUUM ANALYZE testtable;
ANALYZE
#11. View the results of the test tables and indexes after they are deleted and then update the system statistics by vacuum analyze (remain unchanged).
postgres=# SELECT Relname,relfilenode, relpages from pg_class WHERE relname = ' testtable ' or relname = ' testtable_idx ';
Relname | Relfilenode | Relpages
---------------+-------------+----------
TestTable | 17601 | 157
Testtable_idx | 17604 | 90
(2 rows)
#12. Re-insert two more times, and then analyze the table to update its statistics.
postgres=# SELECT Test_insert (); --Executed two times.
Test_insert
-------------
0
(1 row)
postgres=# ANALYZE testtable;
ANALYZE
#13. At this point you can see that the number of pages in the datasheet is still the number of high water markers before, and the number of index pages increases
# is related to its internal implementation, but in the subsequent inserts, the number of pages that the index occupies will not continue to increase.
postgres=# SELECT Relname,relfilenode, relpages from pg_class WHERE relname = ' testtable ' or relname = ' testtable_idx ';
Relname | Relfilenode | Relpages
---------------+-------------+----------
TestTable | 17601 | 157
Testtable_idx | 17604 | 173
(2 rows)
postgres=# SELECT Test_insert ();
Test_insert
-------------
0
(1 row)
postgres=# ANALYZE testtable;
ANALYZE
#14. The number of pages that can see the index does not continue to increase.
postgres=# SELECT Relname,relfilenode, relpages from pg_class WHERE relname = ' testtable ' or relname = ' testtable_idx ';
Relname | Relfilenode | Relpages
---------------+-------------+----------
TestTable | 17601 | 157
Testtable_idx | 17604 | 173
(2 rows)
#15. Re-bulk delete data.
postgres=# DELETE from TestTable WHERE i < 30000;
DELETE 19996
#16. As you can see from the following query, the number of pages occupied by the test table and index after executing the vacuum full command
# really lowered, indicating that the physical space they occupy has shrunk.
postgres=# VACUUM full testtable;
VACUUM
postgres=# SELECT Relname,relfilenode, relpages from pg_class WHERE relname = ' testtable ' or relname = ' testtable_idx ';
Relname | Relfilenode | Relpages
---------------+-------------+----------
TestTable | 17602 | 118
Testtable_idx | 17605 | 68
(2 rows)
Four, periodically rebuild the index:
In PostgreSQL, it is necessary to periodically rebuild indexes (REINDEX index) for data tables with frequent data updates. For B-tree indexes, only those index pages that have been completely emptied are reused, and for those index pages that are only partially available, the page will not be reused if most of the index key values in one page are deleted, leaving only a small portion. In this extreme case, because of the extremely low utilization of each index page, once the amount of data is significantly increased, the index file becomes extremely large, reducing query efficiency and the risk of full disk space being completely filled.
There is another performance advantage to the rebuilt index, because logically interconnected pages are often physically connected in the newly built index, which increases the likelihood that the disk pages will be read continuously, thereby increasing the IO efficiency of the entire operation. See the following example:
#1. About 60,000 data has already been inserted into the table, and the following SQL statement queries the disk space occupied by the index.
Copy Code code as follows:
postgres=# SELECT Relname, Pg_relation_size (OID)/1024 | | ' K ' as size from Pg_class WHERE relkind= ' i ' and relname = ' testtable_idx ';
Relname | Size
----------------+------
Testtable_idx | 1240K
(1 row)
#2. Deletes most of the data in a datasheet.
postgres=# DELETE from testtable WHERE i > 20000;
DELETE 50006
#3. Analyze one of the tables so that subsequent SQL statements can continue to view the space occupied by the index.
postgres=# ANALYZE testtable;
ANALYZE
#4. As you can see from the query results, the amount of space occupied by the index is not reduced, but it is exactly the same as before.
postgres=# SELECT pg_relation_size (' testtable_idx ')/1024 | | ' K ' as size;
Size
------
1240K
(1 row)
#5. Rebuild the index.
postgres=# REINDEX INDEX Testtable_idx;
REINDEX
#6. Look at the amount of space that the rebuilt index actually occupies, and you can see from the results that the index has been reduced in size.
postgres=# SELECT pg_relation_size (' testtable_idx ')/1024 | | ' K ' as size;
Size
------
368K
(1 row)
#7. The last thing to remember is that you must analyze the data table after the index is rebuilt.
postgres=# ANALYZE testtable;
ANALYZE
v. Observe disk usage:
1. View the number of disk pages occupied by the datasheet.
Copy Code code as follows:
#relpages只能被VACUUM, analyze, and several DDL command updates, such as Create INDEX. The length of a page is usually 8K bytes.
postgres=# SELECT Relfilenode, relpages from pg_class WHERE relname = ' testtable ';
Relfilenode | Relpages
-------------+----------
16412 | 79
(1 row)
2. View the index name of the specified data table and the number of disk pages occupied by the index.
Copy Code code as follows:
postgres=# SELECT c2.relname, c2.relpages from Pg_class C, Pg_class C2, Pg_index I
WHERE c.relname = ' testtable ' and c.oid = i.indrelid and c2.oid = I.indexrelid
Order BY C2.relname;
Relname | Relpages
---------------+----------
Testtable_idx | 46
(1 row)