I. Disk Space Reclamation:
In PostgreSQL, rows removed by DELETE or changed by UPDATE are not physically deleted; the old row versions remain on disk with their status set to dead (expired). As a result, a table whose data changes frequently can keep growing over time even though its live data volume stays roughly the same. To reclaim this space, you need to run VACUUM periodically on tables with frequent data changes.
The VACUUM command has two forms, `VACUUM` and `VACUUM FULL`. The differences between them are summarized in the following table:
| | No VACUUM | VACUUM | VACUUM FULL |
| --- | --- | --- | --- |
| After a large amount of data is deleted | Deleted rows are only marked as dead; their space is not reclaimed. | If the dead rows are at the end of the table, the pages they occupy are physically released and returned to the operating system. Otherwise, the command marks the space occupied by dead rows in the table or index as reusable; subsequent inserts fill this reused space before any new disk pages are allocated. | The space occupied by dead rows is physically released and returned to the operating system, regardless of where the rows sit in the table. Subsequent inserts allocate new disk pages. |
| Execution efficiency | | High, because the operation only marks row status without moving data. | In the current PostgreSQL version (v9.1), this command rewrites the table or index into a new data file, copies the live data into it, and then deletes the original file. It therefore needs additional disk space during the rewrite, and its execution efficiency is comparatively low. |
| Whether the physical space occupied by deleted data is returned to the operating system | No | No (except trailing pages at the end of the table) | Yes |
| Whether other operations on the table can run concurrently with the command | | Yes; the operation takes only a share-level lock, so it can run in parallel with other operations. | No; the operation requires an exclusive lock on the table, so any other operation on the table is blocked until it completes. |
| Recommended usage | When clearing an entire table, use TRUNCATE instead: it physically removes the data and immediately returns the occupied space to the operating system. | To keep the number of disk pages of a table relatively stable, run this operation regularly, for example daily or weekly during periods of low data activity. | Given the cost of this operation and the fact that it blocks all other access, monitor tables with large data churn and run it only when their disk page count approaches a critical value. Even then, choose a period with as little data activity as possible. |
| Efficiency of other operations afterwards | Query efficiency degrades gradually because of heavy disk-page fragmentation. | Better than performing no VACUUM at all, but insert efficiency is reduced. | All operations on the table become significantly more efficient. |
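The three maintenance commands compared above can be sketched as follows (the table name `testtable` is a placeholder):

```sql
-- Mark dead rows in the table as reusable; runs in parallel with normal traffic.
VACUUM testtable;

-- Rewrite the table into a new data file and return all freed space to the OS.
-- Takes an exclusive lock: all other access to the table is blocked meanwhile.
VACUUM FULL testtable;

-- To empty a table completely, TRUNCATE is far cheaper than DELETE + VACUUM:
-- it physically removes the data and returns the space to the OS at once.
TRUNCATE TABLE testtable;
```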
II. Updating Planner Statistics:
To choose the optimal execution path for a query, the PostgreSQL query planner relies on statistical information about the tables involved. These statistics are gathered by the ANALYZE command, which can be run directly or as an optional step of VACUUM, as in `VACUUM ANALYZE table_name`, which first performs VACUUM and then ANALYZE. As with VACUUM, tables whose data is updated frequently should be analyzed frequently, so that their statistics stay reasonably current; this greatly benefits query optimization on those tables. Tables that are rarely updated, by contrast, do not need frequent ANALYZE runs.
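The combined form mentioned above might look like this (assuming a table named `testtable`):

```sql
-- Reclaim dead-row space, then refresh the planner statistics in one pass.
VACUUM ANALYZE testtable;
```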
ANALYZE can be run against a specific table, or even specific columns of a table, so you can restrict it to the frequently updated information that actually matters. This both saves the space occupied by the statistics and speeds up the ANALYZE run itself. Note that ANALYZE is a fairly fast operation even on tables with large data volumes, because it reads a statistically random sample of rows rather than every row. You can therefore consider running it regularly against the entire database.
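The per-table and per-column forms can be sketched as follows (the table and column names are placeholders):

```sql
-- Analyze a single table.
ANALYZE testtable;

-- Analyze only the columns whose statistics matter for your queries.
ANALYZE testtable (test_col1, test_col2);
```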
In fact, you can even adjust the statistics target (sampling granularity) of an individual column with the following command, for example:

ALTER TABLE testtable ALTER COLUMN test_col SET STATISTICS 200;
Note: the valid range is 0 to 1000. The lower the value, the lower the sampling ratio and the less accurate the analysis results, but the faster the ANALYZE command runs. Setting the value to -1 restores the column's sampling to the system default. You can obtain the current system default with the following command:
postgres=# SHOW default_statistics_target;
 default_statistics_target
---------------------------
 100
(1 row)
The output shows that the system's default statistics target is 100 (that is, 10% of the maximum value of 1000).
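Following the note above, restoring a column to this default and refreshing its statistics could be sketched as (table and column names are placeholders):

```sql
-- -1 resets the column's target back to default_statistics_target (100 here).
ALTER TABLE testtable ALTER COLUMN test_col SET STATISTICS -1;

-- Re-run ANALYZE so the stored statistics reflect the restored target.
ANALYZE testtable (test_col);
```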