One of the big selling points of Postgres is the way it handles concurrency. The expectation is simple: reads never block writes, and writes never block reads. Postgres achieves this through a mechanism called multiversion concurrency control (MVCC). The technique is not unique to Postgres: several other databases implement some form of MVCC, including Oracle, Berkeley DB, and CouchDB. When you use PostgreSQL to build highly concurrent applications, it is important to understand how MVCC is implemented. It is actually a very elegant and simple solution to a complex problem.
How MVCC Works
In Postgres, each transaction is assigned a transaction ID, called an XID. A transaction here is not only a set of statements wrapped in BEGIN and COMMIT. A single INSERT, UPDATE, or DELETE statement run on its own is also a transaction. When a transaction begins, Postgres increments the XID counter and assigns the new value to the transaction. Postgres also stores transaction information in every row, and it uses that information to decide whether the row is visible to the current transaction.
For example, when you insert a row, Postgres stores the XID of the current transaction in that row as xmin. A row is visible to the current transaction only if two conditions hold: the transaction that created the row has committed, and the row's xmin is smaller than the current transaction's XID. So if you start a transaction and insert a row, that row stays invisible to other transactions until you commit. After the commit, transactions started later can see the new row, because it satisfies xmin < XID and the transaction that created it has completed.
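To make the xmin rule concrete, here is a minimal Python sketch of it. It is a toy model of the simplified rule stated above, not Postgres source code. The `Row` class, the `is_visible` function and the XIDs 100 to 102 are all hypothetical. Real Postgres also consults a snapshot of in-progress transactions rather than comparing XIDs alone.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Row:
    value: int
    xmin: int  # XID of the transaction that inserted the row (hypothetical model)


def is_visible(row: Row, current_xid: int, committed: Set[int]) -> bool:
    """Simplified rule: the creator has committed and has a smaller XID."""
    return row.xmin in committed and row.xmin < current_xid


committed: Set[int] = set()
row = Row(value=1, xmin=100)  # inserted by transaction 100, not committed yet

# Transaction 101 cannot see the uncommitted row.
print(is_visible(row, current_xid=101, committed=committed))  # False

committed.add(100)  # transaction 100 commits

# Transaction 102, started after the commit, can see it.
print(is_visible(row, current_xid=102, committed=committed))  # True
```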
The mechanism is similar for DELETE and UPDATE. The difference is that Postgres also uses a value called xmax to determine visibility. This diagram shows how MVCC keeps two concurrent transactions isolated from each other, one inserting data and one reading it.
For the figure below, assume we first created the table with this statement:
CREATE TABLE Numbers (value int);
Although xmin and xmax are hidden in everyday use, you can query them directly, and Postgres will gladly return their values:
SELECT *, xmin, xmax from numbers;
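To show what the two hidden columns are for, the following toy sketch extends the visibility rule with xmax. It assumes each row version carries both xmin and xmax, and that an UPDATE marks the old version with the updating transaction's XID in xmax and appends a new version. The names `RowVersion`, `update` and `snapshot` are hypothetical and do not reflect Postgres internals.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class RowVersion:
    value: int
    xmin: int                   # XID that created this version
    xmax: Optional[int] = None  # XID that deleted/replaced it; None if still live


def is_visible(v: RowVersion, xid: int, committed: Set[int]) -> bool:
    created = v.xmin in committed and v.xmin < xid
    deleted = v.xmax is not None and v.xmax in committed and v.xmax < xid
    return created and not deleted


def update(versions: List[RowVersion], old: RowVersion, new_value: int, xid: int) -> None:
    """Toy UPDATE: mark the old version with xmax and append a new version."""
    old.xmax = xid
    versions.append(RowVersion(value=new_value, xmin=xid))


def snapshot(versions: List[RowVersion], xid: int, committed: Set[int]) -> List[int]:
    """Values visible to transaction `xid` under the simplified rule."""
    return [v.value for v in versions if is_visible(v, xid, committed)]


committed = {100}
table = [RowVersion(value=1, xmin=100)]

update(table, table[0], new_value=2, xid=101)         # 101 updates but has not committed
print(snapshot(table, xid=102, committed=committed))  # [1]: old version still visible

committed.add(101)                                    # transaction 101 commits
print(snapshot(table, xid=103, committed=committed))  # [2]: only the new version
print(len(table))                                     # 2: the old version is still stored
```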
It is also simple to get the XID of the current transaction:
SELECT txid_current();
Neat!
I know what you are thinking: what happens if two transactions modify the same row at the same time? This is where transaction isolation levels come in. Postgres supports two basic models that let you control how this situation is handled. The default, Read Committed, makes the second transaction wait for the first one to finish. If the row was modified in the meantime, the statement's condition is checked again against the new version of the row. For example, suppose you run an UPDATE with a WHERE clause on a row that another transaction is modifying. Postgres waits until that transaction commits, re-evaluates the WHERE clause against the updated row, and performs the update only if the condition still holds. In the following example, two transactions modify the same row at the same time. The first transaction's UPDATE leaves the second transaction's WHERE clause with no matching rows, so the second transaction updates nothing at all:
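The Read Committed re-check can be sketched as a pure function over a single row. This is a simplified model. It assumes one row, one concurrent writer that has already committed, and hypothetical names (`read_committed_update`, `where_value_is_1`, `increment`). It does not model the locking or the waiting itself.

```python
from typing import Callable, Tuple


def read_committed_update(snapshot_value: int, latest_value: int,
                          where: Callable[[int], bool],
                          set_value: Callable[[int], int]) -> Tuple[int, int]:
    """Return (rows_updated, final_value) for one row under the re-check rule.

    snapshot_value: the value the statement saw when it started.
    latest_value:   the value after the concurrent writer committed
                    (equal to snapshot_value if nobody else touched the row).
    """
    if not where(snapshot_value):
        return 0, latest_value  # the row never qualified
    # The row qualified, but it may have changed while we waited:
    # re-evaluate WHERE against the newest committed version.
    if where(latest_value):
        return 1, set_value(latest_value)
    return 0, latest_value


# Hypothetical statement run by both transactions:
#   UPDATE numbers SET value = value + 1 WHERE value = 1;
def where_value_is_1(v: int) -> bool:
    return v == 1


def increment(v: int) -> int:
    return v + 1


# Transaction A runs first; nobody else has touched the row.
print(read_committed_update(1, 1, where_value_is_1, increment))  # (1, 2)

# Transaction B also started from value 1 but waited for A,
# whose commit left the row at 2. The re-check no longer matches.
print(read_committed_update(1, 2, where_value_is_1, increment))  # (0, 2)
```

With a predicate that still holds after the first commit, such as `value > 0`, the second update is applied on top of the first transaction's result instead of being skipped.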
If you need finer control over this behavior, you can set the transaction isolation level to SERIALIZABLE. Under this level, the scenario above fails outright. Postgres follows the rule "if I am modifying a row that another transaction has already modified, I will not try again," and it returns the error "could not serialize access due to concurrent update." Your application must catch this error and retry the transaction, or simply give up if that makes more sense.
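Since the application is responsible for retrying, a retry wrapper is the usual shape for this code. The sketch below is hypothetical. `SerializationFailure` stands in for whatever exception your driver raises for SQLSTATE 40001 (serialization failure). `txn` is assumed to run a complete transaction from BEGIN to COMMIT, so that each retry starts from scratch.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver error with SQLSTATE 40001."""


def run_with_retry(txn: Callable[[], T], max_attempts: int = 5,
                   base_delay: float = 0.01) -> T:
    """Run a whole transaction, re-running it on serialization failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            # Short randomized back-off before re-running the whole transaction.
            time.sleep(base_delay * attempt * random.random())
    raise AssertionError("unreachable")


# Usage with a fake transaction that conflicts twice, then commits.
calls = {"n": 0}


def flaky_txn() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure("could not serialize access due to concurrent update")
    return "committed"


print(run_with_retry(flaky_txn), calls["n"])  # committed 3
```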
The Disadvantages of MVCC
Now that you know how MVCC and transaction isolation work, you have another tool for solving these problems: the SERIALIZABLE isolation level will come in handy sooner or later. However, although the advantages of MVCC are obvious, it also has some drawbacks.
Because different transactions can see rows in different states, Postgres must keep even data that may already be outdated. That is why an UPDATE actually creates a new row and deletes the old one. A delete does not really remove anything either: it simply marks the row as deleted by setting its xmax to the deleting transaction's XID. Once transactions complete, the database holds rows that will never be visible to any future transaction. These are called dead rows. Another problem with MVCC is that transaction IDs only ever increase. They are 32 bits wide, so they can only support about 4 billion transactions. When the XID reaches its maximum value, it wraps around to 0 and starts again. Suddenly every existing row appears to have been created by a transaction in the future, and no new transaction can see the old rows.
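The wraparound failure can be illustrated with a 32-bit counter and the naive `xmin < XID` comparison. This is a toy illustration of the problem described above, using hypothetical helpers `next_xid` and `naive_visible`. Postgres itself compares XIDs circularly rather than with this naive check, so treat the sketch as a picture of the failure mode, not of the implementation.

```python
XID_BITS = 32
XID_MODULUS = 2 ** XID_BITS  # about 4.29 billion distinct IDs


def next_xid(xid: int) -> int:
    """Advance a 32-bit counter, wrapping back to 0 after the maximum."""
    return (xid + 1) % XID_MODULUS


def naive_visible(xmin: int, current_xid: int) -> bool:
    """The simplified rule from earlier: the creator's XID must be smaller."""
    return xmin < current_xid


old_row_xmin = XID_MODULUS - 10  # row created shortly before the wrap
current = old_row_xmin
for _ in range(20):              # 20 more transactions start
    current = next_xid(current)

print(current)                               # 10
print(naive_visible(old_row_xmin, current))  # False: the old row now looks "in the future"
```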
Both dead rows and XID wraparound are handled by running VACUUM, the command Postgres uses to perform cleanup. Because this should be routine maintenance, Postgres ships with an autovacuum daemon that runs cleanup automatically on a configurable schedule. It is important to keep an eye on autovacuum, because the right cleanup frequency differs between deployment environments. You can find out more about VACUUM in the Postgres documentation.
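The following sketch gives a rough picture of what VACUUM does for dead rows. It drops row versions that no running or future transaction can see, under the same simplified visibility rule used earlier. It is a hypothetical model (`vacuum` and `oldest_active_xid` are illustrative names). It covers only dead-row cleanup, not the wraparound side of VACUUM, and it ignores all on-disk details.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class RowVersion:
    value: int
    xmin: int
    xmax: Optional[int] = None  # XID that deleted/replaced this version


def vacuum(versions: List[RowVersion], committed: Set[int],
           oldest_active_xid: int) -> List[RowVersion]:
    """Keep only versions that some current or future transaction might see.

    A version is dead when its deleter committed with an XID smaller than
    the oldest transaction still running: under the simplified rule, every
    transaction from then on treats it as deleted.
    """
    def dead(v: RowVersion) -> bool:
        return v.xmax is not None and v.xmax in committed and v.xmax < oldest_active_xid

    return [v for v in versions if not dead(v)]


table = [
    RowVersion(value=1, xmin=100, xmax=102),  # old version, replaced by transaction 102
    RowVersion(value=2, xmin=102),            # new version
]
committed = {100, 102}

# Transaction 101 is still running and may need the old version: keep it.
print(len(vacuum(table, committed, oldest_active_xid=101)))  # 2

# Once everything older than 103 has finished, the old version is dead.
print([v.value for v in vacuum(table, committed, oldest_active_xid=103)])  # [2]
```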