English original: https://devcenter.heroku.com/articles/postgresql-concurrency#how-mvcc-works
Translator: piglei
One of the big selling points of the Postgres database is the way it handles concurrency. Our expectations are simple: reads never block writes, and vice versa. Postgres achieves this through a mechanism called multi-version concurrency control (MVCC). The technique is not unique to Postgres: several other databases implement some form of MVCC, including Oracle, Berkeley DB, and CouchDB. When you design high-concurrency applications on PostgreSQL, it is important to understand how MVCC is implemented. It is in fact a very elegant and simple solution to a complex problem.
How MVCC works
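In Postgres, each transaction is given a transaction ID called the XID. A transaction here is not just a group of statements wrapped in BEGIN ... COMMIT; a single statement such as an INSERT, UPDATE, or DELETE also runs as its own transaction.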
For example, when you insert a row, Postgres stores the XID of the current transaction in that row and calls it xmin. Only rows that have been committed and whose xmin is less than the current transaction's XID are visible to that transaction. This means that when you start a transaction and insert a row, the row stays invisible to other transactions until you COMMIT. After the commit, newly created transactions can see the new row, because they satisfy the xmin < XID condition and the transaction that created the row has completed.
For DELETE and UPDATE the mechanism is similar, except that Postgres uses a value called xmax to determine the visibility of the data. The diagram illustrates how MVCC provides transaction isolation when two concurrent transactions insert and read data.
For the diagram, assume we first run this CREATE TABLE statement:
CREATE TABLE numbers (value int);
Although the xmin and xmax values are hidden in everyday use, you can request them explicitly, and Postgres will gladly give them to you:
SELECT *, xmin, xmax FROM numbers;
Getting the XID of the current transaction is also straightforward:
SELECT txid_current();
Neat!
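The visibility rules described above can be tried out by hand. What follows is a minimal sketch, not a reproduction of the original diagram: it assumes two psql sessions (labelled A and B in the comments, which are illustrative names) connected to the same database, the numbers table created above, and the default isolation level. The XID values you see will differ from run to run, so none are shown.

```sql
-- Session A: open a transaction and insert a row without committing.
BEGIN;
INSERT INTO numbers VALUES (42);
SELECT txid_current();              -- the XID assigned to this transaction
SELECT *, xmin, xmax FROM numbers;  -- A sees its own row; xmin should match the XID above

-- Session B: the uncommitted row is not visible here.
SELECT *, xmin, xmax FROM numbers;

-- Session A: make the row visible to transactions that start afterwards.
COMMIT;

-- Session B: the row now appears, with xmin set to A's XID.
SELECT *, xmin, xmax FROM numbers;

-- Session A: delete the row, again without committing.
BEGIN;
DELETE FROM numbers WHERE value = 42;

-- Session B: the row is still visible, but xmax now holds the XID
-- of the transaction that is trying to delete it.
SELECT *, xmin, xmax FROM numbers;

-- Session A: undo the delete.
ROLLBACK;
```

Running the statements in the order shown, and switching sessions where the comments say so, is what makes the difference between the two views observable.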
I know what you're thinking: what happens if two transactions modify the same row at the same time? This is where transaction isolation levels come in. Postgres supports two basic models that give you control over how this situation is handled. The default, READ COMMITTED, waits for the initial transaction to complete, then reads the row and executes the statement. If the row was modified during the wait, it starts over. As an example, when you execute an UPDATE with a WHERE clause, the WHERE clause is evaluated again after the initial transaction commits, and the UPDATE is executed only if the WHERE condition is still satisfied. Consider two transactions that modify the same row at the same time: the initial UPDATE causes the WHERE clause of the second transaction to match no rows, so the second transaction does not modify any records at all.
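The two-transaction scenario can be sketched as a pair of psql sessions. This is an illustration under stated assumptions rather than the article's original example: the session labels A and B, the starting value 1, and the increments are all made up for the demonstration, and the table is emptied first so that exactly one row matches.

```sql
-- Setup (either session): start from a single known row.
TRUNCATE numbers;
INSERT INTO numbers VALUES (1);

-- Session A: update the row but do not commit yet.
BEGIN;
UPDATE numbers SET value = value + 1 WHERE value = 1;

-- Session B: target the same row. Under READ COMMITTED this statement
-- blocks until session A commits or rolls back.
BEGIN;
UPDATE numbers SET value = value + 10 WHERE value = 1;

-- Session A: commit. The row now holds value = 2.
COMMIT;

-- Session B: the blocked UPDATE wakes up, re-checks its WHERE clause
-- against the committed row, finds no match, and reports UPDATE 0.
COMMIT;

-- Either session: only session A's change was applied.
SELECT value FROM numbers;
```

If session A had rolled back instead, session B's WHERE clause would still match and its UPDATE would go through.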
If you need tighter control over this behavior, you can set the transaction isolation level to SERIALIZABLE. Under this strategy the scenario above fails outright, because it follows the rule "if the row I am modifying has been modified by another transaction, I will not try again", and Postgres returns the error message: could not serialize access due to concurrent update. Catching this error and retrying is something your application can do, or it can simply give up, if that is reasonable.
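The same pair of sessions can be replayed under SERIALIZABLE to see the failure. As before, this is only a sketch: the labels A and B and the values are assumptions chosen for the demonstration.

```sql
-- Setup (either session): start from a single known row.
TRUNCATE numbers;
INSERT INTO numbers VALUES (1);

-- Session A: update the row inside a serializable transaction.
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE numbers SET value = value + 1 WHERE value = 1;

-- Session B: try to update the same row. The statement blocks
-- while session A's transaction is still open.
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE numbers SET value = value + 10 WHERE value = 1;

-- Session A: commit.
COMMIT;

-- Session B: instead of re-checking the WHERE clause, the blocked
-- UPDATE now fails with
--   ERROR:  could not serialize access due to concurrent update
-- The transaction is aborted, so roll it back before retrying.
ROLLBACK;
```

An application that wants the second update to succeed would catch this error and run the whole transaction again from BEGIN.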
Disadvantages of MVCC
Now that you know how MVCC and transaction isolation work, you have another tool for solving this type of problem, and it will come in handy sooner or later. However, although the merits of MVCC are obvious, it also has some drawbacks.
Because different transactions see records in different states, Postgres has to retain even data that may be out of date. That is why an UPDATE actually creates a new row, and a DELETE does not really remove the record: it simply marks the record as deleted and sets the XID value (xmax). Once those transactions complete, the database is left with records that will never be visible to any future transaction. These are called dead rows. Another problem with MVCC is that transaction IDs can only ever increase. An XID is 32 bits wide and can "only" support about 4 billion transactions. When the XID reaches its maximum value, it wraps around to 0 and starts again. All of a sudden, every record appears to have been created in the future, and new transactions have no way of accessing the old records.
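One way to watch an UPDATE create a new row version is to select the ctid system column, which reports the physical location of a row version, alongside xmin and xmax. This is a small sketch that assumes the numbers table from earlier holds at least one row; the actual ctid and XID values depend on the state of your table, so none are shown.

```sql
-- Note the physical location (ctid) and creating transaction (xmin) of each row.
SELECT ctid, xmin, xmax, value FROM numbers;

-- Update every row. Each one is written as a new row version;
-- the previous version is left behind as a dead row.
UPDATE numbers SET value = value + 1;

-- The rows now report a different ctid and a newer xmin,
-- because they are new versions written by the UPDATE's transaction.
SELECT ctid, xmin, xmax, value FROM numbers;
```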
Both the dead rows and the XID wraparound problem described above are handled by the VACUUM command, which Postgres uses to perform cleanup. This should be routine maintenance, so Postgres ships with the autovacuum daemon, which cleans up automatically at a configurable interval. It pays to keep an eye on autovacuum, because the right cleanup interval differs from one deployment to another. You can find more about VACUUM in the Postgres documentation.
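For keeping an eye on cleanup, the statistics views and settings that ship with Postgres are a reasonable starting point. The queries below are a sketch of the kind of checks you might run, assuming the numbers table from earlier; which thresholds count as worrying depends on your deployment and is not something this example tries to decide.

```sql
-- Reclaim dead rows in one table and print a report of what was done.
-- VACUUM cannot run inside a transaction block.
VACUUM (VERBOSE) numbers;

-- Estimated live and dead rows per table, and when autovacuum last ran on each.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Age, in transactions, of the oldest unfrozen XID in each database.
-- A value that keeps growing means wraparound protection is falling behind.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Whether the autovacuum daemon is enabled, and how long it sleeps between runs.
SHOW autovacuum;
SHOW autovacuum_naptime;
```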
The MVCC of PostgreSQL