Transactions and Isolation Levels: "Designing Data-Intensive Applications" Reading Notes 10


For programmers working with databases, transactions are a programming model that simplifies how an application accesses the database. By using transactions, an application can ignore certain potential error scenarios and concurrency problems, because the database takes responsibility for handling them. Not every application needs transactions; sometimes weakening the transactional guarantees, or discarding transactions entirely, buys higher performance or higher availability. How can we better understand transactions and isolation levels in databases? Let's walk through them in this article.

1. ACID

In 1983, Andreas Reuter and Theo Härder described four important properties of transactions:

    • Atomicity:
      In general, "atomic" refers to something that cannot be broken down into smaller parts. If a group of writes is combined into one atomic transaction and the transaction cannot be completed because of an error, the transaction is aborted and the database must discard or undo any writes it made as part of that transaction. Atomicity simplifies the application's error handling: if a transaction is aborted, the application can be sure it changed nothing, and the transaction can therefore safely be retried.

    • Consistency:
      Consistency means that certain statements about your data must always hold true. For example, in an accounting system, credits and debits must balance across all accounts. It is the application's responsibility to define its transactions correctly so that consistency is maintained; this is not something the database guarantees. If you write bad data that violates your invariants, the database will not stop you. Applications may, however, rely on the database's atomicity and isolation to achieve consistency.

    • Isolation:
      When a database is accessed concurrently by multiple clients, concurrency problems arise if they access the same records at the same time.

      Isolation means that concurrently executing transactions are isolated from each other: the database ensures that when the transactions commit, the result is the same as if they had run one at a time, even though they may actually have run concurrently.

    • Durability:
      Durability is the promise that once a transaction has committed successfully, any data it has written will not be lost, even if there is a hardware fault or the database crashes. In a single-node database, durability usually means the data has been written to nonvolatile storage (such as a hard drive or SSD). It usually also involves a write-ahead log, which allows recovery if the data files become corrupted. In a distributed database, durability may mean the data has been successfully replicated to some number of nodes.

Of these properties, isolation is the one DBAs focus on most when tuning a database, so next we will concentrate on transaction isolation.

2. Isolation Levels

If two transactions do not touch the same data, they can safely run in parallel, because neither depends on the other's data. Concurrency problems arise when one transaction reads data that is concurrently modified by another transaction, or when two transactions try to modify the same data at the same time.

Concurrency bugs are hard to find by testing, because they are triggered only by unlucky timing and are often difficult to reproduce. Concurrency is also hard to reason about, especially in a large application where developers do not necessarily know which other pieces of code are accessing the database. So databases hide concurrency problems from application developers by providing transaction isolation: they shield the application from the concurrency details of the underlying database and, ideally, provide a serializable data model.

There is no free lunch, though: the serializable isolation level carries extra performance overhead, so many databases offer weaker isolation levels that prevent only some concurrency problems. Next, let's work through the differences between the isolation levels.

Read Committed

The most basic isolation level is Read Committed, which makes two guarantees:

    • When reading from the database, you see only data that has been committed (no dirty reads).
    • When writing to the database, you only overwrite data that has been committed (no dirty writes).
Dirty read:

Suppose a transaction has written some data to the database but has not yet committed or aborted. If another transaction can see that uncommitted data, this is called a dirty read. The read committed isolation level prevents dirty reads: only when a transaction commits do its writes become visible to others.

Dirty write:

A write that overwrites an uncommitted value is called a dirty write. The read committed isolation level prevents dirty writes, usually by delaying the second write until the first writer's transaction has committed or aborted. Dirty writes can produce inconsistent data. For example, if Alice and Bob try to buy the same item at the same time, interleaved dirty writes could record Bob as the final buyer while the invoice is sent to Alice.

Implementation:

Read committed is a very popular isolation level; it is the default in many databases.

Databases prevent dirty writes by using row-level locks: when a transaction wants to modify a particular row, it must first acquire a lock on that row, and it must hold the lock until the transaction commits or aborts. Only one transaction can hold the lock on any given row; if another transaction wants to write to the same row, it must wait until the first transaction commits or aborts before it can acquire the lock and continue.

Using row-level locks to prevent dirty reads as well would be costly, because a single long-running write transaction could force many read-only transactions to wait. Instead, for every object being written, the database remembers both the old committed value and the new uncommitted value: while the writing transaction is in progress, any other transaction that reads the object is simply given the old value. Readers switch to the new value only once it has been committed.
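The row-lock scheme for preventing dirty writes can be sketched as a toy in-memory store. This is a minimal illustration, not any real database's implementation; all the names (`RowStore`, `Transaction`) are made up for this example:

```python
import threading

class RowStore:
    """Toy key-value store: one lock per row, held until commit or abort."""
    def __init__(self):
        self.rows = {}                 # committed data
        self.row_locks = {}            # row key -> threading.Lock
        self.meta = threading.Lock()   # guards lock creation

    def lock_for(self, key):
        with self.meta:
            return self.row_locks.setdefault(key, threading.Lock())

class Transaction:
    def __init__(self, store):
        self.store = store
        self.held = []                 # row locks acquired so far
        self.pending = {}              # uncommitted writes, invisible to others

    def write(self, key, value):
        lock = self.store.lock_for(key)
        lock.acquire()                 # blocks if another writer holds this row
        self.held.append(lock)
        self.pending[key] = value      # buffered, not yet visible

    def commit(self):
        self.store.rows.update(self.pending)  # writes become visible atomically
        for lock in self.held:
            lock.release()             # locks released only at commit

    def abort(self):
        self.pending.clear()           # discard writes; nothing ever became visible
        for lock in self.held:
            lock.release()

store = RowStore()
t1 = Transaction(store)
t1.write("invoice", "Alice")           # t1 now holds the row lock
t1.commit()                            # only now can another writer touch the row
print(store.rows["invoice"])           # -> Alice
```

Because a second transaction calling `write` on the same row blocks until the first commits or aborts, it can never overwrite an uncommitted value.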

Repeatable Read

Read committed looks like a good isolation level, but it still allows some anomalies. Consider the following example: Alice has $1,000 in the bank, split across two accounts with $500 in each. A transaction now transfers $100 from one of her accounts to the other. If she is unlucky enough to look at her account balances while that transaction is in progress, she may see one account before the incoming payment has arrived (balance still $500) and the other account after the outgoing transfer (new balance $400), so $100 appears to have vanished.

This anomaly under the read committed isolation level is called a non-repeatable read (or read skew), and we need a new mechanism to prevent it.
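The anomaly is easy to reproduce as a deterministic interleaving. A minimal sketch (the account names are illustrative):

```python
# Alice's two accounts, $500 each: $1,000 in total.
balances = {"account1": 500, "account2": 500}

# Alice's balance query begins and reads account2 first,
# before the incoming transfer has arrived.
seen2 = balances["account2"]       # 500

# In between her two reads, a transfer of $100 from
# account1 to account2 commits.
balances["account1"] -= 100
balances["account2"] += 100

# Her query then reads account1, after the outgoing transfer.
seen1 = balances["account1"]       # 400

# Both reads saw only committed data, yet the totals are wrong:
print(seen1 + seen2)               # -> 900: $100 appears to have vanished
```

The database itself is consistent the whole time (the real total stays $1,000); it is Alice's two reads, taken at different points in time, that produce the inconsistent view.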

Snapshot isolation

To achieve repeatable reads, we use a technique called snapshot isolation.

Each transaction reads from a consistent snapshot of the database: the transaction sees all the data that was committed in the database at the start of the transaction. Even if the data is subsequently changed by another transaction, each transaction sees only the old data from that particular point in time; the transaction's view of the database is frozen at the moment it began.

Implementations of snapshot isolation typically use write locks to prevent dirty writes, which means a writing transaction can block another transaction that wants to write to the same object. To support snapshot isolation, the database must keep several different committed versions of each piece of data, because various in-progress transactions may need to see the state of the database at different points in time. This technique is known as multi-version concurrency control (MVCC).

Whenever a transaction writes anything to the database, the data it writes is tagged with that transaction's ID.

When a transaction reads from the database, the transaction IDs are used to decide which versions are visible to it and which are not. Because a new version is created each time a value changes, the database can provide snapshot isolation with fairly small overhead.
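The MVCC visibility rule described above can be sketched as a toy version store. This is a simplified model (real databases also handle aborted transactions, garbage collection, and so on), and every name here is illustrative:

```python
import itertools

class MVCCStore:
    """Each write appends a new version tagged with the writer's transaction ID.
    A reader sees only versions committed before its snapshot was taken."""
    def __init__(self):
        self.versions = {}                  # key -> list of (txid, value)
        self.committed = set()              # IDs of committed transactions
        self.next_txid = itertools.count(1)

    def begin(self):
        # Snapshot = the set of transactions already committed at start time.
        return {"txid": next(self.next_txid), "snapshot": set(self.committed)}

    def write(self, tx, key, value):
        self.versions.setdefault(key, []).append((tx["txid"], value))

    def commit(self, tx):
        self.committed.add(tx["txid"])

    def read(self, tx, key):
        # Newest version whose writer was committed when the snapshot was
        # taken, or that was written by the reading transaction itself.
        for txid, value in reversed(self.versions.get(key, [])):
            if txid == tx["txid"] or txid in tx["snapshot"]:
                return value
        return None

db = MVCCStore()
setup = db.begin()
db.write(setup, "balance", 500)
db.commit(setup)

reader = db.begin()                 # snapshot frozen at this point
writer = db.begin()
db.write(writer, "balance", 400)
db.commit(writer)                   # commits after the reader's snapshot

print(db.read(reader, "balance"))   # -> 500: the reader still sees the old value
```

A transaction that begins after the writer commits would see 400; `reader` keeps seeing 500 for its whole lifetime, which is exactly the repeatable-read guarantee.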

Serializability

Although repeatable read solves the read problems, it still cannot prevent certain concurrent write problems. Consider the following example: a hospital usually has several doctors on call at any time, and must have at least one. Doctors can give up their shifts, provided at least one colleague remains on call. Alice and Bob are the two doctors on call today. Both want to give up their shifts, and unfortunately they happen to click the button to go off call at almost the same moment. What happens next?

Because the database's isolation level is snapshot isolation, each transaction checks its snapshot, sees that two doctors are currently on call, and proceeds to the next step. Alice concludes there is no problem with her leaving, and Bob concludes the same. Both transactions commit, and now there is no doctor on call: the database's consistency has been violated. This anomaly is called write skew.

Serializability is considered the strongest isolation level. The database guarantees that if transactions behave correctly when run individually, they remain correct when run concurrently; in other words, the database prevents all possible race conditions. Next we will look at how the serializable isolation level can be implemented.

Two-Phase Locking (2PL)

For decades, the most widely used serializability algorithm in databases has been two-phase locking (2PL):

    • If transaction A has acquired a read lock on some data and transaction B wants to write that data, B must wait until A commits or aborts before it can proceed with the write. This ensures that B does not change the data out from under A while A is reading it.
    • If transaction A has acquired a write lock on some data and transaction B wants to read that data, B must wait until A commits or aborts before it can read.
    • If transaction A has acquired a write lock on some data and transaction B wants to write that data, B must wait until A commits or aborts before it can write.

As these three rules show, 2PL serializes access and can prevent all of the concurrency problems above, but the cost is obvious: the database's concurrency is greatly reduced.

Shared Locks and Exclusive Locks

The logic of two-phase locking is implemented with shared locks and exclusive locks:

If transaction A wants to read some data, it must first acquire a shared lock. The database allows multiple transactions to hold a shared lock on the same object at the same time, but if another transaction holds an exclusive lock on it, transaction A must wait for that lock to be released.

If transaction A wants to write some data, it must first acquire an exclusive lock. No other transaction may hold any lock on the object at the same time (whether shared or exclusive), so if any lock is held on the object, transaction A must wait.

If transaction A first reads and then writes the data, it can upgrade its shared lock to an exclusive lock. The upgrade behaves the same as acquiring an exclusive lock directly.

After a transaction has acquired a lock, it must continue to hold it until the transaction ends (commits or aborts). This is where the name "two-phase" comes from: in the first phase the transaction acquires its locks, and in the second phase it releases them.
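The shared/exclusive lock itself can be sketched as a small readers-writer lock. This is an illustrative toy (without fairness or the upgrade path, which real lock managers must also handle):

```python
import threading

class RWLock:
    """Shared/exclusive lock as used by 2PL: many readers, or one writer."""
    def __init__(self):
        self.cond = threading.Condition()
        self.readers = 0        # holders of the shared lock
        self.writer = False     # whether the exclusive lock is held

    def acquire_shared(self):
        with self.cond:
            while self.writer:               # wait while a writer holds it
                self.cond.wait()
            self.readers += 1

    def release_shared(self):
        with self.cond:
            self.readers -= 1
            self.cond.notify_all()

    def acquire_exclusive(self):
        with self.cond:
            while self.writer or self.readers > 0:  # wait for all holders
                self.cond.wait()
            self.writer = True

    def release_exclusive(self):
        with self.cond:
            self.writer = False
            self.cond.notify_all()

# Under 2PL a transaction acquires locks as it goes (growing phase) and
# releases them all only when it commits or aborts (shrinking phase).
lock = RWLock()
lock.acquire_shared()       # transaction A reads the object
lock.release_shared()       # ...A commits, releasing its locks
lock.acquire_exclusive()    # only now can transaction B write the object
lock.release_exclusive()    # ...B commits
print("ok")
```

Holding every acquired lock until commit is exactly what makes 2PL prone to the deadlocks discussed next: two transactions can each end up waiting for a lock the other holds.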

Because so many locks are used, it can easily happen that transaction A is stuck waiting for transaction B to release its lock, and vice versa. This situation is called a deadlock. The database automatically detects deadlocks and aborts one of the transactions, which the application must then retry.

Serializable Snapshot Isolation (SSI)

Two-phase locking is a form of pessimistic concurrency control: it is prone to deadlock and its performance is poor. So next we will look at serializable snapshot isolation (SSI), which provides full serializability with only a small performance penalty compared with two-phase locking.

When we discussed the concurrent write problem under snapshot isolation earlier, the pattern was: a transaction reads some data from the database, examines the result, and decides to take some action based on what it saw. Under snapshot isolation, however, the result of the original query may no longer be up to date by the time the transaction commits, because the data may have been modified in the meantime. There is thus a causal dependency between the transaction's queries and its writes. To provide serializable isolation, the database must detect this situation and abort the offending transaction.

Detecting reads of stale data

Snapshot isolation is typically implemented with multi-version concurrency control, under which a transaction reading from a consistent snapshot ignores writes made by concurrent transactions. To prevent the anomaly, the database tracks when a transaction ignores another transaction's writes, and when the transaction wants to commit, the database checks whether any of those ignored writes have since been committed. If so, the transaction must be aborted.

Why wait until commit time, rather than aborting the transaction as soon as a stale read is detected? Because if the transaction turns out to be read-only, it does not need to be aborted at all, and at the moment of the read the database does not yet know whether the transaction will later perform a write. Alice and Bob's on-call example above can be handled this way to avoid the concurrent write problem.

Detecting writes that affect prior reads

Detecting stale reads alone does not cover every case: a transaction may read data first, and only afterwards does another transaction modify it. The database must also detect writes that affect data previously read by other transactions.

So when a transaction writes to the database, it uses the indexes to find any other transactions that have recently read the affected data. Once the writing transaction commits successfully, the concurrent transactions that read the now-stale data must be aborted. In this way, snapshot isolation is made safe for concurrent writes. The same technique resolves the on-call example above.
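The commit-time check can be sketched with a simple version counter per key. This is an optimistic-validation toy in the spirit of SSI, not a faithful SSI implementation (real SSI tracks read/write dependencies far more precisely); all names are illustrative:

```python
class SSIDatabase:
    """Snapshot reads plus commit-time validation: a transaction aborts if
    anything it read was changed by a transaction that committed meanwhile."""
    def __init__(self, data):
        self.data = dict(data)
        self.version = {k: 0 for k in data}   # bumped on every committed write

    def begin(self):
        return {"reads": {}, "writes": {}}

    def read(self, tx, key):
        tx["reads"][key] = self.version[key]  # remember which version we saw
        return self.data[key]

    def write(self, tx, key, value):
        tx["writes"][key] = value             # buffered until commit

    def commit(self, tx):
        # Validate: has anything we read been changed since we read it?
        for key, seen in tx["reads"].items():
            if self.version[key] != seen:
                raise RuntimeError("serialization failure: retry transaction")
        for key, value in tx["writes"].items():
            self.data[key] = value
            self.version[key] += 1

# The on-call example: both doctors check the roster, then try to leave.
db = SSIDatabase({"alice_on_call": True, "bob_on_call": True})

alice = db.begin()
bob = db.begin()
# Both snapshots show two doctors on call, so both decide to leave.
assert db.read(alice, "alice_on_call") and db.read(alice, "bob_on_call")
assert db.read(bob, "alice_on_call") and db.read(bob, "bob_on_call")
db.write(alice, "alice_on_call", False)
db.write(bob, "bob_on_call", False)

db.commit(alice)                  # the first commit succeeds
try:
    db.commit(bob)                # Bob read a row Alice changed -> aborted
except RuntimeError as e:
    print(e)                      # serialization failure: retry transaction
```

When Bob's transaction retries, its fresh snapshot shows only one doctor on call, so the application logic refuses the request and at least one doctor remains on duty.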

Many engineering details affect how well the algorithm performs in practice, in particular the granularity at which reads and writes are tracked. If the database tracks each transaction's activity in great detail, it can determine precisely which transactions need to be aborted, but the bookkeeping cost becomes significant; tracking transactions less precisely is faster, but may cause more transactions to be aborted than strictly necessary. Compared with two-phase locking, serializable snapshot isolation has a big advantage: one transaction does not need to block waiting for locks held by another.

Summary:

In this article we have surveyed the main strategies and techniques databases use to implement transactions and isolation. Hopefully it clarifies why transactions matter in database systems, and helps you choose the most appropriate isolation level for your own environment.
