Database and Database Learning
Serializability of Concurrent Schedules
The DBMS may produce different results for different schedules of concurrent transactions.
What kind of schedule is correct?
Serial schedules are correct.
In a serial schedule the operations of the individual transactions do not interleave, so the transactions cannot interfere with one another (and, of course, there is no concurrency). As stated earlier, a transaction takes the database from one consistent state to another consistent state, so after several transactions are executed one after another the database is still consistent.
Serializable Schedules
The concurrent execution of several transactions is correct if and only if its result is the same as the result of executing those transactions serially in some order. A schedule with this property is called serializable; serializable schedules also preserve database consistency.
[Example] Two transactions contain the following operations. Transaction T1: read B; A = B + 1; write back A. Transaction T2: read A; B = A + 1; write back B. Different schedules of these two transactions can produce different results.
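As a minimal Python sketch (not part of the original notes; the initial values A = 2 and B = 2 are an assumption chosen for illustration), the two possible serial orders of T1 and T2 can be simulated directly:

    def t1(db):           # T1: read B; A = B + 1; write back A
        db["A"] = db["B"] + 1

    def t2(db):           # T2: read A; B = A + 1; write back B
        db["B"] = db["A"] + 1

    db = {"A": 2, "B": 2}
    t1(db); t2(db)        # serial order T1 -> T2
    print(db)             # {'A': 3, 'B': 4}

    db = {"A": 2, "B": 2}
    t2(db); t1(db)        # serial order T2 -> T1
    print(db)             # {'A': 4, 'B': 3}

The two serial orders give different results, and both are correct. A concurrent schedule is correct (serializable) only if its result equals one of these serial results; for example, an interleaving in which both transactions read the old values would yield A = 3 and B = 3, which matches neither serial order.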
Serializability
Serializability is the criterion for correct scheduling of concurrent transactions. In an RDBMS it serves as the correctness criterion for concurrency control: a given concurrent schedule is considered correct only if it is serializable.
A sufficient condition for serializability: conflict-serializable schedules
If a schedule Sc can be transformed into another schedule Sc' by exchanging the order of non-conflicting operations of two transactions while keeping the order of all conflicting operations unchanged, and Sc' is serial, then Sc is called a conflict-serializable schedule.
A schedule that is conflict serializable is necessarily serializable.
RDBMSs generally adopt conflict serializability as the correctness criterion for concurrency control.
Conflict Operations
Conflicting operations are read/write operations of different transactions on the same data item in which at least one of the operations is a write:
Ri(x) and Wj(x)  /* transaction Ti reads x while transaction Tj writes x */
Wi(x) and Wj(x)  /* transaction Ti writes x and transaction Tj writes x */
All other operations do not conflict with one another.
Conflicting operations of different transactions, as well as two operations of the same transaction, cannot be exchanged (they do not commute); exchanging them may change the result of execution.
[Example] Sc1 = r1(A) w1(A) r2(A) w2(A) r1(B) w1(B) r2(B) w2(B). Exchanging w2(A) with r1(B) w1(B) gives: r1(A) w1(A) r2(A) r1(B) w1(B) w2(A) r2(B) w2(B). Then exchanging r2(A) with r1(B) w1(B) gives: Sc2 = r1(A) w1(A) r1(B) w1(B) r2(A) w2(A) r2(B) w2(B). Sc2 is equivalent to the serial schedule T1, T2, so Sc1 is a conflict-serializable schedule.
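The same conclusion can be reached with the standard precedence-graph test for conflict serializability (the graph is not described in these notes, but the test is equivalent to the exchange argument above). A minimal Python sketch, with each operation written as a (transaction, action, item) tuple (an encoding chosen here for illustration):

    def conflict(op1, op2):
        # different transactions, same data item, at least one write
        (t1, a1, x1), (t2, a2, x2) = op1, op2
        return t1 != t2 and x1 == x2 and "w" in (a1, a2)

    def conflict_serializable(schedule):
        # precedence graph: edge Ti -> Tj for each conflicting pair
        # in which Ti's operation appears first
        edges = {(schedule[i][0], schedule[j][0])
                 for i in range(len(schedule))
                 for j in range(i + 1, len(schedule))
                 if conflict(schedule[i], schedule[j])}
        nodes = {t for t, _, _ in schedule}
        # the schedule is conflict serializable iff the graph has no cycle;
        # repeatedly remove nodes that have no remaining predecessor
        while nodes:
            free = [n for n in nodes
                    if not any(v == n and u in nodes for u, v in edges)]
            if not free:
                return False      # cycle found
            nodes -= set(free)
        return True

    # Sc1 from the example: r1(A) w1(A) r2(A) w2(A) r1(B) w1(B) r2(B) w2(B)
    Sc1 = [("T1", "r", "A"), ("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"),
           ("T1", "r", "B"), ("T1", "w", "B"), ("T2", "r", "B"), ("T2", "w", "B")]
    print(conflict_serializable(Sc1))   # True: equivalent to the serial order T1, T2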
Conflict serializability is a sufficient but not a necessary condition for serializability; there are serializable schedules that do not satisfy the conflict-serializability condition, such as view-serializable schedules.
[Example] There are three transactions: T1 = W1(Y) W1(X), T2 = W2(Y) W2(X), T3 = W3(X). Schedule L1 = W1(Y) W1(X) W2(Y) W2(X) W3(X) is a serial schedule. Schedule L2 = W1(Y) W2(Y) W2(X) W1(X) W3(X) is not conflict serializable, but it is serializable (L1 and L2 are view equivalent), because L2 produces the same result as L1: the final value of Y is the one written by T2 and the final value of X is the one written by T3.
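Because every operation in this example is a blind write, the final state is determined by the last write to each data item. A short Python sketch (the tuple encoding and the idea of recording only the last writer are assumptions for illustration) confirms that L1 and L2 end with the same final writers:

    def run(schedule):
        state = {}
        for txn, item in schedule:    # each step: transaction txn writes item
            state[item] = txn         # remember which transaction wrote last
        return state

    L1 = [("T1", "Y"), ("T1", "X"), ("T2", "Y"), ("T2", "X"), ("T3", "X")]  # serial: T1, T2, T3
    L2 = [("T1", "Y"), ("T2", "Y"), ("T2", "X"), ("T1", "X"), ("T3", "X")]

    print(run(L1))   # {'Y': 'T2', 'X': 'T3'}
    print(run(L2))   # {'Y': 'T2', 'X': 'T3'}  -> same final writers, so the same result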
Locking Protocols
When locking is used, certain rules must be agreed upon for locking data objects, for example:
when to request a lock,
how long to hold a lock,
when to release a lock, and so on.
Two-phase locking (2PL) is the most commonly used locking protocol. It has been proved theoretically that the two-phase locking protocol produces serializable schedules.
Two-Phase Locking Protocol
The two-phase locking protocol requires every transaction to lock and unlock data items in two phases:
before reading or writing any data item, a transaction must first acquire a lock on it;
after releasing a lock, a transaction may not request or acquire any other lock.
Meaning of the Two Phases
The execution of a transaction is divided into two phases.
The first phase acquires locks and is also called the growing (expanding) phase.
In this phase the transaction may request any type of lock on any data item, but may not release any lock.
The second phase releases locks and is also called the shrinking (contracting) phase.
In this phase the transaction may release any type of lock on any data item, but may not request any new lock.
For example, transaction Ti complies with the two-phase locking protocol; its locking sequence is: Slock A, Slock B, Xlock C, Unlock B, Unlock A, Unlock C (the first three actions form the growing phase, the last three the shrinking phase). Transaction Tj does not comply with the two-phase locking protocol; its locking sequence is: Slock A, Unlock A, Slock B, Xlock C, Unlock B.
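A minimal Python sketch (not from the notes; the action encoding is an assumption) that checks whether a lock/unlock sequence obeys the two-phase rule, applied to Ti and Tj above:

    def follows_2pl(actions):
        shrinking = False
        for action, item in actions:
            if action in ("Slock", "Xlock"):
                if shrinking:
                    return False      # lock requested after an unlock: violation
            elif action == "Unlock":
                shrinking = True      # the transaction has entered the shrinking phase
        return True

    Ti = [("Slock", "A"), ("Slock", "B"), ("Xlock", "C"),
          ("Unlock", "B"), ("Unlock", "A"), ("Unlock", "C")]
    Tj = [("Slock", "A"), ("Unlock", "A"), ("Slock", "B"),
          ("Xlock", "C"), ("Unlock", "B")]

    print(follows_2pl(Ti))   # True
    print(follows_2pl(Tj))   # False: Slock B comes after Unlock A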
Compliance with the two-phase locking protocol is a sufficient, but not necessary, condition for serializable schedules.
If all concurrent transactions comply with the two-phase locking protocol, then every concurrent schedule of these transactions is serializable.
Conversely, if some schedule of concurrent transactions is serializable, it does not follow that all of the transactions comply with the two-phase locking protocol.
Two-phase locking and the one-time locking method for preventing deadlocks
The one-time locking method requires every transaction to lock, in a single step, all of the data it will use, and the transaction cannot proceed otherwise; therefore the one-time locking method satisfies the two-phase locking protocol.
The two-phase locking protocol, however, does not require a transaction to lock all of its data at once, so transactions that comply with the two-phase locking protocol may still deadlock.
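For instance (an illustration that is not in the notes), two transactions that both obey 2PL can deadlock if they acquire locks in opposite orders. The sketch below records the lock table and wait-for relation at the moment both transactions block, and detects the cycle:

    # Assumed interleaving chosen to force the deadlock:
    #   T1: Xlock A ... Xlock B ...   (obeys 2PL: all locks before any unlock)
    #   T2: Xlock B ... Xlock A ...   (obeys 2PL as well)
    # T1 locks A, T2 locks B, then T1 requests B and T2 requests A.

    holder = {"A": "T1", "B": "T2"}      # lock table at the moment both block
    waits_for = {"T1": "T2",             # T1 waits for B, which T2 holds
                 "T2": "T1"}             # T2 waits for A, which T1 holds

    def has_deadlock(waits_for):
        # follow wait-for edges from each transaction; revisiting a
        # transaction means the wait-for graph contains a cycle
        for start in waits_for:
            seen, t = set(), start
            while t in waits_for:
                if t in seen:
                    return True
                seen.add(t)
                t = waits_for[t]
        return False

    print(has_deadlock(waits_for))       # True: T1 and T2 are deadlocked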
Locking Granularity
The size of the locked object is called the locking granularity.
Locked objects can be logical units or physical units.
For example, in a relational database the locked objects include:
logical units: an attribute value, a set of attribute values, a tuple, a relation, an index entry, an entire index, the entire database, etc.;
physical units: pages (data pages or index pages), physical records, etc.
Locking granularity is closely related to the degree of concurrency in the system and to the overhead of concurrency control.
The larger the locking granularity, the fewer the lockable data units in the database, the lower the degree of concurrency, and the smaller the system overhead;
the smaller the locking granularity, the higher the degree of concurrency, but the larger the system overhead.
Example
If the locking granularity is the data page and transaction T1 needs to modify tuple L1, then T1 must lock the entire data page A that contains L1. If transaction T2 needs to modify tuple L2, which is also on page A, after T1 has locked A, then T2 is forced to wait until T1 releases the lock.
If the locking granularity is the tuple, T1 and T2 can lock L1 and L2 at the same time without waiting for each other, which raises the degree of concurrency in the system.
As another example, if transaction T needs to read an entire table and the locking granularity is the tuple, T must lock every tuple in the table, which incurs a high overhead.
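A minimal sketch (not from the notes; the single-owner X-lock table is an assumption, and the names page A, L1, L2 follow the example above) contrasting the page-level and tuple-level cases for T1 and T2:

    def request_x_lock(lock_table, txn, obj):
        owner = lock_table.get(obj)
        if owner is None or owner == txn:
            lock_table[obj] = txn
            return True                 # lock granted
        return False                    # incompatible lock held: txn must wait

    # Page-level granularity: both transactions must lock the whole page A.
    page_locks = {}
    print(request_x_lock(page_locks, "T1", "page A"))     # True
    print(request_x_lock(page_locks, "T2", "page A"))     # False -> T2 waits for T1

    # Tuple-level granularity: the transactions lock different tuples.
    tuple_locks = {}
    print(request_x_lock(tuple_locks, "T1", "tuple L1"))  # True
    print(request_x_lock(tuple_locks, "T2", "tuple L2"))  # True -> they run concurrently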
Multiple Granularity Locking
A system that offers several locking granularities for different transactions to choose from is said to support multiple granularity locking.
Choosing a locking granularity
Take both locking overhead and concurrency into account and choose the locking granularity accordingly:
user transactions that process a large number of tuples from several relations: lock at the level of the database;
user transactions that process a large number of tuples of one relation: lock at the level of the relation;
user transactions that process only a few tuples: lock at the level of the tuple.
Multi-granularity tree
A multi-granularity tree arranges the locking granularities of the database in a tree structure.
The root node is the entire database, representing the largest data granularity.
The leaf nodes represent the smallest data granularity.
Each node in the multi-granularity tree can be locked independently.
Locking a node implicitly locks all descendant nodes of that node in the same lock mode.
In multiple granularity locking, a data object can therefore be locked in two ways: explicitly or implicitly.
Explicit lock: a lock placed directly on the data object itself.
Implicit lock: the data object is not locked independently but is locked because one of its ancestor nodes is locked.
Explicit locks and implicit locks have the same effect.
Therefore, when the system checks for lock conflicts it must
check for conflicts with explicit locks, and
check for conflicts with implicit locks.
For example, suppose transaction T wants to place an X lock on relation R1.
The system must examine R1 itself and its ancestor node, the database.
It must also examine the descendant nodes of R1, that is, every tuple in R1.
If any of these data objects holds an incompatible lock, T must wait.
In general, to lock a data object the system must check:
the data object itself:
whether any explicit lock on it conflicts with the requested lock;
all of its ancestor nodes:
whether this transaction's explicit lock conflicts with implicit locks held by other transactions on the data object (implicit locks caused by their locks on ancestor nodes);
all of its descendant nodes:
whether explicit locks held by other transactions on descendant nodes conflict with the implicit lock that this transaction is about to place on them.
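A minimal Python sketch of this check (the tree layout, the lock-table representation, and the S/X compatibility rule are assumptions for illustration): before granting transaction T an explicit lock on a node, the node itself, all of its ancestors, and all of its descendants are examined for conflicting explicit locks held by other transactions.

    parent = {"R1": "DB", "t1": "R1", "t2": "R1"}   # granularity tree: DB -> R1 -> tuples
    children = {"DB": ["R1"], "R1": ["t1", "t2"]}

    def ancestors(node):
        while node in parent:
            node = parent[node]
            yield node

    def descendants(node):
        for c in children.get(node, []):
            yield c
            yield from descendants(c)

    def compatible(m1, m2):
        return m1 == "S" and m2 == "S"              # only two S locks are compatible

    def can_lock(locks, txn, node, mode):
        """locks maps each node to a list of (transaction, mode) explicit locks."""
        for n in [node, *ancestors(node), *descendants(node)]:
            for other_txn, other_mode in locks.get(n, []):
                if other_txn != txn and not compatible(mode, other_mode):
                    return False                    # conflict found: txn must wait
        return True

    # T wants an X lock on relation R1, but another transaction holds an
    # explicit S lock on tuple t2, a descendant of R1.
    locks = {"t2": [("T2", "S")]}
    print(can_lock(locks, "T", "R1", "X"))   # False: T must wait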
Data sharing and data consistency are conflicting goals.
The value of a database depends largely on the degree of data sharing it provides.
Data sharing depends largely on the degree to which the system allows concurrent data operations.
The degree of data concurrency depends on the concurrency control mechanism in the database.
Data consistency also depends on the degree of concurrency control: the more concurrency control is applied, the better the data consistency.
Database concurrency control takes the transaction as its unit.
Database concurrency control usually uses a locking mechanism.
The two most commonly used locks are shared locks (S locks) and exclusive locks (X locks).
The criterion by which the concurrency control mechanism judges whether a schedule of concurrent transactions is correct is serializability.
The correctness of concurrent operations is usually guaranteed by the two-phase locking protocol.
The two-phase locking protocol is a sufficient, but not necessary, condition for serializable schedules.