Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery (notes) - computational models and concurrency control



Page Storage Structure


Figure: Storage layout of a database page

A page is the smallest unit of data transfer between disk and main memory, and also the unit of caching in memory.
The page header contains space-management information for the page, such as the number of free bytes and the size of the largest contiguous free area.

Role of the slot array:
Updating varchar and other variable-length fields can cause records to move within a page. If external systems referenced the physical address of a record directly, handling such moves would be very complicated; therefore an indirection layer, the slot array, is inserted. The slot array stores the actual in-page address of each record. Page number plus slot number form the RID, and external references to data records go through the RID.
When a record moves within a page, only the actual address in the slot array needs updating; when a record moves to another page, a forwarding RID left in the original slot redirects the lookup.
For records containing long fields such as BLOBs, a simple technique is to store the column's actual data on several other pages and keep the list of those page numbers where the record's data would normally sit.
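The RID indirection described above can be sketched as follows. This is a minimal illustration, not any particular engine's page layout; the `Page` class, the dict-based page body, and the RID encoding are all invented for the example:

```python
class Page:
    """Minimal heap page: records are addressed through a slot array.

    External callers hold a RID = (page_no, slot_no); only the slot
    array knows each record's actual offset inside the page."""

    def __init__(self, page_no):
        self.page_no = page_no
        self.records = {}    # offset -> record bytes (stand-in for the raw page body)
        self.slots = []      # slot_no -> offset (the indirection layer)
        self.next_offset = 0

    def insert(self, record):
        offset = self.next_offset
        self.next_offset += len(record)
        self.records[offset] = record
        self.slots.append(offset)
        return (self.page_no, len(self.slots) - 1)   # the stable RID

    def read(self, rid):
        page_no, slot_no = rid
        assert page_no == self.page_no
        return self.records[self.slots[slot_no]]

    def move_record(self, slot_no, new_offset):
        """Relocate a record inside the page (e.g. after a varchar grew).
        Only the slot entry changes; every RID handed out stays valid."""
        old_offset = self.slots[slot_no]
        self.records[new_offset] = self.records.pop(old_offset)
        self.slots[slot_no] = new_offset

p = Page(7)
rid = p.insert(b"alice,austin")
p.move_record(rid[1], 100)    # record moves inside the page, the RID does not
```

Readers resolve the RID through the slot array on every access, which is exactly what makes the in-page move invisible to them.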

One disk I/O can read one or more consecutive pages; the fixed number of pages read at a time is called a block. In database theory, blocks and pages are treated alike.
Database systems usually pre-allocate disk space in units of extents (one or more extents at a time). An extent is a run of multiple consecutive pages; note the difference between an extent and a block.
The storage layer must keep the necessary metadata, for example to translate page numbers into physical disk addresses and to manage free space. An extent table is usually used for the page-number-to-physical-address translation.
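The extent-table translation can be sketched as a two-step lookup; the page size, extent size, and table contents below are invented for illustration:

```python
PAGE_SIZE = 8192          # bytes per page (assumed for the example)
PAGES_PER_EXTENT = 8      # consecutive pages per extent (assumed)

# extent table: extent index -> physical byte address where the extent starts
extent_table = {0: 0, 1: 1_048_576, 2: 4_194_304}

def page_to_disk_address(page_no):
    """Translate a logical page number into a physical disk byte address:
    find the extent, then offset by the page's position inside it."""
    extent = page_no // PAGES_PER_EXTENT
    slot_in_extent = page_no % PAGES_PER_EXTENT
    return extent_table[extent] + slot_in_extent * PAGE_SIZE
```

Because extents are contiguous, only one table entry per extent is needed rather than one per page.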

Computational Models
Page model
The read/write page model is a simple model. For example, transaction t in the page model is defined as t = r(x) w(x) r(y) r(u) w(y), where r denotes reading a data page and w denotes writing one. To the storage layer, a transaction is such a read/write sequence, or a partial order over such operations.
The page model captures the essence of concurrency control and recovery in a concise, elegant way and can describe important problems in many system implementations. Its limitation is that its only elements are reads and writes of low-level pages; it cannot express the semantics of data-access operations.

Object model
Building on the page model, the object model takes high-level operations into account. In the object model, transactions are composed of ADT (abstract data type) operations, which ultimately still reduce to reads and writes of lower-level pages.
The following is an example of an object model:

Figure: Example of an object model scenario:

* In transaction t1, a select statement first finds the people living in the city of Austin, and then an insert statement inserts a record for a person living in Austin.
* Assume the table has a single index, on the city column, with a B+-tree structure of only two levels.

The ADT operation and page read and write are described as follows:

1. Search('Austin'): the select statement uses the city index to find the index entries matching Austin and obtains a RID list.
* r(r): read the root page r of the B+-tree index.
* r(l): read the leaf page l of the B+-tree index.
2. Fetch(x), Fetch(y): using the RID list, load the data pages holding x and y to obtain data records x and y.
* r(p), r(q): read the data pages p and q referenced by the RIDs of x and y respectively.
3. Store(z): insert record z.
* r(f): read the storage layer's metadata page f, which records free-space information.
* r(p): from page f, find a page p with enough space to hold record z, and read that page.
* w(p): write record z into page p and write page p back to disk.
* r(r), r(l), w(l): update the index on the city column: read the index root page r and leaf page l, update leaf page l, and write it back to disk.

An object-model transaction t is a tree: the root node is labeled with the transaction identifier, non-leaf nodes with the invoked operation's name and parameters, and leaf nodes with page-model read/write operations.
If the transaction's operation sequence is viewed as a total order, the transaction can only execute serially; viewed as a partial order, some operations can execute in parallel.
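A transaction tree like the one above can be written down directly; the structure below just mirrors the Search/Fetch/Store example, with node labels chosen for illustration:

```python
# Object-model transaction as a tree: root = transaction id,
# inner nodes = ADT operations with parameters, leaves = page-model r/w ops.
t1 = ("t1", [
    ("search('austin')", [("r", "r"), ("r", "l")]),
    ("fetch(x)",         [("r", "p")]),
    ("fetch(y)",         [("r", "q")]),
    ("store(z)",         [("r", "f"), ("r", "p"), ("w", "p"),
                          ("r", "r"), ("r", "l"), ("w", "l")]),
])

def leaf_operations(tree):
    """Flatten the tree back into the page-model read/write sequence,
    i.e. the total-order view of the transaction."""
    _, children = tree
    return [leaf for _, leaves in children for leaf in leaves]
```

The flattened list is the total-order view; keeping the tree shape preserves the partial order, under which, for example, the two Fetch subtrees could run in parallel.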

Typical concurrency Problems

Dirty read, inconsistent (non-repeatable) read, phantom read: see Transaction, Lock, Isolation Level.
Lost update:

Figure: Update loss

Concurrency control in the page model: scheduling protocols

The serializability theory of the page model covers final-state serializability, view serializability, and conflict serializability. Because deciding final-state or view serializability is too expensive, commercial databases generally test for the conflict-serializable class CSR.
Concepts:
History: a complete, finished transaction execution, i.e., every transaction has either committed (commit) or aborted (abort).
Schedule: a prefix of a history.
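Membership in CSR can be tested by building the schedule's conflict graph and checking it for cycles. A minimal sketch; the `(txn, op, item)` schedule encoding is invented for the example:

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {'r', 'w'}.
    Build the conflict graph (edge ti -> tj when an operation of ti
    precedes a conflicting operation of tj on the same item) and
    report whether it is acyclic, i.e. whether the schedule is in CSR."""
    edges = set()
    txns = set()
    for i, (ti, opi, x) in enumerate(schedule):
        txns.add(ti)
        for tj, opj, y in schedule[i + 1:]:
            # two operations conflict when they touch the same item,
            # come from different transactions, and at least one writes
            if ti != tj and x == y and "w" in (opi, opj):
                edges.add((ti, tj))

    # cycle check via depth-first search with three colors
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in txns}

    def dfs(t):
        color[t] = GRAY
        for a, b in edges:
            if a == t:
                if color[b] == GRAY or (color[b] == WHITE and dfs(b)):
                    return True   # back edge: cycle found
        color[t] = BLACK
        return False

    return not any(color[t] == WHITE and dfs(t) for t in txns)
```

An acyclic conflict graph also yields an equivalent serial order directly, via a topological sort of the graph.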

Figure: Transaction scheduling
Client 1 and client 2 represent the operation sequences of different transactions.
The transaction manager (TM) registers transactions, maintains the commit, abort, and active lists, and keeps the list of operations ready to execute.
The scheduler receives the input schedule from the TM and converts it into a serializable output schedule.
Schedulers are divided into optimistic and pessimistic ones.

Deadlock handling
Deadlock handling is generally based on the waits-for graph (WFG): a deadlock exists exactly when the WFG contains a cycle.
Deadlock detection can be continuous or periodic.
Approach 1: allow deadlocks, and break a WFG cycle by choosing a victim among the deadlocked transactions. Approach 2: prevent deadlocks, i.e., never let them arise.

Two-phase locking protocol (2PL)
For each transaction, the lock-acquisition phase strictly precedes the lock-release phase; hence the protocol is called two-phase.
That is, a transaction can be clearly split into a first part and a second part: the first part contains only lock acquisitions, and the second part only lock releases. For performance reasons, commercial databases do not implement the 2PL protocol strictly; the SQL-92 isolation levels are a trade-off between performance and serializability. In addition, since the page model takes no semantics into account, commercial databases also restructure and relax the locking protocol in exchange for performance.

Figure: Lock growth and shrink phases under 2PL

The 2PL protocol has several variants: C2PL, S2PL, and SS2PL.
The following is a scheduling example of a 2PL Scheduler:

Figure: A scheduling example for a 2PL scheduler
s at the top is the input; the three rows at the bottom are one feasible output produced by the 2PL scheduler.
The subscripts 1, 2, and 3 denote the different transactions t1, t2, and t3; wl denotes acquiring a write lock, wu releasing it, rl acquiring a read lock, ru releasing it, and c the end of a transaction (commit).
In the lock-holding diagram in the middle, the dotted lines indicate waits caused by lock conflicts.
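Whether a lock trace like this obeys the two-phase rule can be checked mechanically. A small sketch with an invented `(txn, action)` trace encoding, ignoring lock modes:

```python
def is_two_phase(trace):
    """trace: list of (txn, action) with action in {'lock', 'unlock'}.
    Under 2PL a transaction must never acquire a lock after it has
    released one: all of its lock steps precede all of its unlock steps."""
    shrinking = set()   # transactions already in the release (shrink) phase
    for txn, action in trace:
        if action == "unlock":
            shrinking.add(txn)
        elif txn in shrinking:
            return False   # lock acquired after an unlock: 2PL violated
    return True
```

The check is per transaction, which is why a single `shrinking` set over transaction ids suffices.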

Conservative 2PL (C2PL)
All locks on the data to be accessed are acquired at the start of the transaction. This protocol suits only a limited range of applications. It has no deadlock problem, because a transaction either cannot start or obtains every lock it needs.

Figure: C2PL lock growth and shrink phases

Strict 2PL (S2PL)
The transaction holds all write locks it has acquired until the transaction ends.

Figure: Lock growth and shrink phases under S2PL (assuming all locks in the transactions are write locks)

Strong 2PL (SS2PL)
All locks acquired by a transaction, read locks included, are held until the transaction ends. The lock growth and shrink phases look the same as under S2PL, but are no longer limited to write locks.

Ordered Sharing 2PL (O2PL)
Under 2PL, if a lock requested on a data item is incompatible with an existing lock on that item, the requester can only wait.
Ordered-sharing compatibility rule: two locks on the same data item may be held by different transactions at the same time, as long as the lock operations and the corresponding data operations execute in the same order. The rule relies on operations on a data item happening in a fixed order, and thereby relaxes the lock-compatibility rules. It is feasible in theory, but it must force the order of data accesses and needs extra data structures and run-time cost, so it is rarely used in practice.

Altruistic Locking (AL)
An extension of 2PL. Some transactions run for a long time and hold a large number of locks, which can block many short transactions. If a short transaction accesses only a subset of the data items a long transaction has already finished processing, the long transaction can donate those data items to the other transactions.

Non-two-phase locking protocols
Write-only tree locking (WTL), read/write tree locking (RWTL), and so on. They apply only when data items are accessed in a specific order, for example when the access pattern is tree-shaped.
WTL adds two rules to the basic locking rules: a write lock on a child item may be acquired only while the write lock on its parent is held; and after a transaction releases its write lock on an item, it may never acquire a write lock on that item again. This guarantees that updates proceed from the root toward the leaves along a path of the tree, and the lock scope can be confined to a particular subtree. Lock acquisition and release are not two-phase: once the path has descended into a subtree, the parent's lock can be released, i.e., acquisition and release may interleave.
The WTL protocol is deadlock-free. Modifying the RWTL rules yields the DAG locking protocol.
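The two WTL rules can be expressed as a simple admission check before granting a write lock. A sketch with an invented tree and state representation; real schedulers track this per transaction:

```python
def wtl_acquire_ok(item, parent_of, held, released, first):
    """Decide whether a transaction may take a write lock on `item`.
    held / released: items this transaction currently holds / has released.
    first: True only for the transaction's very first lock request."""
    if item in released:
        return False                        # rule 2: never re-lock a released item
    if first:
        return True                         # the first lock may be taken anywhere
    return parent_of.get(item) in held      # rule 1: the parent's lock must be held

# example tree: a -> b -> c
parent_of = {"b": "a", "c": "b"}
```

Because every later lock hangs off an already-held parent, two transactions can never wait for each other in a cycle, which is the intuition behind WTL's freedom from deadlock.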

Timestamp ordering protocol (TO)
No locks are used. The basic rule: conflicting operations of different transactions are scheduled in the order of the transactions' start timestamps.
Locking protocols use the lock mechanism to detect conflicts; the TO protocol needs additional bookkeeping instead. For example, suppose Ti started before Tj and the scheduler has already output an operation qj(x) of Tj when Ti submits a conflicting operation pi(x). By the TO rule, pi(x) must execute before qj(x), but qj(x) has already been output, so the scheduler must have recorded enough information to detect such conflicts and handle them accordingly (for example, abort Ti and restart it after Tj ends).
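A minimal sketch of the bookkeeping the TO rule implies: per item, the scheduler remembers the largest timestamps that have read and written it, and rejects an operation that arrives too late. The dict-based state is an illustration, not the book's exact algorithm:

```python
class TOScheduler:
    """Basic timestamp ordering: conflicting operations must arrive in
    transaction-timestamp order, otherwise the late operation's
    transaction is rejected (to be aborted and restarted)."""

    def __init__(self):
        self.max_read = {}    # item -> largest ts that has read it
        self.max_write = {}   # item -> largest ts that has written it

    def read(self, ts, item):
        if ts < self.max_write.get(item, -1):
            return False      # a younger transaction already wrote: too late
        self.max_read[item] = max(ts, self.max_read.get(item, -1))
        return True

    def write(self, ts, item):
        if ts < self.max_read.get(item, -1) or ts < self.max_write.get(item, -1):
            return False      # a younger transaction already read or wrote
        self.max_write[item] = ts
        return True
```

This recorded state is exactly what lets the scheduler notice that a pi(x) arriving after qj(x) was output would violate timestamp order.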

SGT protocol (Serialization Graph Tester)
The scheduler maintains a serialization graph with one node per concurrent transaction. When an operation of transaction Ti conflicts with operations of Tj that have already been output, a directed edge from Tj to Ti is added and Ti may have to wait. If an operation of Ti would create a cycle in the graph, the output would not be serializable, and Ti must be aborted.
In practice the SGT protocol requires a large amount of extra state and considerable computation, which the scheduler cannot afford, so it is rarely used.

Concurrency control in the object model and concurrency control on search structures

......

Multiversion concurrency control (MVCC)

Under single-versioning, each data item in the database has exactly one copy; under multi-versioning, several versions of a data item are maintained at the same time. The versions are transparent to the client: from the client's point of view, the database keeps a single copy of each item. With suitable extensions and adjustments to the concurrency-control algorithms, multi-versioning also achieves serializability; it is simply another way of doing concurrency control. Its advantages are that it reduces the use of locks compared with locking protocols, greatly improves concurrency, and also brings benefits for recovery.

Multiversion timestamp ordering protocol (MVTO)
In essence it handles concurrent operations in FIFO fashion: transactions are ordered by their start timestamps, and scheduling and conflicts are handled in that order.
In the rules below, Ti denotes the i-th transaction; ts(Ti) is Ti's timestamp; ri(X) means Ti reads data item X; Xk denotes the version of X written by transaction Tk; ts(Xk) is that version's timestamp, which equals the timestamp of Tk, i.e., ts(Xk) = ts(Tk). The MVTO rules are:

1. ri(X) is converted to ri(Xk), i.e., the version of X to read must be chosen: Xk is the version with the largest ts(Xk) < ts(Ti). Xk may be a committed or an as-yet-uncommitted version.
2. When processing wi(X): if an operation rj(Xk) already exists with the timestamp relationship ts(Xk) < ts(Ti) < ts(Tj), then wi(X) is rejected and Ti is aborted, because this conflict would make the result non-serializable. Otherwise wi(X) is converted to wi(Xi) and executed, i.e., a new version Xi of data item X is created.
3. Optional rule: deferred commit, a mechanism that prevents dirty reads where they are not allowed. If transaction Ti has read a data item written by Tj, i.e., an operation ri(Xj) exists, then Ti's commit ci is postponed until Tj has committed successfully. If Tj fails, Ti fails as well.
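Rule 1's version choice can be sketched directly: among all versions with ts(Xk) < ts(Ti), pick the one with the largest timestamp. The list-of-tuples representation is invented for the example:

```python
def mvto_read_version(versions, reader_ts):
    """versions: list of (version_ts, value) for one data item.
    MVTO rule 1: the reader sees the version with the largest
    timestamp strictly below its own (committed or not)."""
    visible = [(ts, v) for ts, v in versions if ts < reader_ts]
    if not visible:
        return None
    return max(visible)[1]   # max over (ts, value) picks the newest visible

# versions of x written by transactions with timestamps 0, 2, and 4
x_versions = [(0, "x0"), (2, "x2"), (4, "x4")]
```

This is why, in the example that follows, t1 reads y0 while t3 reads x2: each reader simply takes the newest version older than itself.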

The following is an execution example under the MVTO protocol, which the rules above explain:

Figure: Execution example under MVTO Protocol

* Transaction t3 contains the operation r3(x2), i.e., it reads the uncommitted version x2 (by rule 1 it should read x2); therefore its commit is delayed (the dotted line indicates waiting) and happens only after t2 commits.
* t4 is aborted by rule 2: t5 contains r5(y2), and if w4(y4) succeeded, t5 could see an inconsistent read and the schedule would not be serializable. The figure is taken from the original book and appears to contain an error: r5(y2) cannot appear before w2(y2); it should lie between w2(y2) and w4(y4).
* Finally, consider t1. When reading y, rule 1 says it must read version y0, so t1 sees a consistent snapshot; yet instead of using locks to block t2 and t4 (under two-phase locking their writes would conflict with t1's read locks), it lets t2 and t4 run in parallel. By the MVTO rules t1 must not write x or y, or it would be aborted. In theory the storage layer can thus guarantee serializable results for x and y, but a question remains: if t1 performs some other write, say w1(u1), where u1 is computed from x and y, i.e., u1 = f0(u) f1(x) f2(y), how is the serializability of data item u guaranteed? To the outside, t1 achieved consistent reads of x and y, but w2(y2) succeeded, so an external computation based on x and y may produce a stale value. How is consistency ensured?

There are also the multiversion 2PL protocol (MV2PL), the read-only multiversion protocol (ROMV), and others.

Transient Versioning
The latest version of a data item is stored at the data record's home location; earlier versions are stored in a version pool, which may reside in memory or on disk. Accessing the latest version thus incurs no performance penalty, and the version pool makes garbage-collecting old versions easier. In some systems the version pool is combined with the log records used for recovery; the version pool is then also called the rollback segment.

Figure: Storage Organization of transient version
Each version of a data item includes two additional fields: one is the creation timestamp, and the other is the pointer to the previous version.

* A new data item has no earlier version, so the pointer in its current version is null; item 1135 in this example.
* Deleted data is still kept in its current version but carries a deletion marker, as shown in this example.
* Each version of the same data item forms a linked list.

Another method is to keep a small version-selection table inside the data item's current version, containing each version's timestamp and a pointer to that version's location.
If an old version of a data item is no longer needed by any active transaction, it becomes garbage and can be reclaimed.

Concurrency control for relational databases
Predicate-Oriented Concurrency Control
Locking protocols lock pages and data items; predicate-oriented concurrency control locks operations, or more precisely the predicates of operations. For example:
select name from persons where age > 30
The lock is associated with the predicate age > 30.
Predicate-oriented concurrency control is an idea proposed early on and studied repeatedly since. Its problem is that testing whether two predicates are compatible is NP-complete, so the cost is too high in general; but in specific scenarios, such as concurrency control on B+-trees, it is of practical significance.

Implementation and Practicality

General Lock Manager Data Structure

Figure: Data Structure of the lock manager
Goals the lock manager must achieve:

1. Conflicts can be checked when a lock is requested.
2. When a lock is released, the resource can be granted to other requests blocked on it.
3. When a transaction ends, all of the transaction's locks are released.

The lock manager's data structures reside in memory. A hash table indexed by resource ID stores the resources that are locked or waiting to be locked.
Each hash-table entry points to a resource control block (RCB); the hash chain in the RCB resolves hash collisions.
A shared lock on the same resource can be held by several transactions at once, and several lock requests (for example, conflicting write locks) may be waiting to be granted on that resource. Thus each resource carries a list of the shared locks currently held plus a queue of waiting exclusive and shared requests, so the RCB points to a chain of lock control blocks (LCBs): the FirstInQueue and NextInQueue linked lists in the figure.
To achieve goal 3, a transaction control block (TCB) is maintained for each active transaction; likewise, a linked list, the LCB chain, links together all the locks of that transaction.
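A toy version of these structures, assuming only shared (S) and exclusive (X) modes; the field names echo the figure, but the dict-and-deque code is an invented sketch, not a real engine's layout:

```python
from collections import defaultdict, deque

COMPAT = {("S", "S"): True, ("S", "X"): False,
          ("X", "S"): False, ("X", "X"): False}

class LockManager:
    def __init__(self):
        self.granted = defaultdict(list)    # resource id -> [(txn, mode)]  (RCB + LCB chain)
        self.waiting = defaultdict(deque)   # resource id -> queue of (txn, mode)
        self.by_txn = defaultdict(list)     # txn -> held resources (the TCB's LCB list)

    def request(self, txn, res, mode):
        """Goal 1: check conflicts at request time; grant or enqueue."""
        if all(COMPAT[(held, mode)] for _, held in self.granted[res]):
            self.granted[res].append((txn, mode))
            self.by_txn[txn].append(res)
            return True
        self.waiting[res].append((txn, mode))
        return False

    def release_all(self, txn):
        """Goal 3: at transaction end, drop every lock it holds; then
        (goal 2) grant whatever the freed resources now allow, in FIFO order."""
        for res in self.by_txn.pop(txn, []):
            self.granted[res] = [(t, m) for t, m in self.granted[res] if t != txn]
            while self.waiting[res]:
                t, m = self.waiting[res][0]
                if all(COMPAT[(h, m)] for _, h in self.granted[res]):
                    self.waiting[res].popleft()
                    self.granted[res].append((t, m))
                    self.by_txn[t].append(res)
                else:
                    break   # head of queue still blocked; stop to avoid starvation
```

Granting strictly from the head of the wait queue is one common policy; real managers add upgrade handling, intention modes, and latching around these structures.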

The lock data structures are among the most frequently accessed areas in a database, so their in-memory size must be controlled to avoid performance problems. Databases therefore usually support lock escalation: if a transaction already holds a large number of fine-grained row locks, the database may escalate them to page or table locks to reduce the lock manager's memory footprint.
When locks of different granularities are used, compatibility across granularities must be decided; for this, intention locks are introduced. For details, see Database Systems: An Application-Oriented Approach.
