How relational databases work-transaction management (ii) (translated from Coding-geek articles)

Source: Internet
Author: User
Tags postgresql

This article is translated from Coding-geek article: How does a relational the database work.

Original link: http://coding-geek.com/how-databases-work/#Buffer-replacement_strategies

Immediately after the previous article, this article translated the following chapters:

One, Log Manager (journal manager)

From the previous chapters, we already know that in order to improve performance, the database caches the data in memory. However, if the database server crashes during the transaction submission process. The data that is cached in memory is lost, which destroys the durability characteristics of the database.
You can also write all the data directly to the storage disk, in case the server crashes during writing, only a portion of the data is written to the disk. This also destroys the atomicity nature of the transaction.

The integrity of a transaction requires that either all operations are performed or nothing is done.

There are two ways to achieve this:

    • Shadow copies/pages (Image Copy/image page): Each transaction copies a copy of its own database (or part of a database) and operates on this copy. If something goes wrong, delete the copy. After success, use the file system function to do the file exchange, replace the old data.

    • Transaction log (transaction log): Transaction log is a storage area – writes the information to the Transaction log file before the transaction writes the data to the disk. This way, if the service crashes, the transaction is canceled, the database is clear if the data is deleted based on the log, or the unfinished operation continues to be completed.

Ii. WAL (write Ahead logging pre-write log system)

There are many transactions on large databases, and Shadow Copies/pages consumes a lot of disk space. This is why the modern database uses Transaction log. The Transaction log must be stored in a reliable location. I'm not going to go into the storage technology, I'm assuming that the raid disk used is reliable.

Most databases (Oracle, SQL Server, DB2, PostgreSQL, MySQL, and SQLite) handle transaction log as the Wal protocol. This agreement contains three rules:

    • Each time a database is modified, a log record is generated. and log information must be written to the transaction log file before the data is written to disk.
    • Transaction logs must be stored sequentially. Operation A is performed before Operation B, so the log for operation a must be written first.
    • When a transaction commits, the commit command must be written to the transaction log file before the transaction ends.

The above rules are guaranteed by log Manager. A simple understanding: adding a layer of Log Manager,log Manager to the cache manager and data Acccess Manager will each action (including Update/delete/create/commit/rollback Write the transaction Log first, and then write the data to disk. Is it simple enough?

Wrong! , everything we've seen, database-related stuff has been "cursed" – Database performance (db effect). The more troubling question is how to find an efficient way to write transaction logs. If the write transaction log is slow, it will drag all the database operations to the hind legs.

Iii. ARIES (prototype algorithm for IBM Data Recovery)

In 1992, the IBM Research Center "invented" the enhanced version of Wal called Aries. Modern databases are more or less using the Aries algorithm. The implementation logic may be different, but the underlying principle is Aries. I quoted the "description" because it didn't develop a better data recovery software based on the MIT mindset, IBMs Research Center. Stay in the theory.

"When I was 5 years old, the Aries algorithm was released," I don't care about the gossip of a handful of hard-pressed developers. I actually took this as a half-time break before I started the last technical chapter. I have read a lot of documents about Aries and I find it interesting. Here, I'll just tell you about the Aries. If you want to know more about Aries, I suggest reading the relevant papers.

Areis means "algorithms for Recovery and isolation exploiting semantics".

Aries aims to two aspects:

    • Improves the performance of write transaction log.
    • Fast and reliable recovery capability.

There are several reasons why a database transaction can be rolled back:

    • The user canceled.
    • Server crashes or network outages.
    • Transactional operations violate the constraints of a database (for example, a field requires data to be unique, and a transaction inserts duplicate data).
    • A deadlock has occurred.

Sometimes (such as a network failure), the database can recover transactions. How does it do that? To answer this question, we need to understand what information is recorded in the log.

Iv. the Logs

Any operation in the transaction will generate logs, including additions, deletions, and modifications. A log is composed of the following information:
1. Lsn:log Sequence number, unique numbering of the log records. The LSN is generated in chronological order. This means that if action A is executed before action B, the LSN encoding of a will be less than B.
2. TRANSID: Transaction ID. Records the transaction to which the current operation belongs.
3. PageID: Page ID that records where the modified data is stored on disk. Paging is the smallest unit of data stored on disk, so where the data is stored on disk refers to the location of the paging that contains the data on disk.
4. PRELSN: The previous log in the same transaction.
5. UNDO: Rollback operation.
For example, for data modification operations. Undo stores the value or state of the modification operation before the data on the disk is formally modified. Performs a reverse operation back to the previous state when rolling back.
6. REDO: Re-operation.
There are two ways to implement redo, the value and state of a stored operation, or the operation itself. Redo when the operation is performed again.
7 ..... For reference only: The Aries log also has two additional fields: Undonxtlsn and Type.

In addition, the disk paging that stores data also stores the last LSN that modifies the paging data.

Note: As I understand it, only PostgreSQL does not use Undo. It has a garbage collection thread that cleans up old versions of data. This is an application scenario for PostgreSQL's data version.

Here is a simple example, "UPDATE from person SET-age = 18;" Modify the generated log records.

Each log has a unique LSN. All the logs in the same transaction form the linked list structure, and the logs in the list are chronologically arranged in chronological order.

V. Log Buffer

To avoid writing logs as a performance bottleneck, log buffer comes in handy.

When query Excutor performs a database modification operation:

    1. The cache Manager saves the modified data to the cache.
    2. Log Manager saves the relevant logging information to the cache.
    3. In this step, the Query excutor task modification operation has been completed. You can receive the next operation request.
    4. Log Manager then writes the logging information to the transaction log. Decide when to write a log with an algorithm.
    5. The Cache manager then writes the data to disk. Deciding when to write data to disk also has an algorithm.

When a transaction commits successfully, it means that all operations in the transaction have completed the above-mentioned steps. The Write transaction log is very fast because only the transaction log information needs to be added to the end of the log file. Conversely, writing data to disk is much more complex, and when writing data, consider how to read it quickly (the translator's note: The author's point is that the database will be indexed to improve read performance.) Writing data will involve the adjustment of the storage index B + tree. Time is long and the algorithm is complex. )

Vi. Steal and force policies (stealing and forcing policies)

To improve performance, the 5th step above (writing data to disk) may be placed after a transaction is committed, because even if something happens, it can be recovered through the redo of the transaction log. This approach is called No-force Policy.

The database can also select Force Policy (the 5th step must be completed before the transaction commits) to reduce the burden of database recovery.

Another question is the way to write data to disk (steal policy) or, when a transaction commits, a one-time method of writing the cached data to disk (No-steal policy). Which method to choose depends on the scenario: fast write, but requires slow recovery with undo logs, or quick recovery.

What are the effects of different strategies on data recovery:

    • Steal/no-force requires undo and redo: the best performance, but also more complex log design and recovery processing (such as Aries). This is how most databases are used.
    • Steal/force: Only the undo capability is required.
    • No-steal/no-force: Requires only redo capability.
    • No-steal/force: Nothing is needed and the performance is the worst. Requires a lot of memory.
VII, the recovery part

OK, we've built the perfect database log to see how it should be used.

If an intern had to hang up the database. You restart the database, and then the database recovery work begins.

Aries three steps to recover the database from a crash:
1. The analysis Pass (analytical Phase): The recovery program reads all transaction logs and rebuilds the site environment when a disaster occurs. To determine which transactions need to be rolled back and what data needs to be written to disk.
2. The Redo Pass (Redo phase): This stage updates the database according to the log of the analysis, performing the Redo operation, bringing the database back to the pre-disaster state.
In the redo phase, the redo log is executed sequentially (based on the LSN) in chronological order. For each log, the recovery program reads the LSN from the paging that contains the data on the disk.
if (LSN on disk paging) >= if (LSN in transaction log): This means that the data has been written to the disk before the disaster (and that the data was overwritten by the actions performed before the disaster occurred). There is nothing to do with this operation log.
F (LSN on disk paging) < IF (LSN in transaction log): Indicates that the data on the disk has been modified. The redo operation is performed and the database is written. Even if the back of the transaction is rolled back (this is to simplify the recovery process, the modern database will not do so).
3. The Undo pass: This phase rolls back transactions that were not completed when the disaster occurred. The rollback operation starts from the last log of the transaction, and then executes the undo operation from a backward forward manner (through the prelsn of the log).

During data recovery, the transaction log is used as the basis for the recovery program to perform tasks to ensure that the data is written to the disk order and the transaction log record sequence is synchronized. There should be a scenario where you can remove logs related to incomplete operations in a transaction, but this is difficult. Instead, Aries uses the method of adding remediation logs to achieve the purpose of theoretically deleting unfinished operations in a transaction.

When a transaction is canceled, the cancellation may be due to user cancellation, lock manager cancellation (deadlock prevention), or because of a network outage. These conditions do not need to be considered for recovery through logs. In fact, information about how to redo and undo is already stored in two memory tables:

    1. Transaction Tables-transaction table (stores all the transaction states that are currently executing).

    2. Dirty page Tables-dirty page table (which piece of data to store needs to be written to disk).
      For each new transaction, the two tables will be updated by the cache manager and transaction manager. Because these two tables are stored in memory, the data is lost when the database crashes.

Using transaction log for database failure recovery, the thing to do during the analysis phase is to rebuild the two tables. To speed up the analysis phase, Aries created the concept of checkpoints (CheckPoint). The idea is to write Transaction table and dirty page table from time to disk and write the last LSN at that time to disk, so that only the logs after that LSN will need to be parsed during the analysis phase.

Eight, to conclude last word

Before writing this article, I already know that this topic is very big and it takes a lot of time to write it down. As it turns out, I'm still overly optimistic, and I spent twice times as much time as I expected. In this process also learned a lot of things.

If you want to get a deeper understanding of the database, I suggest you study the paper "Architecture a Database System". This is an introduction to the database is a good article, non-computer professionals can also understand. This paper helps me to plan the content of this article, this is an article about architecture idea, unlike this article, this article mainly describes data structure and algorithm.

If you read this article carefully, you'll see how powerful the database is. Because the article is very long, comb the key content again.

    1. B + Tree Index overview
    2. Overall overview of the database
    3. Overview of performance optimizations for data table connection operations
    4. Cache Pool Management
    5. Transaction management

Real-world database capabilities are even more powerful, for example: I haven't yet covered how the database solves these more troublesome issues:

    1. DB cluster and global transaction
    2. How data is created in the process of running a snapshot
    3. Efficient data compression
    4. Memory management

So think carefully when you need to choose before filling in the bug's no-SQL and rock-solid relational databases. Don't misunderstand me, some no SQL database is also great. However, they are too tender and focus only on solving a very small number of specific problems.

In short, when someone asks you how the relational database works, you don't have to run away, and now you can answer that question.

Alternatively, you can also drop this document directly to him.

How relational databases work-transaction management (ii) (translated from Coding-geek articles)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.