Reposting an article about SQL Server's three recovery models. It explains them starting from SQL Server's internal mechanisms and reads well, so it is worth sharing.

Source: Internet
Author: User
Tags: sql server, management, truncation

Introduction

The transaction log is undoubtedly one of the most important parts of SQL Server, because SQL Server uses it to ensure durability and to support transaction rollback, which in turn partially guarantees the ACID properties of transactions. When SQL Server crashes, DBAs can also use the transaction log to recover data to a specified point in time. When SQL Server is working well, it is not so important to understand the principles and concepts behind the transaction log. However, once SQL Server crashes, understanding those principles and concepts becomes especially important for making the right decisions quickly and recovering the data. This series of articles discusses the SQL Server transaction log: its basic concepts, how SQL Server uses the log to ensure durability, and so on.

The physical organization of the transaction log

The transaction log is simply a log file that records the transactions run against its database and the modifications those transactions make. When you create a new database, a transaction log file with an .ldf extension is created by default. A database can also be configured with multiple log files, but logically they can be viewed as a single file.

To manage the log file, SQL Server logically divides an LDF file into virtual log files (Virtual Log Files, abbreviated VLFs) for easier management. By analogy, the log file (LDF) is like a train, and each carriage is a virtual log file (VLF):

So why does SQL Server divide the log file into multiple VLFs? Because managing the transaction log this way lets the storage engine reuse log space more efficiently. Using a VLF as the smallest unit for truncating and shrinking the log is undoubtedly more efficient than using the entire LDF file as the smallest unit.

The number and size of VLFs cannot be set through configuration; they are managed by SQL Server. When you create or alter a database, SQL Server determines the size and number of VLFs based on the size of the LDF file. SQL Server also re-plans the number of VLFs when the log file grows.

Note: From this principle it is not hard to see that if the log file growth increment is set too small, too many VLFs will be produced, that is, the log file becomes fragmented, and excessive log fragmentation will slow SQL Server down.

When SQL Server creates a database, the number of VLFs generated depends on the size of the log file (LDF), as follows:

Size of the LDF file      Number of VLFs
1 MB to 64 MB             4
64 MB to 1 GB             8
Greater than 1 GB         16

Let's look at an example:

Create a database specifying a log size of 65 MB:

Using DBCC LOGINFO, we can see that there are correspondingly 8 VLFs:
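A minimal T-SQL sketch of this experiment (the database name and file paths are hypothetical; the original shows the result as a screenshot, and newer SQL Server versions use a slightly different VLF formula):

    CREATE DATABASE TestLog65
    ON PRIMARY (NAME = TestLog65_data, FILENAME = 'D:\Data\TestLog65.mdf', SIZE = 10MB)
    LOG ON     (NAME = TestLog65_log,  FILENAME = 'D:\Data\TestLog65.ldf', SIZE = 65MB);
    GO
    -- DBCC LOGINFO returns one row per VLF; for a 65 MB log it should return 8 rows
    DBCC LOGINFO ('TestLog65');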

Create the database again, this time specifying an initial log size of 28 MB:

As you can see, the VLF count correspondingly becomes 4:

As the log file grows, SQL Server applies the same formula used at database creation to each growth increment. For example, each time the log grows by 2 MB, that increment is divided into 4 VLFs according to the formula.

We create a TestGrow database specifying a 2 MB log file; at this point there are 4 VLFs:

When we grow the log by 2 MB, that new 2 MB is again divided into 4 VLFs by the formula:

At this point, you can see the number of VLFs should be 4+4=8:
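A sketch of the whole growth experiment in T-SQL (file paths are hypothetical):

    CREATE DATABASE TestGrow
    ON PRIMARY (NAME = TestGrow_data, FILENAME = 'D:\Data\TestGrow.mdf', SIZE = 10MB)
    LOG ON     (NAME = TestGrow_log,  FILENAME = 'D:\Data\TestGrow.ldf', SIZE = 2MB);
    GO
    DBCC LOGINFO ('TestGrow');   -- 4 VLFs for the initial 2 MB log
    GO
    -- grow the log by another 2 MB (to 4 MB in total)
    ALTER DATABASE TestGrow MODIFY FILE (NAME = TestGrow_log, SIZE = 4MB);
    GO
    DBCC LOGINFO ('TestGrow');   -- the new 2 MB is carved into 4 more VLFs, 4 + 4 = 8 in total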

As you can see, specifying an appropriate initial size and growth increment for the log file is the most critical factor in reducing log fragmentation.

The logical organization of the transaction log

When any modification to a database object is saved to the database, the corresponding log record is first written to the log file. Records are appended to the logical end of the log file in chronological order, and each is assigned a globally unique log sequence number (Log Sequence Number, LSN). LSNs are strictly ordered: if two records in the log have LSN2 > LSN1, then the record with LSN2 occurred after the record with LSN1.

As you can see, the log file is only ever split into multiple files for disk space reasons. Unlike data files, the log is not accessed in parallel, so splitting the log into multiple files brings no performance improvement at all.

The LSN can be thought of as the link between the log file and the data it records. Each log record carries not only its LSN but also the ID of the transaction it belongs to:

A simple illustration is shown below:

Many types of operations are logged in the transaction log. These actions include:

    • The start and end of each transaction.

    • Every data modification (INSERT, UPDATE, or DELETE), including changes made to any table (system tables included) by system stored procedures or data definition language (DDL) statements.

    • Every extent and page allocation or deallocation.

    • The creation or deletion of a table or index.

How the LSN is used during rollback, roll forward, and backup recovery is covered in subsequent articles.

Summary

This article gave a brief description of the transaction log from the perspective of its logical and physical architecture. This is the basis for understanding how SQL Server uses the log to ensure durability and to back up and recover data. The next article will describe how SQL Server uses the log file during data modification.

Introduction

Every SQL Server database writes log records to the log file in the order in which data is modified (INSERT, UPDATE, DELETE). SQL Server uses write-ahead logging (WAL) to ensure the atomicity and durability of transactions. This technique not only guarantees atomicity (A) and durability (D) in ACID, but also greatly reduces I/O operations, handing the work of flushing modified data to disk over to the lazy writer and checkpoint. This article focuses on how SQL Server modifies data and on the related techniques.

Write-ahead logging (WAL)


SQL Server uses WAL to ensure the atomicity and durability of transactions. In fact, not only SQL Server but essentially all mainstream relational databases, including Oracle, MySQL, and DB2, use WAL.

WAL's core idea is that the log is written before the data is written to the database.

Because every modification to the data is recorded in the log, there is little point in writing data changes to disk in real time. Even if SQL Server crashes unexpectedly, during recovery any data written to disk by uncommitted transactions is rolled back, and committed data that should have been written to disk but was not is redone. This ensures durability.

But WAL does not only guarantee atomicity and durability; it also improves performance.

A hard disk reads and writes data by rotation. With WAL, each committed data-modifying transaction is not immediately reflected in the database; it is first recorded in the log and then flushed to the database by a subsequent checkpoint or by the lazy writer. Without WAL, every committed change would have to be written to the database immediately:

With WAL, writes are merged, which significantly reduces disk I/O:

You may wonder: the log file is still written for every data modification, so disk I/O is still consumed. As mentioned in the previous article, each log record is written in chronological order with a sequential LSN, and the log is appended only at the logical end of the log file, whereas data may be written to all sorts of locations on disk. The cost of writing the log is therefore much smaller than the cost of writing the data.

Steps SQL Server takes to modify data

SQL Server modifies data in the following sequence of steps (a sketch that inspects the resulting log records follows the list):

1. Write a "Begin Tran" record in the log of the SQL Server's buffer

2. Write the information to be modified on the log page of the SQL Server's buffer

3. Write data to the data page in the buffer of SQL Server that will be modified

4. Write a "Commit" record in the log of the SQL Server's buffer

5. Write the log of the buffer to the log file

6. Send acknowledgement (ACK) to client (SMSS,ODBC, etc.)
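To see these log records for yourself, a rough sketch follows. It relies on sys.fn_dblog, an undocumented function that reads the active portion of the log; the database and table names are hypothetical.

    USE TestGrow;   -- any test database
    GO
    CREATE TABLE dbo.Demo (Id INT, Val VARCHAR(20));
    GO
    BEGIN TRAN;
        INSERT INTO dbo.Demo (Id, Val) VALUES (1, 'hello');
    COMMIT;
    GO
    -- Look for the LOP_BEGIN_XACT, LOP_INSERT_ROWS and LOP_COMMIT_XACT records
    -- belonging to the transaction above; each row also carries its LSN and transaction ID.
    SELECT [Current LSN], Operation, [Transaction ID], AllocUnitName
    FROM sys.fn_dblog(NULL, NULL);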

As you can see, the transaction log is not written to disk one record at a time. Instead, the buffered log records are written to disk in a single operation. This reduces I/O, and because log records are ordered by LSN, they can be written to the log file sequentially.

The steps above show that even when a transaction reaches the commit stage, only the buffered log pages are flushed to the log file; the data itself is not yet written to the database. So when are the modified data pages written to the database?

Lazy writer and checkpoint

As mentioned above, the steps SQL Server uses to modify data do not include actually writing the data to disk. In fact, writing a modified page from the buffer to disk is carried out by two processes:

The two processes are:

1. Checkpoint

2. Lazy writer

Any page that is modified in the buffer is marked as a dirty page. Writing these dirty pages to disk is the job of checkpoint or the lazy writer.

When a transaction commits, only the buffered log pages are written to the log file on disk:

The data pages in the buffer are not actually written to the disk file until the lazy writer or a checkpoint runs:

As mentioned earlier, LSNs in the log can be compared: if LSN2 > LSN1, the record with LSN2 occurred later than the record with LSN1. Checkpoint and the lazy writer compare the LSN at the end of the log file with the LSN of each data page in the buffer, and only pages whose LSN is less than the LSN at the end of the log file are written to the database on disk. This guarantees WAL (the log is written before the data is written to the database).

The difference between the lazy writer and checkpoint

The lazy writer and checkpoint are often confused, because both write "dirty" pages from the buffer to disk files. But that is the only thing they have in common.

The lazy writer exists to manage the buffer pool. When buffer usage reaches a threshold, the lazy writer flushes dirty pages from the buffer to disk, frees unmodified pages, and reclaims the resources.

Checkpoint exists to reduce the server's recovery time. A checkpoint, as its name indicates, is a recovery point. Checkpoints occur on a regular basis to write the "dirty" pages in the buffer to disk, but unlike the lazy writer, checkpoint takes no interest in SQL Server's memory management. A checkpoint therefore means that all changes made before that point have been saved to disk. Note that a checkpoint writes all dirty pages in the buffer to disk, regardless of whether the data in those pages has been committed. This means that "dirty pages" written to disk may later be rolled back. But don't worry: if the data is rolled back, SQL Server modifies the pages in the buffer again and writes them back to disk.

From the way checkpoint works, you can see that the checkpoint interval (recovery interval) can affect performance. The recovery interval is a server-level parameter and can be configured via sp_configure or in SSMS:
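For example (a sketch; 'recovery interval (min)' is an advanced option, so 'show advanced options' has to be enabled first):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'recovery interval (min)';      -- display the current value
    EXEC sp_configure 'recovery interval (min)', 5;   -- target roughly 5 minutes of recovery time
    RECONFIGURE;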

The default recovery interval is 0, which means SQL Server manages the recovery interval itself. If you set it yourself, base the value on your specific circumstances: a shorter recovery interval means shorter recovery time but more disk I/O, while a longer recovery interval means less disk I/O but a longer recovery time.

In addition to automatic checkpoints, a checkpoint also occurs when ALTER DATABASE is executed and when the SQL Server instance is shut down. Members of the sysadmin server role and of the db_owner and db_backupoperator database roles can also issue the CHECKPOINT command to run a checkpoint manually:
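For example (run in the context of the target database; the duration value is only an illustration):

    USE TestGrow;    -- hypothetical target database
    GO
    CHECKPOINT;      -- checkpoint with a duration chosen by SQL Server
    CHECKPOINT 5;    -- ask SQL Server to try to finish the checkpoint within about 5 seconds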

By specifying a duration (in seconds) after CHECKPOINT, you ask SQL Server to complete the checkpoint within that time; if the specified time is short, SQL Server devotes more resources to finishing the checkpoint sooner.

Typically, the "dirty" page is written to disk, and Lazy writer has a lot more to do than checkpoint.

Summary

This article briefly described the concept of WAL and the role the log plays when database objects are modified, and also introduced checkpoint and the lazy writer. Understanding these concepts is fundamental to a SQL Server DBA's work. The next article will describe how the log works under the simple recovery model.

Brief introduction

Under the simple recovery model, the log file's role is merely to guarantee the ACID properties of SQL Server transactions; it does not take part in recovering data. As the word "simple" literally suggests, backup and recovery of data depend only on manual backups. Before getting into the article, you should first understand the types of backup SQL Server provides.

Several types of backup provided by SQL Server

The backup types provided by SQL Server can be divided into the following three categories (file/filegroup backups and partial backups are not covered in this article); a T-SQL sketch of all three follows the list:

1. Full backup: directly copies all extents of the database being backed up. Two points are worth noting:

    • Despite its name, a full backup does not back up everything; it backs up only the database itself, not the log (although a small amount of log is included to keep the backup consistent).
    • The database remains available during a full backup. A full backup records the LSN at the start of the backup and the LSN at the end, so that changes made while the backup runs can be applied to it; the data in a full backup therefore reflects the end time of the backup.

2. Differential backup: backs up only what has been modified since the last full backup. The backup unit is the extent, which means that even if only one page in an extent has changed, the whole extent is included in the differential backup. Differential backups are maintained through a bitmap: one bit corresponds to one extent, extents modified since the last full backup have their bit set to 1, and the extents whose bit is 1 are what the differential backup copies. After the next full backup, all the bits in the bitmap are reset to 0.

3. Log backup: backs up only the log records written since the last full backup or log backup. Under the simple recovery model, log backups are meaningless (SQL Server does not allow you to back up the log under the simple recovery model); the reason is explained below.
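A sketch of the three backup types in T-SQL (database name and paths are hypothetical):

    BACKUP DATABASE AdventureDB
        TO DISK = 'D:\Backup\AdventureDB_full.bak';     -- full backup

    BACKUP DATABASE AdventureDB
        TO DISK = 'D:\Backup\AdventureDB_diff.bak'
        WITH DIFFERENTIAL;                              -- differential backup

    BACKUP LOG AdventureDB
        TO DISK = 'D:\Backup\AdventureDB_log.trn';      -- log backup (rejected under the simple recovery model)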

The simple recovery model

Under the simple recovery model, the log exists only to guarantee the ACID properties of SQL Server transactions; it plays no part in recovering data.

For example, we have a backup plan, as follows:

We make a full backup at 0:00 every Sunday and differential backups at 0:00 on Wednesday and on Friday. Under the simple recovery model, if the database crashes on Saturday, our only recovery option is to restore the Sunday 0:00 full backup and then the Friday 0:00 differential backup; all data from Friday 0:00 up to the moment of the crash is lost.

As the word "simple" covers, the log can be completely out of management under the simple recovery model. Backup and recovery are completely dependent on our own full and differential backups.

The recovery model is a database-level option that can be configured either in SSMS or through a SQL statement:
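For example (the database name is hypothetical; the possible values are SIMPLE, FULL and BULK_LOGGED):

    ALTER DATABASE AdventureDB SET RECOVERY SIMPLE;
    -- verify the current setting
    SELECT name, recovery_model_desc FROM sys.databases WHERE name = 'AdventureDB';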

Space usage of the log under the simple recovery model

As mentioned in the first article in this series, the log file is divided into multiple VLFs for management; logically, the log is linear, with each record given a sequential, unique LSN.

Under the simple recovery model, log records for data that may still need to be rolled back are written to the log to guarantee transaction durability and rollback. These records must be kept in the log until it is certain they are no longer needed. This involves a concept: the minimum recovery LSN (Minimum Recovery LSN, MinLSN).

MinLSN is the LSN of the oldest log record that is still required, and it is the minimum of the following three values:

    • The LSN of the start of the most recent checkpoint

    • The LSN of the start of the oldest transaction in the log that has not yet completed

    • The LSN of the start of the oldest replication transaction that has not yet been delivered to the distribution database.

Below is a fragment of a log:

(Figure excerpted from MSDN)

As can be seen, the newest LSN is 148 and LSN 147 is a checkpoint. Before this checkpoint, transaction 1 has completed while transaction 2 has not, so the corresponding MinLSN is the LSN of the start of transaction 2, that is, 142.

The portion of the log from MinLSN to the logical end of the log is called the active log.

The distribution of the active log across the physical VLFs can be illustrated as:

Therefore, the status of a VLF is derived from the status of the LSNs it contains, and falls into two main categories: active VLFs and inactive VLFs.

A finer subdivision gives the following four VLF states:

    1. Active – a VLF is active when any LSN it stores belongs to the active log; even if a 200 MB VLF contains only one active LSN, the entire VLF is active, as with VLF3 above.
    2. Recoverable – the VLF is inactive; it contains no active LSN but has not yet been truncated.
    3. Reusable – the VLF is inactive, contains no active LSN, and has been truncated, so it can be reused.
    4. Unused – the VLF is inactive and has never been used.

These concepts are illustrated below:

So-called truncation simply converts VLFs in the recoverable state to the reusable state. Under the simple recovery model, a truncation occurs at every checkpoint, and each checkpoint pushes MinLSN forward. So once a transaction has ended and a checkpoint has passed, its related log records are truncated and the space can be reused.

When the log reaches the end of the log file (the LDF file), that is, VLF8, it wraps around to VLF1 so that space can be reused. So although the physical order of the log file runs from VLF1 to VLF8, the logical order of the log may, for example, start at VLF6 and end at VLF2:

As you can see, under the simple recovery model the log is not retained (once a transaction ends and a checkpoint occurs, the related log is truncated). It is used only to guarantee transaction rollback and crash recovery. Backing up the log is therefore out of the question, and the log cannot be used to recover the database.

Summary

This article introduced how the log works under the simple recovery model and briefly touched on backing up and recovering data. In fact, apart from development and test environments, there are not many scenarios where the simple recovery model is used, because few production environments can tolerate losing hours of data. The next article will describe the role of the log under the full recovery model.

Introduction

If the data in a production environment could be listed on the balance sheet, I think the value of that asset would not be small. And Murphy's law, that anything that can go wrong will go wrong no matter how small the odds, seems tailor-made for DBAs. Under the simple recovery model described in the previous article, all data from the most recent backup to the present is at risk of being lost. The full recovery model reduces the risk of data loss significantly. This article focuses on the concepts behind the full recovery model and the role the log plays in it.

The full recovery model

The full recovery model gives the data maximum protection by logging every modification to the database. Under the full recovery model, the log not only guarantees the ACID properties of database transactions; it also lets you restore the data to any point in time within the range covered by the log.

As stated in the previous article, under the simple recovery model the log requires almost no management; every checkpoint may truncate the log and reclaim inactive VLFs so the space can be reused, and log space usage hardly needs to be considered. In contrast, under the full recovery model the log is an essential part of recovering data, and both the log itself and its space usage need attention.

Under the full recovery model, checkpoints do not truncate the log; only a log backup pushes MinLSN forward and truncates the log. So in a system with even a moderate volume of traffic, the log grows quickly.

Log backups therefore serve two purposes:

    • Reduce the size of the activity log
    • Reduce the risk of log corruption

The following figure, excerpted from MSDN, shows:

A full backup was made at db_1, and two log backups (log_1 and log_2) were made afterwards. Shortly after log_2 was taken, the disk holding the data files was damaged. At this point, if the log file is intact, you can restore the database to the point in time of the disaster by backing up the tail of the log (tail-log backup), restoring db_1, and then restoring log_1, log_2, and the tail-log backup. In theory, data loss can be zero.
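A hedged sketch of that recovery sequence (names and paths are hypothetical; NO_TRUNCATE allows the tail-log backup even though the data files are damaged):

    -- 1. back up the tail of the log while the log file is still intact
    BACKUP LOG AdventureDB
        TO DISK = 'D:\Backup\AdventureDB_tail.trn' WITH NO_TRUNCATE;

    -- 2. restore the full backup and the log backups without running recovery
    RESTORE DATABASE AdventureDB FROM DISK = 'D:\Backup\db_1.bak'  WITH NORECOVERY, REPLACE;
    RESTORE LOG      AdventureDB FROM DISK = 'D:\Backup\log_1.trn' WITH NORECOVERY;
    RESTORE LOG      AdventureDB FROM DISK = 'D:\Backup\log_2.trn' WITH NORECOVERY;

    -- 3. restore the tail-log backup and bring the database online
    RESTORE LOG      AdventureDB FROM DISK = 'D:\Backup\AdventureDB_tail.trn' WITH RECOVERY;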

The principle of recovering data from the log is redo, that is, re-applying the transactions recorded in the log. This overhead is much larger than recovering from a full or differential backup, so reduce the amount of data recovered through the log as much as possible and rely on full or differential backups for the bulk of the recovery.

The bulk-logged recovery model

The bulk-logged recovery model is the same as the full recovery model in many respects. However, under the full recovery model every operation on the database is recorded in the log, so bulk import and export operations on large amounts of data would leave an enormous number of records in the log, and in many cases we do not need that information in the log.

The bulk-logged recovery model is a supplement to the full recovery model. Microsoft's recommended best practice is to switch temporarily from the full recovery model to the bulk-logged recovery model when performing bulk operations such as index creation and rebuilds or SELECT INTO, in order to save log space, and then switch back. This switch does not break the log chain.
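A sketch of that practice (database, table, and path names are hypothetical):

    BACKUP LOG AdventureDB TO DISK = 'D:\Backup\before_bulk.trn';   -- protect the log chain first
    ALTER DATABASE AdventureDB SET RECOVERY BULK_LOGGED;

    SELECT * INTO dbo.StagingCopy FROM dbo.BigTable;                -- minimally logged under BULK_LOGGED

    ALTER DATABASE AdventureDB SET RECOVERY FULL;
    BACKUP LOG AdventureDB TO DISK = 'D:\Backup\after_bulk.trn';    -- re-enable point-in-time recovery from here on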

This article does not delve into this model and only explains the concept briefly. Suppose I want to insert a batch of data; the following shows what is logged under the full recovery model versus the bulk-logged recovery model:

As you can see, in the log the bulk-logged recovery model records this type of operation as a single atomic unit. Therefore, under the bulk-logged recovery model the log cannot be used to redo such bulk-logged operations (SELECT INTO, etc.).

The log chain

A continuous sequence of log backups is called a log chain; it indicates that the log is contiguous. The concept can be illustrated as:

Suppose the figure above is abstracted as two log backups. The last LSN of log backup 1 must be greater than or equal to the first LSN of log backup 2 (typically the last LSN of the first backup equals the first LSN of the second, but because options exist that back up the log without truncating it, the ranges can overlap). The log chain formed by the two backups is then continuous.

Below is an example from a production environment, viewing the continuity of the log chain in SSMS:

As you can see, after the first full backup the transaction log was backed up multiple times, and the first LSN of each transaction log backup equals the last LSN of the previous one. Therefore, starting from the first full backup, you can restore to any point in time up to the last log backup.
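The same check can be made with a query against msdb instead of SSMS (the database name is hypothetical); for an unbroken chain, each log backup's first_lsn should equal the previous log backup's last_lsn:

    SELECT backup_finish_date, type, first_lsn, last_lsn
    FROM msdb.dbo.backupset
    WHERE database_name = 'AdventureDB'
    ORDER BY backup_finish_date;
    -- type: D = full, I = differential, L = log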

A complete log chain starts with the first full backup, or with the first full backup after switching from the simple recovery model to the full or bulk-logged model, and ends at the current point in time.

Recovering data through the log requires that the log chain be continuous from the most recent full or differential backup to the point in time being recovered.

Recovery Order

Recovering data from a backup proceeds in the following steps:

1. Data copy phase: copies the data pages, index pages, and log from the full and differential backups to the database files being recovered.

2. Redo (roll forward) phase: applies the transactions recorded in the log to the data copied from the backups, rolling the data forward to the specified point in time. When this phase is complete, the database is still not available:

Together, the first two phases constitute the restore.

3. Undo (roll back) phase: this is what is usually called recovery; it rolls back any uncommitted transactions. After this phase, the database is in a usable state, and no subsequent backups can be applied to it.

This concept is illustrated below:

In two consecutive log backups on the same log chain, a transaction that begins in the first log backup commits in the second. If the RECOVERY option is used after restoring the first transaction log backup, that is, the undo phase is run, then transaction 1 is rolled back during that undo phase:

As can be seen, T1 in log backup 1 is rolled back, and its commit in log backup 2 becomes meaningless; this is why no further backups can be restored once the undo phase has run. Microsoft therefore recommends, as best practice, using the NORECOVERY option to skip the undo phase and performing the undo phase separately after all backups have been restored, which can be done by specifying the RECOVERY option when restoring the tail of the log.

Summary

This article briefly introduced the role of the log under the full recovery model and some concepts of data recovery. Understanding the full recovery model is irreplaceable for reducing the risk of data loss.

