This is the second article in a series. If you haven't read the first one, it is:
On transaction log in SQL Server (i)----the physical and logical architecture of transaction logs
Introduction
Each SQL Server database records its changes in the log file in the order in which it modifies data (INSERT, UPDATE, DELETE). SQL Server uses write-ahead logging (WAL) to guarantee the atomicity and durability of transactions. WAL provides the A (atomicity) and D (durability) of ACID, and it also greatly reduces I/O, because the work of writing modified data to disk is handed off to the lazy writer and checkpoint. This article focuses on how SQL Server modifies data and the techniques involved.
Write-ahead logging (WAL)
SQL Server uses WAL to ensure the atomicity and durability of transactions. SQL Server is not alone in this: essentially all mainstream relational databases, including Oracle, MySQL, and DB2, use WAL.
The core idea of WAL is that a change is written to the log before the changed data is written to the database.
Because every modification is recorded in the log, there is little point in writing changed data to disk in real time. Even if SQL Server crashes unexpectedly, recovery rolls back (undo) changes that reached disk but belong to transactions that never committed, and re-applies (redo) committed changes that should have reached disk but did not. This is how WAL guarantees durability.
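The redo/undo idea can be illustrated with a deliberately simplified recovery routine. This is a sketch under stated assumptions, not SQL Server's actual recovery algorithm: each hypothetical `LogRecord` stores a page's before and after value, `committed` is the set of transactions whose commit record reached the log, and recovery first replays every logged change in LSN order (redo) and then restores the before-image of every uncommitted change in reverse order (undo).

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int      # sequential log number
    txn: str      # transaction that made the change
    page: str     # page that was changed
    before: int   # value before the change (used for undo)
    after: int    # value after the change (used for redo)


def recover(log, committed, disk):
    """Simplified crash recovery over the log that reached disk.

    log: LogRecord list; committed: ids of transactions whose commit record
    reached the log; disk: page -> value as found in the data file.
    """
    pages = dict(disk)
    # Redo: replay every logged change in LSN order, so committed changes
    # that never reached the data file are re-applied.
    for rec in sorted(log, key=lambda r: r.lsn):
        pages[rec.page] = rec.after
    # Undo: walk backwards and restore the before-image of every change made
    # by a transaction that never committed.
    for rec in sorted(log, key=lambda r: r.lsn, reverse=True):
        if rec.txn not in committed:
            pages[rec.page] = rec.before
    return pages


log = [
    LogRecord(lsn=1, txn="T1", page="A", before=0, after=1),   # T1 commits
    LogRecord(lsn=2, txn="T2", page="B", before=0, after=20),  # T2 never commits
]
# At the crash, T1's change to A never reached disk, while T2's change to B did.
print(recover(log, committed={"T1"}, disk={"A": 0, "B": 20}))  # {'A': 1, 'B': 0}
```

After recovery, T1's committed change is present even though it never reached the data file, and T2's uncommitted change is gone even though it did.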
But WAL does more than guarantee atomicity and durability; it also improves performance.
A hard disk reads and writes data by rotating its platters and moving its heads, so scattered writes are expensive. With WAL, the changes made by each committed transaction are not immediately applied to the database; they are first recorded in the log and written to the database later, in batches, by checkpoint and the lazy writer. Without WAL, the data would have to be written to the database every time a transaction commits.
Merging these writes through WAL significantly reduces disk I/O.
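To make the effect of merged writes concrete, here is a toy count of data-page writes. It is a sketch with assumptions: the function name and the fixed `checkpoint_every` frequency are hypothetical, and the sequential log writes that WAL still performs on every commit are not counted. Without WAL every commit writes its page in place; with WAL a page changed several times between two checkpoints is written only once.

```python
def count_data_page_writes(updates, checkpoint_every):
    """updates: ids of the data pages changed by committed transactions, in order.

    Returns (writes without WAL, writes with WAL), counting only data-page writes.
    Without WAL every commit writes its page in place; with WAL a page that is
    changed many times between two checkpoints is written only once.
    """
    without_wal = len(updates)
    with_wal = 0
    dirty = set()
    for i, page in enumerate(updates, start=1):
        dirty.add(page)
        if i % checkpoint_every == 0:   # hypothetical checkpoint frequency
            with_wal += len(dirty)
            dirty.clear()
    with_wal += len(dirty)              # final flush of whatever is still dirty
    return without_wal, with_wal


print(count_data_page_writes([7, 7, 3, 7, 3, 7, 7, 3], checkpoint_every=4))  # (8, 4)
```

The more often the same "hot" pages are modified between checkpoints, the larger the saving.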
You may wonder whether writing every change to the log file also costs disk I/O. It does, but as mentioned in the previous article, every log record is assigned a sequential number (LSN) and written in chronological order, always at the logical end of the log file. Data changes, by contrast, may have to be written to many different places on the disk. The cost of writing the log is therefore much lower than the cost of writing the data.
Steps for SQL Server to modify data
SQL Server modifies data in the following sequence of steps:
1. Write a "Begin Tran" record to the log in SQL Server's buffer.
2. Write the information describing the modification to the log page in the buffer.
3. Modify the data page in the buffer.
4. Write a "Commit" record to the log in the buffer.
5. Write the log from the buffer to the log file.
6. Send an acknowledgement (ACK) to the client (SSMS, ODBC, etc.).
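The six steps can be sketched as a toy in-memory model. Everything here is hypothetical (the `MiniServer` class, its attributes, and the tuple layout of log records); it only mirrors the order of the steps above. Note that the commit flushes the log buffer in one write, while the modified data page stays dirty in memory and the data file is not touched.

```python
class MiniServer:
    """Toy model of the six steps; all names are hypothetical."""

    def __init__(self):
        self.next_lsn = 1
        self.log_buffer = []   # log records still in memory
        self.log_file = []     # log records hardened on disk
        self.buffer = {}       # page id -> (value, LSN of the last change)
        self.dirty = set()     # pages changed in memory only
        self.data_file = {}    # data pages on disk (untouched by a commit)

    def _append_log(self, *record):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_buffer.append((lsn, *record))
        return lsn

    def update(self, txn, page, value):
        self._append_log(txn, "BEGIN TRAN")                 # 1. begin record in buffer
        lsn = self._append_log(txn, "MODIFY", page, value)  # 2. change record in buffer
        self.buffer[page] = (value, lsn)                    # 3. change the data page in buffer
        self.dirty.add(page)                                #    ...which makes it dirty
        self._append_log(txn, "COMMIT")                     # 4. commit record in buffer
        self.log_file.extend(self.log_buffer)               # 5. flush the log buffer in one write
        self.log_buffer.clear()
        return "ACK"                                        # 6. acknowledge the client


server = MiniServer()
print(server.update("T1", "P1", 42))          # ACK
print([rec[0] for rec in server.log_file])    # [1, 2, 3]
print(server.data_file)                       # {}
```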
As you can see, log records are not written to disk one at a time as they are generated. They are first accumulated in the buffer and then written to disk together in a single write. This reduces I/O, and because the records are ordered by LSN, they can be written to the log file sequentially.
These steps show that even when a transaction reaches the commit stage, SQL Server only writes the log pages from the buffer to the log file; it does not write the modified data to the database. So when are the modified data pages written to the database?
Lazy writer and checkpoint
As mentioned above, the steps SQL Server takes to modify data do not include actually writing the data to disk. Instead, pages in the buffer are written to disk by two processes:
1. Checkpoint
2. Lazy writer
Any page modified in the buffer is marked as a dirty page. Writing dirty pages to the data files on disk is the job of checkpoint or the lazy writer.
When a transaction commits, SQL Server only writes the log pages in the buffer to the log file on disk.
The data pages in the buffer are not written to the data files on disk until the lazy writer or checkpoint runs.
As mentioned earlier, LSNs can be compared: if LSN2 > LSN1, the change recorded at LSN2 happened later than the one recorded at LSN1. Checkpoint and the lazy writer compare the LSN at the end of the log file on disk with the LSN of each data page in the buffer, and only write a page to disk if its LSN is not greater than the LSN at the end of the log file. This enforces WAL: the log is written before the data is written to the database.
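The LSN comparison can be expressed as a one-line filter. This is a hedged sketch: the function name and the dictionary of page LSNs are hypothetical, and the sketch simply skips pages that are not yet safe, whereas a real engine may instead flush the log up to the page's LSN first and then write the page.

```python
def pages_safe_to_write(dirty_pages, flushed_lsn):
    """Apply the WAL rule to a set of dirty pages.

    dirty_pages: page id -> LSN of the last log record that changed the page.
    flushed_lsn: LSN of the last log record already written to the log file.
    A page may go to disk only if the log record describing its latest change
    is already on disk, i.e. page LSN <= flushed LSN.
    """
    return sorted(page for page, page_lsn in dirty_pages.items()
                  if page_lsn <= flushed_lsn)


print(pages_safe_to_write({"P1": 5, "P2": 9, "P3": 7}, flushed_lsn=7))  # ['P1', 'P3']
```

Page P2 must wait: its latest change (LSN 9) is described by a log record that has not reached disk yet.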
The difference between lazy writer and checkpoint
Lazy writer and checkpoint are easily confused, because both write the dirty pages in the buffer to the data files on disk. But that is the only thing they have in common.
The lazy writer exists to manage the buffer pool. When free buffers run low (fall below a threshold), the lazy writer writes dirty pages in the buffer to disk and frees pages so their buffers can be reused.
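A toy buffer pool shows the lazy writer's role as a memory manager. This is a sketch under loud assumptions: the `BufferPool` class and its parameters are hypothetical, and eviction uses a plain least-recently-used order, which is much simpler than SQL Server's actual aging policy. When free buffers drop below `min_free`, old pages are evicted; dirty ones are written to disk first, clean ones are simply dropped.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool with a lazy writer (simple LRU; names are hypothetical)."""

    def __init__(self, capacity, min_free):
        self.capacity = capacity
        self.min_free = min_free        # free-buffer threshold that wakes the lazy writer
        self.pages = OrderedDict()      # page id -> dirty flag, least recently used first
        self.disk_writes = []           # pages the lazy writer wrote to disk

    def free_buffers(self):
        return self.capacity - len(self.pages)

    def access(self, page, modify=False):
        dirty = self.pages.pop(page, False) or modify
        self.pages[page] = dirty        # now the most recently used page
        if self.free_buffers() < self.min_free:
            self.lazy_writer()

    def lazy_writer(self):
        # Evict least recently used pages until enough buffers are free again.
        # Dirty pages are written to disk before their buffer is freed;
        # clean pages are simply dropped.
        while self.free_buffers() < self.min_free and self.pages:
            page, dirty = self.pages.popitem(last=False)
            if dirty:
                self.disk_writes.append(page)


pool = BufferPool(capacity=4, min_free=1)
for page, modify in [("A", True), ("B", False), ("C", True), ("D", False),
                     ("B", False), ("E", True)]:
    pool.access(page, modify)
print(pool.disk_writes)   # ['A', 'C']
print(list(pool.pages))   # ['D', 'B', 'E']
```

The lazy writer is triggered by memory pressure, not by time, which is the key contrast with checkpoint.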
Checkpoint exists to reduce the server's recovery time. As its name suggests, a checkpoint is an archive point. Checkpoints occur periodically and write the dirty pages in the buffer to disk. Unlike the lazy writer, however, checkpoint has nothing to do with SQL Server's memory management: it does not free buffers. A checkpoint means that all changes made before that point have been saved to disk. Note that checkpoint writes all dirty pages in the buffer to disk, whether or not the transactions that modified them have committed. This means that a dirty page written to disk may later be rolled back. That is not a problem: if the change is rolled back, SQL Server modifies the page in the buffer again and writes it to disk later.
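The checkpoint behavior described above, including writing uncommitted changes and later correcting them after a rollback, can be sketched as follows. The function and the dictionaries standing in for the buffer and the data file are hypothetical simplifications; the sketch shows that checkpoint writes every dirty page, leaves the buffer contents in place, and that a rolled-back page is simply dirtied again and rewritten.

```python
def checkpoint(buffer, dirty, data_file):
    """Write every dirty page to the data file, committed or not.

    buffer: page id -> value in memory; dirty: set of dirty page ids;
    data_file: page id -> value on disk. The buffer itself is left alone:
    checkpoint does not free memory.
    """
    for page in sorted(dirty):
        data_file[page] = buffer[page]
    dirty.clear()


buffer = {"P1": 10, "P2": 99, "P3": 5}   # P2 was changed by an uncommitted T2
dirty = {"P1", "P2"}
data_file = {"P1": 1, "P2": 2, "P3": 5}

checkpoint(buffer, dirty, data_file)
print(data_file["P2"], len(buffer))      # 99 3  (uncommitted change on disk, nothing freed)

# T2 rolls back: the page is changed back in the buffer, becomes dirty again,
# and the next checkpoint writes the restored value.
buffer["P2"] = 2
dirty.add("P2")
checkpoint(buffer, dirty, data_file)
print(data_file["P2"])                   # 2
```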
Given how checkpoint works, the length of the checkpoint interval (the recovery interval) can affect performance. The recovery interval is a server-level option ("recovery interval (min)") that can be configured with sp_configure or in SSMS.
The default recovery interval is 0, which lets SQL Server manage the interval automatically. If you set it yourself, choose a value based on your workload: a shorter recovery interval means shorter recovery time but more disk I/O, while a longer one means less disk I/O but longer recovery time.
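The trade-off can be illustrated with a small simulation. This is a sketch with strong simplifications: the function name is hypothetical, checkpoints are assumed to run after a fixed number of updates rather than on a time-based recovery interval, and "redo work" is approximated as the number of log records written since the last checkpoint.

```python
def simulate(updates, checkpoint_every):
    """Toy model of the recovery-interval trade-off.

    updates: data pages changed, in order (one log record each).
    A checkpoint runs after every `checkpoint_every` updates.
    Returns (data-page writes done by checkpoints,
             log records to redo if the server crashes after the last update).
    """
    page_writes = 0
    dirty = set()
    since_checkpoint = 0
    for page in updates:
        dirty.add(page)
        since_checkpoint += 1
        if since_checkpoint == checkpoint_every:
            page_writes += len(dirty)   # each dirty page written once
            dirty.clear()
            since_checkpoint = 0
    # Simplification: redo work = log records written since the last checkpoint.
    return page_writes, since_checkpoint


workload = ["A", "B", "A", "C", "A", "B", "A", "C", "A", "B"]
for interval in (2, 3, 4):
    print(interval, simulate(workload, interval))
# 2 (10, 0)
# 3 (7, 1)
# 4 (6, 2)
```

Shorter intervals write more pages but leave less to redo after a crash; longer intervals do the opposite.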
Besides automatic checkpoints, a checkpoint also occurs when ALTER DATABASE is run and when SQL Server is shut down. Members of the sysadmin fixed server role and of the db_owner and db_backupoperator fixed database roles can also issue a checkpoint manually with the CHECKPOINT command, whose syntax is CHECKPOINT [ checkpoint_duration ].
If you specify checkpoint_duration (in seconds), SQL Server tries to complete the checkpoint within that time. The shorter the duration, the more resources SQL Server devotes to the checkpoint.
In general, the lazy writer writes far more dirty pages to disk than checkpoint does.
Talking about transaction log in SQL Server (ii)----The role of the transaction log when modifying data