Partial write and doublewrite

Source: Internet
Author: User

Doublewrite is an important feature of MySQL's InnoDB storage engine. Over the past two days I read the relevant materials and, combining them with what I already knew, I would like to share my understanding of doublewrite for your reference.

Partial write (torn pages)

InnoDB's doublewrite technique exists to solve the partial write problem. A partial write, or "torn page", happens when the database goes down (the OS restarts, or the host loses power and restarts) while a page is being written, so only part of the database page reaches disk and the page is left inconsistent. The database, the OS, and the disk each read and write in units of a block. A database block is typically 8 KB or 16 KB; an OS block (page) is typically 4 KB; the IO block is no larger than that, since the Linux kernel requires IO block size <= OS block size. Below the IO block there is the sector: the sector is the basic unit of physical disk operations, while the IO block is the logical unit of disk operations, and one IO block corresponds to one or more sectors. A sector is generally 512 bytes. The relationship between these block sizes can be summarized as follows:

DB block > OS block >= IO block > disk sector, and each size is an integer multiple of the next. For example, here are the block sizes on my system, with MySQL as the database and Linux as the OS.

DB block size

mysql> show variables like 'innodb_page_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| innodb_page_size | 16384 |
+------------------+-------+

OS block size

$ getconf PAGESIZE
4096

IO block size

For example, to view the IO block size of the sdb1 partition:

# blockdev --getbsz /dev/sdb1
4096

Sector size

# fdisk -l | grep Sector
Sector size (logical/physical): 512 bytes / 512 bytes

From these results: DB page (16 KB) = 4 × OS page (4 KB) = 4 × IO block (4 KB) = 32 × sector (512 bytes).
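As a quick check of that arithmetic, here is a minimal sketch that plugs in the values printed above (hardcoded, not queried from a live system) and verifies that each size is a whole multiple of the next smaller one. The function name `ratios` is purely illustrative.

```python
# Values copied from the command output shown above (not queried live).
DB_PAGE = 16384   # innodb_page_size
OS_PAGE = 4096    # getconf PAGESIZE
IO_BLOCK = 4096   # blockdev --getbsz /dev/sdb1
SECTOR = 512      # logical sector size from fdisk -l


def ratios(db: int, os_page: int, io: int, sector: int) -> tuple[int, int, int]:
    """Check that each size is a whole multiple of the next smaller one and return
    how many OS pages, IO blocks and sectors a single DB page spans."""
    sizes = [db, os_page, io, sector]
    for bigger, smaller in zip(sizes, sizes[1:]):
        if bigger < smaller or bigger % smaller != 0:
            raise ValueError(f"{bigger} is not a multiple of {smaller}")
    return db // os_page, db // io, db // sector


print(ratios(DB_PAGE, OS_PAGE, IO_BLOCK, SECTOR))  # (4, 4, 32)
```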

Every DB page write is ultimately turned into sector writes. If something goes wrong during the disk write and the machine restarts, a DB page may have only some of its sectors written to disk, and the page is torn.
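To make the failure concrete, here is a toy simulation, assuming a hypothetical page layout with a CRC32 stored in the last 4 bytes. Real InnoDB pages also carry checksums, but in a different layout. The crash is modeled as the disk persisting only the first `sectors_done` 512-byte sectors of the new page.

```python
import zlib

PAGE_SIZE = 16384
SECTOR = 512


def make_page(fill: int) -> bytes:
    """Build a page whose last 4 bytes hold a CRC32 of the body
    (hypothetical layout for illustration, not InnoDB's real page format)."""
    body = bytes([fill]) * (PAGE_SIZE - 4)
    return body + zlib.crc32(body).to_bytes(4, "big")


def page_ok(page: bytes) -> bool:
    """A page is consistent if its stored checksum matches its body."""
    body, stored = page[:-4], page[-4:]
    return zlib.crc32(body).to_bytes(4, "big") == stored


def write_with_crash(old: bytes, new: bytes, sectors_done: int) -> bytes:
    """Simulate power loss after only `sectors_done` sectors of `new` were written;
    the remaining sectors on disk still contain `old`."""
    cut = sectors_done * SECTOR
    return new[:cut] + old[cut:]


old, new = make_page(0xAA), make_page(0xBB)
print(page_ok(write_with_crash(old, new, 32)))  # True: all 32 sectors written
print(page_ok(write_with_crash(old, new, 10)))  # False: torn page
print(page_ok(write_with_crash(old, new, 0)))   # True: old page still intact
```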

Torn pages and database consistency

As analyzed above, an abnormal restart can lead to torn pages, that is, incomplete database pages. Does an incomplete page mean the database is inconsistent? Databases have a crash recovery mechanism that runs after an abnormal restart. I won't go into its details here, because recovery differs from one database to another, but the basic principle is similar across mainstream databases: in the first phase, the redo log is replayed to bring data pages and undo pages back to their state at the moment of the crash; in the second phase, uncommitted transactions are rolled back using the contents of the undo pages. These two phases together guarantee database consistency. For MySQL, if a torn page is encountered in the first phase, redo cannot be applied to it, recovery is interrupted, and the database is left inconsistent. You may ask: doesn't the redo log record every change, and isn't it physical? In theory, whether or not a page is torn, replaying redo from the position of the last checkpoint should restore the page to a correct state. Right? To answer this, let's first look at redo log formats.
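The two recovery phases can be sketched as a toy model, assuming hypothetical in-memory pages, redo records of the form `(txn, page_no, key, value)` and undo records of the form `(txn, page_no, key, old_value)`. This is not InnoDB's recovery code; the `torn` flag stands in for a failed page checksum and shows where recovery stops when redo depends on page state, as MySQL's does.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    rows: dict = field(default_factory=dict)
    torn: bool = False  # stand-in for a page whose checksum does not match


class RecoveryError(Exception):
    pass


def recover(pages: dict, redo: list, undo: list, committed: set) -> dict:
    """Toy two-phase crash recovery (hypothetical model).
    redo: (txn, page_no, key, value) records in log order.
    undo: (txn, page_no, key, old_value) records; old_value None means the row did not exist.
    committed: ids of transactions that committed before the crash."""
    # Phase 1: roll forward. Replay redo for every transaction, committed or not,
    # to bring pages back to their state at the moment of the crash.
    for txn, page_no, key, value in redo:
        page = pages[page_no]
        if page.torn:
            raise RecoveryError(f"page {page_no} is torn; redo cannot be applied")
        page.rows[key] = value
    # Phase 2: roll back. Revert changes of uncommitted transactions, newest first.
    for txn, page_no, key, old_value in reversed(undo):
        if txn in committed:
            continue
        if old_value is None:
            pages[page_no].rows.pop(key, None)
        else:
            pages[page_no].rows[key] = old_value
    return pages


pages = {1: Page()}
redo = [("T1", 1, "a", 10), ("T2", 1, "b", 20)]
undo = [("T1", 1, "a", None), ("T2", 1, "b", None)]
print(recover(pages, redo, undo, committed={"T1"})[1].rows)  # {'a': 10}
```

Only T1 committed, so its row survives phase 2 while T2's insert is rolled back; with `Page(torn=True)` the same call raises `RecoveryError` in phase 1.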

Redo log format

Database systems use three main log formats: logical logging, physical logging, and physiological logging. Redo logs use physical or physiological logging. A logical log records logical operations without any physical storage location, for example MySQL's binlog. A physical log records an operation at a specific physical location, for example "write 8 bytes of data at offset 233 of page 48 in file 1 of tablespace 2"; the 4-tuple (group_id, file_id, page_no, offset) uniquely identifies the physical location of the data on disk. A physiological log is a mix of the two: if a database operation (DDL, DML, DCL) touches multiple pages, it produces a log record for each physical page, but the content recorded for each page is logical. Below I illustrate these formats with a simple insert.

For example, take the InnoDB table t(c1, c2, KEY key_c1(c1)) and insert the row row1 = (1, 'abc'):

Logical log:

<insert op, t, 1, 'abc'>

Physiological log:

Because table t has the index key_c1, one insert involves at least two B-tree operations, and two B-trees necessarily involve at least two physical pages, so there are at least two log records:

<Insert op, page_no_1, log_body>

<Insert op, page_no_2, log_body>

Physical log:

Because an insert must physically modify the page header (for example, incrementing the number of records in the page), update the linked-list pointers in adjacent records, and modify slot attributes, each physiological log record corresponds to N physical log records:

<group_id, file_id, page_no, offset1, value1>

<group_id, file_id, page_no, offset2, value2>

......

<group_id, file_id, page_no, offsetN, valueN>

 

Therefore, this insert produces one logical log record, two physiological log records, and 2 × N physical log records. This simple analysis shows that logical logging has the smallest log volume and physical logging the largest. Physical logging is purely physical, whereas physiological logging is physical across pages and logical within a page ("physical-to-a-page, logical-within-a-page").
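The three formats and the 1 / 2 / 2 × N record counts can be sketched as plain data types. This is a minimal illustration under stated assumptions: the page numbers, offsets, and zero-byte values are hypothetical placeholders, and none of this reflects InnoDB's actual redo record encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalRecord:        # no physical location at all (binlog-style)
    op: str
    table: str
    row: tuple


@dataclass(frozen=True)
class PhysiologicalRecord:  # physical to a page, logical within the page
    op: str
    page_no: int
    body: tuple


@dataclass(frozen=True)
class PhysicalRecord:       # exact bytes at an exact on-disk location
    group_id: int
    file_id: int
    page_no: int
    offset: int
    value: bytes


def log_insert(row: tuple, index_pages: list, n_changes_per_page: int):
    """Generate all three log forms for one insert (illustrative only).
    index_pages: hypothetical page number touched in each B-tree.
    n_changes_per_page: N byte-level edits per page (header count, record links, slots)."""
    logical = [LogicalRecord("insert", "t", row)]
    physio = [PhysiologicalRecord("insert", p, row) for p in index_pages]
    physical = [
        PhysicalRecord(0, 0, p, offset=i * 8, value=b"\x00" * 8)  # placeholder edits
        for p in index_pages
        for i in range(n_changes_per_page)
    ]
    return logical, physio, physical


logical, physio, physical = log_insert((1, "abc"), index_pages=[3, 4], n_changes_per_page=5)
print(len(logical), len(physio), len(physical))  # 1 2 10
```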

Redo format and data consistency

Back to the question of whether a torn page affects database consistency. For databases whose redo uses pure physical logging, a torn page does no harm, because each redo record does not depend on the state of the physical page and is idempotent (applying it once or N times gives the same result). Note that redo log blocks are generally 512 bytes, the size of a sector, so the redo log blocks themselves cannot be torn. Physiological logging is different: operations such as updating the page header, incrementing the record count on the page, and modifying slot information depend on the page being in a consistent state; otherwise redo cannot be applied correctly. MySQL uses this log type, so a torn page can make crash recovery fail, and doublewrite is needed to help.
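The difference can be shown with a toy page, assuming a hypothetical 2-byte record counter at offset 0 of a 64-byte page. The physical record carries the final counter value (4), while the physiological record says "one record was inserted", so its result depends on what is already on the page. Real engines, InnoDB included, also compare the page LSN with the redo record's LSN to skip records that were already applied; this sketch leaves that check out, since on a torn page the header cannot be trusted to describe the rest of the page.

```python
import struct

PAGE_SIZE = 64        # tiny page, for illustration only
N_RECS_OFFSET = 0     # hypothetical: 2-byte record counter at the start of the page


def n_recs(page: bytes) -> int:
    return struct.unpack_from(">H", page, N_RECS_OFFSET)[0]


def apply_physical(page: bytearray, offset: int, value: bytes) -> None:
    """Physical redo: put these exact bytes at this exact offset.
    The result does not depend on what the page contained before."""
    page[offset:offset + len(value)] = value


def apply_physiological_insert(page: bytearray) -> None:
    """Physiological redo: 'one record was inserted into this page'.
    The new counter is derived from the counter currently on the page."""
    struct.pack_into(">H", page, N_RECS_OFFSET, n_recs(page) + 1)


def replay(page: bytes, physical: bool, times: int = 1) -> int:
    """Apply the redo for one insert `times` times to a copy of `page`; return the counter."""
    p = bytearray(page)
    for _ in range(times):
        if physical:
            apply_physical(p, N_RECS_OFFSET, struct.pack(">H", 4))  # record carries the final value 4
        else:
            apply_physiological_insert(p)
    return n_recs(p)


# Before the crash the page held 3 records; the insert should leave 4.
consistent = bytearray(PAGE_SIZE)
struct.pack_into(">H", consistent, N_RECS_OFFSET, 3)
# Torn page: the sector holding the header (counter = 4) reached disk, the rest did not.
torn = bytearray(consistent)
struct.pack_into(">H", torn, N_RECS_OFFSET, 4)

print(replay(consistent, physical=True), replay(torn, physical=True))    # 4 4
print(replay(consistent, physical=True, times=3))                        # 4
print(replay(consistent, physical=False), replay(torn, physical=False))  # 4 5
```

Physical replay lands on 4 whatever the starting state and however many times it runs; physiological replay is only correct when the page it starts from is consistent.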

Handling torn pages with doublewrite

Doublewrite is a buffer allocated inside the InnoDB tablespace. It generally holds 128 pages, which for a 16 KB page size is 2 MB in total. Like data pages, doublewrite pages occupy physical storage on disk; they live in the shared tablespace. When InnoDB flushes data pages from the buffer pool, it writes multiple pages at a time: it first writes them sequentially to the doublewrite buffer and calls fsync() to make sure they reach disk, then writes the data pages to their actual locations and calls fsync() again. During crash recovery, InnoDB checks the contents of the doublewrite buffer and the data pages at their original locations. If a doublewrite page is torn, it is simply discarded; if a data page is inconsistent, it is restored from its doublewrite copy. Because a doublewrite page and its data page are written at different points in time, they are never torn at the same time. Doublewrite therefore solves the torn page problem, ensuring that redo can be applied and the database can be restored to a consistent state.

How does Oracle handle torn pages?

Like MySQL, Oracle uses physiological redo logs, but it has no doublewrite mechanism. I had always assumed that a database as formidable as Oracle must have its own way of dealing with this problem; perhaps that is just blind worship. After searching through many documents in both Chinese and English and consulting He Dengcheng, I reached the conclusion that when Oracle hits a torn page, it cannot be started normally. However, Oracle has a relatively simple strategy for restoring the database to a consistent state: backup plus archived logs. The backup guarantees data pages that are not torn, and the incremental archived logs can roll the database forward to a given point in time. Why is this acceptable? I think it is because Oracle is generally deployed with Data Guard for disaster recovery: when the primary fails, the standby takes over as primary, and the failed primary is then recovered from backup plus archived logs. This is not as fast as doublewrite, but the database can be recovered.

Other ways to avoid torn pages

The discussion so far assumes that an abnormal restart can tear pages. In practice, the underlying infrastructure can solve this problem to some extent. At the file system level, ZFS uses logging to guarantee the integrity of OS pages and prevents torn pages at the lower layer. At the disk level, RAID cards generally have a battery-backed cache, so even if the OS restarts abnormally, the cached data is not lost right away, which avoids partial writes. However, the OS page size is smaller than the DB page size, so even if OS pages are never torn, that does not guarantee that a DB page is not torn. I personally feel the problem cannot be completely solved this way, unless the DB page size is set to the same size as the OS page size.
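The last point can be made concrete with a small enumeration, assuming (hypothetically, for simplicity) that the lower layer writes the DB page's atomic units in order and that a crash may land between any two of them. Each unit is all-or-nothing, yet any DB page larger than one unit still has intermediate torn states.

```python
def crash_states(db_page_size: int, atomic_unit: int) -> list[str]:
    """Enumerate what one DB page can look like after a crash when the layer below
    (e.g. a file system or RAID cache) only guarantees that `atomic_unit`-byte writes
    are all-or-nothing. Hypothetical model: units are written in order and the crash
    may happen between any two of them."""
    if db_page_size % atomic_unit:
        raise ValueError("DB page size must be a multiple of the atomic unit")
    units = db_page_size // atomic_unit
    states = []
    for done in range(units + 1):  # units of the new page that reached disk
        if done == 0:
            states.append("old")
        elif done == units:
            states.append("new")
        else:
            states.append(f"torn ({done}/{units} units new)")
    return states


print(crash_states(16384, 4096))
# ['old', 'torn (1/4 units new)', 'torn (2/4 units new)', 'torn (3/4 units new)', 'new']
print(crash_states(4096, 4096))   # ['old', 'new']
```

With a 16 KB InnoDB page over atomic 4 KB OS pages, three torn states remain; only when the DB page size equals the atomic unit do the outcomes collapse to "old" or "new".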

 

References

http://blog.csdn.net/oneyearlater/article/details/7720723

http://www.percona.com/blog/2006/08/04/innodb-double-write/

http://blog.itpub.net/22664653/viewspace-1140915?page=2

http://www.vmcd.org/2014/09/1748/

 
