MySQL InnoDB Data Salvage (I): InnoDB Page Structure Features

Source: Internet
Author: User

If a file system is damaged or database files are accidentally deleted, the data remains in the disk sectors and can still be recovered as long as that disk space has not been overwritten. Common file recovery tools can often recover such files; the approach studied here is for cases where general-purpose file recovery tools fail.

InnoDB files are stored page by page, which is a great advantage for salvage work. Pages have recognizable features, and we can use those features to extract data pages from the disk. That is data salvage.

The InnoDB concept of a tablespace:
A tablespace is a collection of data files; in InnoDB it is a collection of .ibd files.
A tablespace can consist of multiple .ibd files;
InnoDB divides the data files into 16 KB pages numbered from 0, and page numbers continue across adjacent files;
InnoDB uses a tablespace ID to tell tablespaces apart. The ID of the shared tablespace is always 0, and the IDs of the other, stand-alone tablespaces increase sequentially. So the unique identity of a page is (space_id, page_no).
(PS: I scanned the ibdata1 on my computer yesterday)
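The page numbering described above can be sketched in a few lines. This is only an illustration: `locate_page` and its `file_pages` argument (the number of pages in each data file, in tablespace order) are hypothetical names, and the 16 KB page size is the one given in the text.

```python
PAGE_SIZE = 16 * 1024  # 16 KB pages, as stated in the text


def locate_page(file_pages, page_no):
    """Map a tablespace page number to (file index, byte offset in that file).

    file_pages is a hypothetical list holding the page count of each data
    file in tablespace order; page numbers continue from one file to the next.
    """
    first = 0  # page number of the first page in the current file
    for index, count in enumerate(file_pages):
        if page_no < first + count:
            return index, (page_no - first) * PAGE_SIZE
        first += count
    raise ValueError("page number beyond the end of the tablespace")
```

For a tablespace made of a 4-page file followed by an 8-page file, page 5 is the second page of the second file, so `locate_page([4, 8], 5)` gives `(1, 16384)`.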

First, we need to know the structure of an InnoDB file page and the features of each part.
Overall page structure (image source: http://www.cnblogs.com/vinchen/archive/2012/09/10/2679478.html)


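1. Page header: records the control information of the page, including the page numbers of the left and right sibling pages, page space usage, and so on. On a data page it is made up of the 38-byte generic header followed by the 56-byte data page header, 94 bytes in total.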

2. Minimum virtual record and maximum virtual record: two virtual records stored at fixed positions that hold no data themselves. The minimum virtual record is smaller than any record, and the maximum virtual record is larger than any record.

3. Record heap: the orange portion of the figure. It is the record space the page has already allocated and is where index data is actually stored. Records in the heap are of two kinds, valid records and deleted records. A valid record is one the index is currently using; a deleted record is one that has been removed from the index and is no longer in use, such as the dark blue part of the figure. The more often records are updated and deleted, the more deleted records accumulate in the heap and the more holes (fragments) appear. These deleted records are linked together to form the page's free-space linked list.

4. Unallocated space: the storage space of the page that has not yet been used. It shrinks as the page continues to be used. When a new record is inserted, InnoDB first tries to take a suitable location (one with enough space) from the free-space list; if none is found, it allocates from the unallocated space.

5. Slot area: slots are pointers to some of the page's valid records. Each slot occupies two bytes and stores the offset of a record relative to the start of the page. If the page has n valid records, the number of slots is between n/8+2 and n/4+2. The slot area, which the source article covers in detail, is the key to keeping the page's records ordered and to binary search within the page.

6. Page trailer (footer): the last part of the page. It occupies 8 bytes and mainly stores the page's checksum information.
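As a rough illustration of the slot area, the sketch below reads the 2-byte slots from a page image and computes the slot-count range quoted above. It rests on assumptions not stated in the text: that the slots sit immediately before the 8-byte trailer, that slot 0 is the one closest to the trailer, and that values are big-endian. The function names are hypothetical.

```python
import struct

PAGE_SIZE = 16 * 1024
TRAILER_SIZE = 8   # page trailer, per the text
SLOT_SIZE = 2      # each slot is two bytes, per the text


def read_slots(page, n_slots):
    """Return the record offsets stored in the first n_slots slots.

    Assumption: slots grow backward from the trailer, slot 0 nearest to it.
    """
    slots = []
    for i in range(n_slots):
        pos = PAGE_SIZE - TRAILER_SIZE - (i + 1) * SLOT_SIZE
        (offset,) = struct.unpack_from(">H", page, pos)
        slots.append(offset)
    return slots


def slot_count_bounds(n_recs):
    """Range of slot counts for n_recs valid records (n/8+2 to n/4+2)."""
    return n_recs // 8 + 2, n_recs // 4 + 2
```

During salvage, a slot whose offset points outside the page, or a slot count outside the expected range, is a hint that the block is not a genuine data page.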

The page header, the maximum/minimum virtual records, and the page trailer are all stored at fixed positions on the page.
The identifying features are mainly in the page header, but the data section also has some, and the page trailer may serve as a feature as well.
Page header structure and features (the header is divided into the generic header and a header specific to each page type):
* Generic page header: the header used by all pages, 38 bytes;
* Data page header: the 56 bytes that follow are the data page header, which holds the header information of a data page. (The header that follows the generic header differs by page type, but for data salvage we care most about the data page header, because the data is stored in data pages, that is, pages of type FIL_PAGE_INDEX.)
The total size of the page header of a data page is 38+56=94 bytes;

Generic header (defined in the source code in the file fil0fil.h):

The generic page header occupies 38 bytes, in this order:
* Page checksum (4 bytes): the checksum of the page content, used to verify the validity and reliability of the page content.
(If the algorithm can be found, this would be a very effective field for judging pages during data salvage.)
* Page number (4 bytes): the page number within the tablespace.
* Left sibling page number (4 bytes): pages at the same level of the B+ tree are connected by a doubly linked list, and this is the page number of the page's left sibling in that list. The left and right sibling pages must of course belong to the same tablespace.
* Right sibling page number (4 bytes): as above, for the right sibling.
* Page LSN (8 bytes): the LSN of the last modification, used when flushing and recovering the page. The LSN is an increasing log sequence number.
* Page type (2 bytes): InnoDB has several page types, distinguished by this field. The page type of a data page is FIL_PAGE_INDEX (17855).
(17855 is a very convenient value, almost as if made for data salvage. All data is stored in pages of this type, and those are the pages we want. In theory this 2-byte flag alone lowers the probability that a random garbage block passes the check to 1/64K; scanning in 4 KB clusters, that is about one false match per 64K × 4 KB = 256 MB of disk.)

* File flush LSN (8 bytes): used only on the first page of each file of the shared tablespace. It records the LSN at the server's last normal shutdown and is generally used only for checking and verification.
(If so, that is, if ordinary data pages do not use it, then this field is also a good marker.)
* Tablespace ID (4 bytes): the ID of the tablespace. (If tablespace IDs start at 0, the tablespace IDs of a normal database will not be very large, which can also serve as a check.)
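The field list above maps directly onto a fixed 38-byte layout, so a minimal sketch of parsing it and scanning a raw buffer for candidate data pages might look like this. It assumes the fields are stored big-endian, scans at the 4 KB cluster step mentioned in the text, and uses a hypothetical `max_space_id` threshold for the "tablespace ID will not be very large" check; all function and field names are made up for illustration.

```python
import struct
from collections import namedtuple

# checksum, page_no, prev, next (4 bytes each), LSN (8), type (2),
# flush LSN (8), space ID (4) = 38 bytes; big-endian is an assumption
FIL_HEADER = struct.Struct(">IIIIQHQI")
FIL_PAGE_INDEX = 17855

FilHeader = namedtuple(
    "FilHeader",
    "checksum page_no prev_page next_page lsn page_type flush_lsn space_id",
)


def parse_fil_header(buf, offset=0):
    """Decode the generic page header found at `offset` in `buf`."""
    return FilHeader(*FIL_HEADER.unpack_from(buf, offset))


def find_candidate_pages(buf, step=4096, max_space_id=None):
    """Yield (offset, header) for blocks whose header looks like a data page.

    max_space_id is a hypothetical tuning knob, not an InnoDB constant.
    """
    for offset in range(0, len(buf) - FIL_HEADER.size + 1, step):
        header = parse_fil_header(buf, offset)
        if header.page_type != FIL_PAGE_INDEX:
            continue
        if max_space_id is not None and header.space_id > max_space_id:
            continue
        yield offset, header
```

Each hit is only a candidate: the page type check alone still lets through the occasional garbage block, which is why the further header and trailer checks matter.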
2.2 Data Page Header
The data page header is 56 bytes in total (the member sizes listed below add up to 56). It has many, fairly complex members; from low to high byte they are:

PAGE_N_DIR_SLOTS (2 bytes): the number of slots in the slot area; each slot is two bytes.
PAGE_HEAP_TOP (2 bytes): heap-top pointer, the start address of the unallocated space.
(If it is relative to the start of the page, it must lie beyond the page header.)

PAGE_N_HEAP (2 bytes): the number of records in the heap, including deleted records and the maximum/minimum virtual records, so it is initialized to 2. Its highest bit (bit 15) is 1 for row_format=compact.

PAGE_FREE (2 bytes): the offset of the first deleted record. All deleted records are linked together to form the free-space linked list.
(Another offset, so it can also be sanity-checked.)

PAGE_GARBAGE (2 bytes): the total number of bytes occupied by deleted records, that is, the total deleted space in the heap.
PAGE_LAST_INSERT (2 bytes): the offset of the last inserted record.
PAGE_DIRECTION (2 bytes): the direction of the last insert on the page: PAGE_RIGHT if the inserted value is greater than the previously inserted one, PAGE_LEFT otherwise;

PAGE_N_DIRECTION (2 bytes): the number of consecutive inserts in the same direction.
PAGE_N_RECS (2 bytes): the number of valid records on the page.
PAGE_MAX_TRX_ID (8 bytes): the ID of the transaction that last modified the page. It is valid only in secondary indexes, where it is used for MVCC visibility checks on secondary index records.

PAGE_LEVEL (2 bytes): the level of the page in the index; leaf nodes are at level 0.
PAGE_INDEX_ID (8 bytes): the ID of the index the page belongs to.
(This is important in data recovery.)
PAGE_BTR_SEG_LEAF (10 bytes): segment header (inode information) for the leaf-node segment, valid only on the root page of the B+ tree.
PAGE_BTR_SEG_TOP (10 bytes): segment header (inode information) for the inner-node segment, valid only on the root page of the B+ tree.
The data page header has many members. PAGE_N_DIR_SLOTS, PAGE_N_HEAP, and PAGE_N_RECS are relatively simple and mainly serve as counters for the page.
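A minimal sketch of decoding these members follows. It assumes the data page header starts right after the 38-byte generic header, that values are big-endian, and that the field order and sizes are exactly those listed above. `looks_plausible` is a hypothetical salvage heuristic built from the remarks about offsets, not an InnoDB function.

```python
import struct

PAGE_SIZE = 16 * 1024
FIL_HEADER_SIZE = 38
# nine 2-byte fields, trx id (8), level (2), index id (8), two 10-byte segment headers
INDEX_HEADER = struct.Struct(">9HQHQ10s10s")
FIELDS = (
    "n_dir_slots", "heap_top", "n_heap", "free", "garbage", "last_insert",
    "direction", "n_direction", "n_recs", "max_trx_id", "level", "index_id",
    "btr_seg_leaf", "btr_seg_top",
)


def parse_index_header(page):
    """Decode the data page header that follows the generic header."""
    hdr = dict(zip(FIELDS, INDEX_HEADER.unpack_from(page, FIL_HEADER_SIZE)))
    hdr["is_compact"] = bool(hdr["n_heap"] & 0x8000)  # highest bit: compact flag
    hdr["n_heap"] &= 0x7FFF                           # remaining bits: record count
    return hdr


def looks_plausible(hdr):
    """Hypothetical sanity checks on offsets and counts for salvage."""
    header_end = FIL_HEADER_SIZE + INDEX_HEADER.size
    return (
        header_end <= hdr["heap_top"] <= PAGE_SIZE - 8  # inside the page body
        and hdr["free"] < PAGE_SIZE                     # offset stays in the page
        and hdr["n_heap"] >= 2                          # the two virtual records
        and hdr["n_recs"] <= hdr["n_heap"] - 2
    )
```

Grouping salvaged pages by `hdr["index_id"]` is one way to separate the pages of different indexes once candidates have been found.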

4. Page trailer
The page trailer is the last 8 bytes of the page and consists of two parts, used mainly to validate the page content:

old_chksum: 4 bytes, a checksum stored in the trailer, computed with a different algorithm from the page checksum in the generic page header.
lsn_lower_4bytes: 4 bytes, the low four bytes of the page LSN in the generic header.
The page checksum, the page LSN, and the trailer's old_chksum and lsn_lower_4bytes together determine whether a page is corrupted. The algorithm is (see the function buf_page_is_corrupted):

1. Check whether lsn_lower_4bytes equals the low four bytes of the page LSN; if they are not equal, return true (the page is corrupted).
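Step 1 is cheap to apply during salvage. The sketch below covers only this LSN comparison, not the checksum checks. It assumes big-endian storage, takes the page LSN at byte 16 (after the four 4-byte fields of the generic header), and reads lsn_lower_4bytes from the last four bytes of the page; `lsn_mismatch` is a hypothetical name.

```python
import struct

PAGE_LSN_OFFSET = 16  # checksum + page number + two sibling page numbers


def lsn_mismatch(page):
    """Return True if the trailer's low LSN bytes disagree with the page LSN."""
    (page_lsn,) = struct.unpack_from(">Q", page, PAGE_LSN_OFFSET)
    (trailer_low,) = struct.unpack_from(">I", page, len(page) - 4)
    return (page_lsn & 0xFFFFFFFF) != trailer_low
```

A candidate block that has the right page type but fails this comparison is either a damaged page or a garbage block that happened to match.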

The counters in the data page header (PAGE_N_DIR_SLOTS, PAGE_N_HEAP, PAGE_N_RECS) are affected by every insert, update, or delete of a record on the page. The other header members are best explained alongside the specific operations in which they play a key role.
