Deep understanding of DiskStorage)

Last Update:2018-06-03 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The database management system stores data on disks, tapes, and other bare devices. Although the access speed of these devices is much slower than that of the memory, however, their non-volatile and high-capacity features make them the best choice for data storage.

The database management system stores data on disks, tapes, and other bare devices. Although the access speed of these devices is much slower than that of the memory, however, their non-volatile and high capacity make them the best choice for data storage.

This article mainly discusses the internal structure of disk storage for large database products, which plays a vital role in understanding various database data structures.

Architecture of database disk storage

The above two figures show the hierarchical structure of the storage and the physical structure of the disk, which is not the focus of this article. It should be emphasized that the time for a complete input/output (IO) operation = the disk axis rotation time (rotation delay), disk arm movement time (seek time), and data transmission time. The average experience values of the three parameters are 0.004 seconds, 0.008 seconds, and 0.0005 seconds. Therefore, the experience of a complete IO time is 0.0125 seconds, that is, 1/80 seconds.

For large databases, even for the extremely short 0.0125 seconds, frequent IO operations will accumulate this insignificant amount of time, therefore, the optimization of disk storage is essential and important for improving database efficiency. The disk storage implementation of different database products is different. This article only discusses Oracle and DB2 large database products. Although the two are different in details, the structure is basically the same.

Storage Models for Oracle and DB2 Databases

We can see that the data in the database is stored in the tablespace. A tablespace is a database object that is managed to map the logical database design to the physical storage of the operating system. It is used to specify the physical location of the data. We will discuss the tablespace later.

The logical structure of Oracle Database disk storage is: one Database corresponds to multiple tablespaces and one Tablespace corresponds to multiple segments ), one segment corresponds to multiple zones (Extent), and one partition corresponds to multiple Data blocks. The real Data is stored in the Data blocks. Here are the following notes:

1. the default size of a data block in Oracle is 2 kb (2 kb, 4 kb, 8 KB, 16 KB, 32 KB), and 4 kb (4 kb, 8 KB, 16 KB, and 32 KB );

2. oracle has the Segment concept, but DB2 does not. tablespaces are directly composed of partitions in various containers (data files, however, there is still a very relaxed Extent group concept. The Segment content mentioned below is all for Oracle;

3. Data blocks in Oracle are called Oracle blocks, while Data pages are directly called Data pages in DB2 ).

4. Extent in Oracle is called a partition, while DB2 is called an extended data block. To facilitate reading, this document is generally called a zone.

Logical Structure of Oracle disk storage

Logical Structure of DB2 disk storage

Note: The tablespaces, segments, zones, and data blocks (pages) here are all logical concepts in the database and are not physically present, how does the logical structure of database disk storage map to the physical structure of the operating system disk storage?

Let's take a look at the physical ing of the tablespace. Tablespaces are carried by containers in the operating system. For System Management tablespaces (SMS), note: Oracle does not have System Management tablespaces, and all tablespaces are managed by databases ], its only container is the file directory, while for database management tablespace, its container is a file or bare device (such as tape or Cabinet ). Here we will not discuss the system management tablespace, but only focus on the database disk storage under the database management tablespace. Because the database treats files and bare devices in the same way, we use file containers as an example. A file container is a pre-allocated file used by the DMS tablespace. A tablespace can have multiple file containers, that is, multiple data files. That is to say, a logical tablespace is mapped to multiple physical data files.

Next we will discuss the physical ing of data blocks (pages. As we know, in terms of physical structure, files in the operating system are composed of blocks of multiple operating systems (for Linux, the Block size of Unix systems is B, in Windows, the block size is 1 kb. As mentioned above, the data in the database is stored in units of data blocks (pages). Therefore, the data blocks in the database (pages) what is the relationship between the block and the operating system? In fact, several operating system blocks constitute a database data block (page ). The exting between the corresponding area (extent) and segment (segment) does not correspond to the physical unit, but contains multiple segments in a data file, each segment contains multiple partitions (the number of partitions in each segment is not necessarily the same; data files in DB2 are directly in the unit of zone, and there is no segment concept ), each partition contains several data blocks (the number of data blocks in each partition is not necessarily the same ). In addition, like a tablespace, segments can also be cross-file, that is, a segment can be composed of partitions in different files.

The logical/physical ing diagram of the database disk storage is as follows (DB2 does not have the Segment layer ):

We will discuss these logical units later. Now you only need to understand the general architecture. From the above introduction, we can see that the tablespace managed by the database is self-contained and platform-independent. It is a small independent file system.

Internal Implementation of database disk storage

After learning about the architecture of disk storage, let's take a look at how data is stored in the database system.

When creating a Table, a tablespace is specified for the Table. Once the Table is created successfully, the database system needs to provide disk space for the Table.

The Oracle database automatically assigns a Segment (Segment) to a table. This Segment is called Data Segment (Data Segment). [Note: Only one Data Segment can be allocated to a table. Oracle has four types of segments: Data Segment, Index Segment, Rollback Segment, and Temp Segment )]. When all the data segments allocated to the table are filled, the database management system adds a new partition (Extent) for the data segment, that is, after the data segment space is allocated, you do not need to add much space for the segment, nor add data blocks directly in the partition, instead, an Extent is added at one time (this avoids frequent Segment resizing). Extent is the smallest unit of space allocation, and Extent is evenly allocated to containers in the tablespace. In addition, the data block (PAGE) is the smallest storage unit, that is, the smallest I/O unit. All data is stored by block (page, when reading data, the entire data block (PAGE) is directly read into the memory. As for DB2, the solution is basically the same as that for Oracle, but the difference is that there is no Segment allocation.

Let's take DB2 as an example to see how data storage works. [Refer to niuxinzhuang: in-depth analysis of DB2]

A dms tablespace can have multiple containers (files or bare devices). DB2 writes Extent evenly to each container. That is, when you need to request an Extent, the Database Manager Extent is allocated to the next container. This solution ensures balanced utilization of containers and improves parallel access efficiency. As shown in the left figure:

Balanced write of tablespace containers

Relationship between tablespace containers, Extent, data pages, and tablespaces

Create a new DMS tablespace called HUMANRES. Its Extent size is 2 pages (that is, data blocks), and each Page size is 4 kb. There are four containers in the tablespace, and each container has a small number of allocated Extent: the DEPARTMENT and EMPLOYEE tables in the tablespace occupy 7 pages. According to the allocation rules described above, the data in the two tables will be written to four different containers (four Extent needs to be allocated for 7 pages, and each Extent is allocated to each container in turn). In addition, an Extent can only be written by one table. Even if the table data cannot fully use the space in Extent, the free space in Extent can only be empty, you cannot continue writing data to other tables (this is determined by the minimum space allocation unit of Extent ). See the picture on the right.

Data Block (PAGE) Storage

Next, we will further analyze the specific structure of the Data Block or Data Page, the smallest unit of Data storage in the database disk. The data block structure of Oracle and DB2 is different. The following are discussed respectively. Other database products are not discussed.

Data Block structure and related features of Oracle databases

(Reference: http://docs.oracle.com/cd/B28359_01/server.111/b28318/logical.htm)

Data block format

The format of Oracle data blocks is similar to that of tables, indexes, and clusters,

Common and Variable Header)

The header contains common block information, such as the block address and segment type (such as table or index)

Table Directory)

This part of information contains information about the table or the index in this block.

Row Directory)

This part contains information about the actual rows in the data block (including the addresses of each row in the row data area). Once the space of this row directory in the data block header is allocated, this space cannot be recycled even if the row is deleted.

Therefore, a data block that is currently empty contains the information of the data rows stored in the data block (each data row segment (row piece) in the row data area) ). [A data block may save a complete data row or only a part of the data row. Therefore, we use row piece]. The row directory space will be reused only when new data is inserted into the data block.

Header information area (Overhead)

Header (header/Common and Variable), Table Directory (Table Directory), Row Directory (Row Directory), these three parts are collectively referred to as the header information area (Overhead ). The header information area does not store data. It stores the information of the entire block. The size of the header information area is variable. Generally, the header information area is between 84 bytes (bytes) and 107 bytes (bytes.

Free Space)

The available space is an unused area in a block. This area is used for inserting new rows and updating existing rows. The available space also contains transaction entries. When each insert, update, delete, select .. when a for update statement accesses one or more rows of data in a block, a transaction entry is requested. The request space of the transaction entry is related to the operating system, and about 23 bytes are required in most operating systems.

Row Data)

This data block contains table or index data, and the row may also be cross-data block, which is the phenomenon of row migration.

Free Space Management)

The available space may be automatically or manually managed.

The available space is automatically managed by the segments in Oracle, and the available/used space in the segments is tracked by bitmap. Automatic segment space management provides the following benefits:

L easy to use

L better space utilization, especially for objects with large changes in row size

L Dynamic Adjustment of concurrent access

L balance of performance and Space

Use and compression of available data blocks

Delete and update (decrease the original value) increase the available space of data blocks. In the following cases, the insert statement can effectively use the space that has been released.

If the insert statement is in the same transaction, and the statement before the insert statement just releases the corresponding space, the insert statement can use this space.

If the insert statement and the statement releasing space are not in the same transaction, the insert statement can use the space only when other transactions are committed and need space.

The released space may not be consecutive. Only when both of the following conditions are met will Oracle merge and compress the available space of data blocks.

1. When an insert or update statement tries to use sufficient space to create a new row;

2. When free space is scattered and cannot be inserted into the adjacent space;

This is the so-called Block Reorganization )". In fact, Oracle and DB2 adopt the same policy in dealing with the shard space, that is, after row data is deleted, the space is vacant and is reorganized when the block space is insufficient. The policy of other databases is that after the row data is deleted, the data of other rows is migrated to ensure that the available space is continuous. Obviously, this approach affects database efficiency.

Row Chaining and Migrating)

Row Chaining: If we INSERT a Row of data into the database, this Row of data is so large that a data block cannot store a whole Row, oracle Splits a Row of data into several data blocks. This process is called Row Chaining ).

If a row of data is a normal row, this row of data can be stored in one data block; if a row of data is a link row, this row of data is stored in multiple data blocks.

A row link is also called a row-to-page Link. Oracle allows row-to-page links, but DB2 does not.

Row Migrating: there is a record in the data block. The user executes the UPDATE operation to UPDATE this record. In this case, Oracle searches for the record in this data block, however, Oracle cannot find the space that can accommodate this record. However, Oracle can only move the entire row of data to a new data block. The original data block retains a "Pointer", which points to the new data block. The rowid of the moved record remains unchanged.

PCTFREE and PCTUSED

The PCTFREE parameter is used to specify the minimum percentage of free space that must be retained in a block. This space is reserved because it is required during UPDATE. If there is no free space during UPDATE, Oracle will allocate a new block, which will generate Row migration (Row Migrating). Reserved space can ensure database access efficiency to a certain extent.

PCTUSED is also used to set a percentage. When the percentage of space used in the block is smaller than this percentage, the block is marked as valid. Only valid blocks are allowed to insert data.

PCTFREE and PCTUSED have the following functions:

Data Page Structure and related features of DB2 database

(Reference: http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp? Topicture%2fcom.ibm.db2.lu%admin.perf.doc%2fdoc%2fc0007337.html)

Data page format

In a standard table, data is logically organized by the list of data pages. These data pages are logically grouped together based on the Extent size of the tablespace. This Extent group is similar to the Segment in Oracle, but the Extent group does not have the forced concept like Segment.

Each data page has the same format. The front of each page is the page header, followed by the Slot directory, followed by available space and records.

Header)

It is used to store BPS headers and information fields on some pages. BPS headers occupy 48 bytes, and the entire page HEADER occupies about 91 bytes.

Slot Directory)

The slot directory is similar to the row directory in the Oracle data block. Each entry in the slot directory corresponds to a record on this page. Entries in the slot directory indicate the byte offset of the record's start position on the data page. The entry with a value of-1 corresponds to the deleted record.

Free Space)

The DB2 available space strictly includes the general available space in the general sense and the embedded available space. The general available space can be used to directly store data, while the embedded available space is usually the shard space generated after the deletion, and cannot be used directly, it can be used only when the page is reorganized and merged to the general available space. This is similar to Oracle.

Record)

The data storage space, similar to the row data in Oracle. That is, the rows in the table are stored on the page as records. The number of records (that is, the number of rows) on each data page varies depending on the size of the data page and the record size. Most pages only contain user records (common database data ). However, a few pages contain special internal records used by the data server to manage tables. For example, in a standard table, each 500 data pages has an available space control record (FSCR ). These records are used to map the available space available for new records per 500 data pages until the next FSCR. In addition, before DB2 V9, each data page can store a maximum of 255 records. After DB2 V9, each data page can theoretically store 65000 records (related to the RID, not discussed for the moment ), however, due to the limited data page size, each data page can store more than 2300 records.

Row link and row migration

DB2 does not allow cross-page rows (records), that is, row links are not allowed. However, row migration is allowed (also known as row overflow in DB2 ),

When a table has a variable-length data type, it is prone to row overflow. Row overflow is also called row migration. It indicates that when we update fields of the variable-length data type, because the updated data is longer than the original data, the current data page cannot store the entire row of data, you need to store this row of data to a new data page, in the original data page memory, place the pointer at the new position of the row to point to the position of the new row, and the rid of the moved data remains unchanged (discussed later. Apparently, if row migration occurs on a large scale, it will inevitably have a serious impact on the database access efficiency (I/O increases significantly). If a large number of row migration phenomena are found in the table, we recommend that you REORG the data storage location to eliminate row overflow.

Another point that needs to be emphasized on the corresponding DB2 data page is that the data page of DB2, like the data block of Oracle, also has the concept of PCTFREE and PCTUSED. Its meaning and function are also roughly the same.

In addition, we can see that the data block structure of Oracle is very similar to that of DB2. Another point is that the row data of both is written from high to low, while the row directory is written from low to high. Why? Because data insertion is a process of writing at both ends, you must write data into the available space as row data or records, and update the row directory or slot directory behind the header, if the same-side write is not conducive to the expansion of space. Is a histogram:

Data row Structure

Now I have a rough understanding of the specific structure of the data block (PAGE), so how does one row of data in the table be stored in the data block (PAGE) in the form of row data (record) in? This problem seems simple. In fact, the structure of a Data Row (record, that is, the Row in) is very complex. Here we take the structure of the data Row in DB2 as an example.

The parameters are described as follows:

As you can see, when a variable-length Column exists in the tuples (that is, the rows in the table), no matter where the column is located, it is moved to the last storage in the Data row (record, the fixed-length columns are stored sequentially.

[Note: The data type is specified for Attribute columns of a table. Some data types become longer. For example, the varchar size varies with the actual value, some columns have a fixed data type length. For example, if the int value is 32 bits, the actual value of the physical column occupies 4 bytes. If a variable-length data type column appears in the column, the variable-length tuples are generated .]

It is the meaning of each bit of State Bit A (1 byte = 8 bits), and state bit B is not used.

Let's take a look at how a fixed-length tuples (rows) are stored in data rows (Records:

It is a bit complicated. Compare the figure with the structure shown above. [Note: All data is stored in hexadecimal format in the data rows (Records) on the data page and in reverse order]

The following describes the storage methods of variable-length tuples (rows) in data rows (records), which are similar to the storage methods of fixed-length tuples:

For tuples containing the big object data type, the big object data is not stored together with the data row, but is divided into different pages of the database with the data row open, in this tuples (rows) to store a pointer to this large object.

The above is the general content of the internal structure of the database system disk storage.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Deep understanding of DiskStorage)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Deep understanding of DiskStorage)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support