Storage layout of databases on disks: HeapFile and heapfile

Source: Internet
Author: User

---- Reading notes on "Large-Scale Distributed Storage System: Principle Analysis and Architecture Practice"

This article is again a topic outside the book "Large-Scale Distributed Storage System: Principle Analysis and Architecture Practice". While studying the book I learned that distributed key-value systems generally use SSTable (Sorted String Table, an immutable container of key-value pairs sorted by key) as their on-disk layout. That raises a question: what storage layout do traditional databases use to store data? That is today's topic: HeapFile.

What is HeapFile?

HeapFile is a data structure that stores data in pages. Like a linked list, a HeapFile is an unordered container.
HeapFile and SSTable each have their own specialized structure. Since both simply store data, why not use plain files directly? Because the file system does not interpret the contents of a file and works at a coarse granularity. HeapFile and SSTable both provide record-level management. In this respect they serve the same purpose: finer-grained storage management for the system.

Basically, traditional databases such as Oracle, MySQL, PostgreSQL, and SQL Server all use HeapFile for storage layout management. Like SSTable, the HeapFile structure is actually very simple, but it is worth keeping in mind that database storage is built on it.

We all know that databases usually use the B+ tree as the index, but few people in China mention that databases use HeapFile to manage the storage of records. Some foreign universities have students implement a simple database in a "Database System Implementation" course, so there is plenty of material on HeapFile.

Page-based HeapFile

In its simplest form, a HeapFile is organized as a linked list of pages.

A heap file resembles a linked list in the following ways:

  • Supports appending
  • Supports large-scale sequential scanning
  • Does not support random access

With this layout, finding a partially filled page with enough free space means traversing many pages, so the I/O overhead is high. For this reason an index-based HeapFile is usually used instead: part of the HeapFile's space is set aside for index pages, which record the remaining free space of each data page.

In the simplest case the index fits on a single page and the data records live on other pages. When one index page is not enough, the index is spread over multiple index pages.
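
The index idea can be sketched in C as follows. This is only a minimal illustration under stated assumptions: the names DirEntry, dir_find_page, and dir_consume are hypothetical, the index is held in memory as a plain array, and a first-fit search is used; the article does not prescribe any of these choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One index entry: where a data page lives and how much room it has left.
   All names in this sketch are hypothetical. */
typedef struct {
    uint64_t page_addr;   /* address of the data page, e.g. a file offset */
    uint32_t free_bytes;  /* unused bytes remaining on that page */
} DirEntry;

/* Returns the index of the first page with at least `needed` free bytes,
   or -1 if no page qualifies and a new page has to be appended. */
long dir_find_page(const DirEntry *dir, size_t count, uint32_t needed)
{
    assert(dir != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (dir[i].free_bytes >= needed)
            return (long)i;
    }
    return -1;
}

/* Records that `used` bytes were consumed on page `idx`. */
void dir_consume(DirEntry *dir, size_t idx, uint32_t used)
{
    assert(dir[idx].free_bytes >= used);
    dir[idx].free_bytes -= used;
}
```

Only the index entries are inspected during the search, so the data pages themselves do not have to be read to find one with enough room.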

The following are some features of Heap file:

  • Data is stored on secondary storage (disk): a heap file is designed to store large amounts of data efficiently, and the data size is limited only by the storage volume;

  • A heap file can span multiple disks or machines: it can use a large address structure to identify pages on multiple disks or even across a network;

  • Data is organized into pages;

  • A page may be partially empty (pages are not required to be full);

Pages can be placed in different physical regions of a storage device, spread across different storage devices, or even across different network nodes. We can simply assume that each page has a unique address identifier, PageAddress, and that the operating system can locate a page from its PageAddress.

Generally, the offset of the page within the file that contains it can serve as the PageAddress.

A simple layout implementation scheme

File Layout

To simplify data layout in a file, I adopt a simple convention: one file represents one relation (table).
This means that the number of records in a relation is limited by the file system. On FAT32, a file can be at most 4 GB. On a typical ext3 file system, a single file can be up to 2 TB.

To simplify the implementation, we use arrays to organize pages.
The HeapFile is organized as follows:


N and P occupy the first 2 * sizeof(long) bytes of the file (8 or 16 bytes, depending on whether long is 4 or 8 bytes). That is, N and P are stored as two long values. N is the number of pages in the file, and P is the size of each page. Then:

  • Total file size: FileSize = N * P + 2 * sizeof(long)
  • Start address of any page: Page(k) = P * (k - 1) + 2 * sizeof(long), (k = 1, 2, ..., N)
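
The two formulas above translate directly into C. The sketch below is illustrative only; the struct and function names (HeapFileHeader, heapfile_size, heapfile_page_offset) are hypothetical and do not come from the article.

```c
#include <assert.h>

/* File header as described in the text: N (page count), then P (page size). */
typedef struct {
    long n_pages;    /* N */
    long page_size;  /* P */
} HeapFileHeader;

/* FileSize = N * P + 2 * sizeof(long) */
long heapfile_size(const HeapFileHeader *h)
{
    return h->n_pages * h->page_size + 2 * (long)sizeof(long);
}

/* Page(k) = P * (k - 1) + 2 * sizeof(long), for k = 1..N */
long heapfile_page_offset(const HeapFileHeader *h, long k)
{
    assert(k >= 1 && k <= h->n_pages);
    return h->page_size * (k - 1) + 2 * (long)sizeof(long);
}
```

To read page k, a program would seek to heapfile_page_offset(&h, k) in the file and read P bytes; the last page ends exactly at heapfile_size(&h).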

Page Layout

A page can contain multiple records. If every record has the same length, the records are called fixed-length records; if the lengths differ, they are called variable-length records. Fixed-length records can be stored as an array, but variable-length records cannot, so offsets are used to locate them. The page layout is as follows:

Records are stored starting from the top of the page. At the end of the page, one int records the offset of the free space and another int records the number of records stored on the page. Next to them sit the per-record entries, which hold the offset of each record within the page and whether it has been deleted.
Where:

  • FreeSpace is the starting address of the free space on the page, which is also the end address of the last record + 1;
  • N is the number of records on the page, including records marked as deleted;
  • R1, R2, ... at the end of the page hold the offset of the corresponding record within the page, plus one bit indicating whether the record is deleted. To support records stored across pages, 2 more bits are set aside to identify cross-page records.
    R1, R2, ... can be defined as the following struct:

    struct IndexRecord {
        unsigned int pos      : 29;  // offset of the record within the page
        unsigned int isdelete : 1;   // whether the record is marked as deleted
        unsigned int spanned  : 2;   // cross-page storage state
    };

    An IndexRecord is 32 bits in total: 29 bits hold the record's offset within the page, 1 bit marks whether the record is deleted, and 2 bits describe cross-page storage. In binary, 00 means the record does not span pages; 01 means it spans pages and this is the starting part; 10 means it spans pages and this is a middle part (there may be several middle parts); 11 means it spans pages and this is the ending part.
    Then:
  • The address of the k-th record's IndexRecord is R(k) = P - (2 + k) * sizeof(int), (k = 1, 2, ..., N)
  • The space still available on the page is FreeLength = P - (2 + N) * sizeof(int) - FreeSpace
  • The page still has room as long as FreeLength > 0; otherwise it is full
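
The page layout described above can be sketched in C as follows. Several points are assumptions rather than something the article states: the page size of 4096 bytes, the footer order (FreeSpace in the last int, N just before it), the helper names, and the way remaining space is computed, namely the page size minus the footer entries minus the FreeSpace offset. Bit-field layout is compiler-specific, so the sketch only relies on IndexRecord being 4 bytes wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u  /* P; an assumed page size */

struct IndexRecord {
    unsigned int pos      : 29;  /* offset of the record within the page */
    unsigned int isdelete : 1;   /* record marked as deleted */
    unsigned int spanned  : 2;   /* cross-page state */
};

/* Assumed footer order (not stated in the text): ... R2 R1 | N | FreeSpace */
#define OFF_FREESPACE (PAGE_SIZE - 4u)
#define OFF_COUNT     (PAGE_SIZE - 8u)

uint32_t get_u32(const uint8_t *page, uint32_t off)
{
    uint32_t v;
    memcpy(&v, page + off, sizeof v);
    return v;
}

void put_u32(uint8_t *page, uint32_t off, uint32_t v)
{
    memcpy(page + off, &v, sizeof v);
}

/* R(k) = P - (2 + k) * sizeof(int), k = 1..N */
uint32_t slot_offset(uint32_t k)
{
    return PAGE_SIZE - (2u + k) * 4u;
}

void page_init(uint8_t *page)
{
    assert(sizeof(struct IndexRecord) == 4);
    memset(page, 0, PAGE_SIZE);  /* N = 0, FreeSpace = 0 */
}

/* Bytes not yet used by records or by the footer entries. */
uint32_t page_free_length(const uint8_t *page)
{
    uint32_t n = get_u32(page, OFF_COUNT);
    return PAGE_SIZE - (2u + n) * 4u - get_u32(page, OFF_FREESPACE);
}

/* Appends a record; returns its 1-based entry number, or 0 if it does not fit. */
uint32_t page_insert(uint8_t *page, const void *rec, uint32_t len)
{
    uint32_t n = get_u32(page, OFF_COUNT);
    uint32_t free_off = get_u32(page, OFF_FREESPACE);
    uint32_t avail = page_free_length(page);
    struct IndexRecord slot;

    if (avail < 4u || len > avail - 4u)  /* record data plus one new entry */
        return 0;
    memcpy(page + free_off, rec, len);
    memset(&slot, 0, sizeof slot);
    slot.pos = free_off;
    memcpy(page + slot_offset(n + 1), &slot, sizeof slot);
    put_u32(page, OFF_COUNT, n + 1);
    put_u32(page, OFF_FREESPACE, free_off + len);
    return n + 1;
}

/* Reads the k-th IndexRecord (k = 1..N). */
struct IndexRecord page_slot(const uint8_t *page, uint32_t k)
{
    struct IndexRecord slot;
    assert(k >= 1 && k <= get_u32(page, OFF_COUNT));
    memcpy(&slot, page + slot_offset(k), sizeof slot);
    return slot;
}
```

Record data grows from the start of the page while the entries grow backward from the end, so the page is full when the two regions meet.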

The size of a page is usually 2 KB, 4 KB, 8 KB, 16 KB, and so on.

Gaps need a mention here. Deleting a record simply marks it as deleted (the TAG method). Updating a variable-length record, however, falls into one of three cases:

  1. The new record is the same length as the original: the original record is updated in place.
  2. The new record is longer than the original: the original record is marked as deleted and a new record is added. If there is an index, the index file is updated.
  3. The new record is shorter than the original: the original record is updated in place and the index file needs no update, but a gap is left after the record.

When space runs short, the page can be compacted to remove the gaps.
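
Marking a record as deleted and later compacting the page might look like the C sketch below. It rests on assumptions that the article does not state: a 4096-byte page, a footer ordered as ... R2 R1 | N | FreeSpace, records stored in entry order so that record k extends up to the start of record k + 1, and deleted entries that are kept (with zero length) so that the numbering of the remaining records does not change. Gaps left by shortened updates would need the record's own length and are not handled here. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u  /* P; an assumed page size */

struct IndexRecord {
    unsigned int pos      : 29;
    unsigned int isdelete : 1;
    unsigned int spanned  : 2;
};

uint32_t get_u32(const uint8_t *page, uint32_t off)
{
    uint32_t v;
    memcpy(&v, page + off, sizeof v);
    return v;
}

void put_u32(uint8_t *page, uint32_t off, uint32_t v)
{
    memcpy(page + off, &v, sizeof v);
}

/* Entry k lives at P - (2 + k) * 4; N is at P - 8 and FreeSpace at P - 4. */
struct IndexRecord get_slot(const uint8_t *page, uint32_t k)
{
    struct IndexRecord s;
    memcpy(&s, page + PAGE_SIZE - (2u + k) * 4u, sizeof s);
    return s;
}

void put_slot(uint8_t *page, uint32_t k, struct IndexRecord s)
{
    memcpy(page + PAGE_SIZE - (2u + k) * 4u, &s, sizeof s);
}

/* Deleting only flips the flag in the entry; the record bytes stay in place. */
void page_delete(uint8_t *page, uint32_t k)
{
    struct IndexRecord s = get_slot(page, k);
    assert(k >= 1 && k <= get_u32(page, PAGE_SIZE - 8u));
    s.isdelete = 1;
    put_slot(page, k, s);
}

/* Slides live records toward the start of the page.
   Returns the number of bytes reclaimed. */
uint32_t page_compact(uint8_t *page)
{
    uint32_t n = get_u32(page, PAGE_SIZE - 8u);
    uint32_t old_free = get_u32(page, PAGE_SIZE - 4u);
    uint32_t write = 0;

    for (uint32_t k = 1; k <= n; k++) {
        struct IndexRecord s = get_slot(page, k);
        /* Record k ends where record k + 1 starts (or at FreeSpace). */
        uint32_t end = (k < n) ? get_slot(page, k + 1).pos : old_free;
        uint32_t len = end - s.pos;

        if (!s.isdelete) {
            memmove(page + write, page + s.pos, len);
            s.pos = write;
            write += len;
        } else {
            s.pos = write;  /* deleted record now occupies zero bytes */
        }
        put_slot(page, k, s);
    }
    put_u32(page, PAGE_SIZE - 4u, write);
    return old_free - write;
}
```

Because entry k + 1 is read before entry k is rewritten, the original record boundaries are still available throughout the pass.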

Record Layout

The layout of fixed-length records can be relatively simple. This section mainly discusses the layout of variable-length records, also known as serialization of records.

A common example: given the table Person below, where name may hold up to 1024 characters, the schema is:

    CREATE TABLE Person (
        name      VARCHAR(1024) NOT NULL,
        age       INTEGER NOT NULL,
        birthdate DATETIME
    )

Records of this table are variable-length for two reasons:

  1. The name field is a variable-length string;
  2. birthdate can be NULL;

The key to serialization of variable-length records is the definition of field boundaries. A popular method is to save the offset of the field boundary in the record header.
The record of a Person is organized as follows:


Note: the header holds four integers that store the four boundary offsets of the three fields.
This arrangement naturally provides a way to encode a NULL field: two adjacent boundary offsets are made equal, for example:


The third and fourth offsets point to the same position, so the third field has size zero, which denotes a NULL value.

As you can see, offset is very convenient for both Page layout and record serialization.

According to the above introduction, we can infer the following:

  • Total record length: RecordLength = R[k], where k is the number of fields
  • Length of each field: ColumnLength(k) = R[k] - R[k-1], (k = 1, 2, 3, ...)
  • A field is NULL when: ColumnLength(k) = 0, (k = 1, 2, 3, ...)
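
The offset header can be illustrated with a short C sketch for the Person record. The function names are hypothetical, and a few details are assumptions: offsets are 32-bit and measured from the start of the record (so R[0] equals the header size), age is stored as a 32-bit integer, and birthdate as a 64-bit timestamp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PERSON_FIELDS 3  /* name, age, birthdate */

/* Serializes a Person into buf. A NULL birthdate pointer is encoded as a
   zero-length field. Returns the record length, or 0 if buf is too small. */
uint32_t person_serialize(uint8_t *buf, uint32_t cap, const char *name,
                          int32_t age, const int64_t *birthdate)
{
    uint32_t r[PERSON_FIELDS + 1];  /* the four boundary offsets */
    uint32_t name_len = (uint32_t)strlen(name);

    r[0] = (uint32_t)sizeof r;      /* field data starts after the header */
    r[1] = r[0] + name_len;
    r[2] = r[1] + (uint32_t)sizeof age;
    r[3] = r[2] + (birthdate ? (uint32_t)sizeof *birthdate : 0u);
    if (r[3] > cap)
        return 0;

    memcpy(buf, r, sizeof r);
    memcpy(buf + r[0], name, name_len);
    memcpy(buf + r[1], &age, sizeof age);
    if (birthdate)
        memcpy(buf + r[2], birthdate, sizeof *birthdate);
    return r[3];
}

/* R[k], k = 0..PERSON_FIELDS */
uint32_t record_offset(const uint8_t *rec, uint32_t k)
{
    uint32_t v;
    assert(k <= PERSON_FIELDS);
    memcpy(&v, rec + k * 4u, sizeof v);
    return v;
}

/* RecordLength = R[number of fields] */
uint32_t record_length(const uint8_t *rec)
{
    return record_offset(rec, PERSON_FIELDS);
}

/* ColumnLength(k) = R[k] - R[k-1], k = 1..PERSON_FIELDS */
uint32_t column_length(const uint8_t *rec, uint32_t k)
{
    assert(k >= 1 && k <= PERSON_FIELDS);
    return record_offset(rec, k) - record_offset(rec, k - 1);
}

/* A field is NULL when its length is zero. */
int column_is_null(const uint8_t *rec, uint32_t k)
{
    return column_length(rec, k) == 0;
}
```

One consequence of this encoding is that an empty string and a NULL string both have length zero. That is harmless here because name is declared NOT NULL, but a nullable string column would need an extra flag to tell the two apart.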

Finally, let's look at the overall layout of a HeapFile that stores the Person relation.


Reference

This article draws on a translated article about HeapFile, the storage layout of relational data on disk.
From http://dblab.cs.toronto.edu/courses/443/tas/




What is the principle of storing data on a disk?

Files are stored on disk much like a linked list. The head is the starting address of the file; the file is not necessarily contiguous but is chained together node by node. To access a file you only need to find its head. Deleting a file actually removes only the head; the data that follows stays in place until a later disk write claims those nodes and overwrites it. Data recovery software relies on exactly this. So even after an accidental deletion followed by other disk writes, the data can be restored as long as it has not been overwritten.

To understand why a file can be recovered, start with how files are laid out on the hard disk and how they are stored. A new hard disk must be partitioned and formatted before the system can use it. Generally, the hard disk is divided into the master boot sector, the operating system boot sector, the file allocation table (FAT), the directory area (DIR), and the data area (DATA).
In file deletion and recovery, the directory area and the file allocation table play the key roles. For safety, the system usually keeps two identical copies of the FAT. The information in the directory area locates a file's data on the disk: it records the file's starting unit (the most important item), its attributes, and its size.
When locating a file, the operating system determines the file's location and size on disk from the starting unit recorded in the directory area and from the file allocation table.
Although the data area takes up most of the space on the hard disk, it is meaningless without the areas that precede it.

When a file is deleted, the system merely sets a "deleted" mark on the file's entry and clears the cluster numbers the file occupied in the file allocation table, which releases the file's space. That is why the free space on the hard disk grows after a deletion while the file's actual content remains in the data area. The content is overwritten only when new data is written there; until then the original data does not disappear. Recovery tools (such as FinalData) use this property to restore deleted files.
Partitioning and formatting a hard disk work on a similar principle to file deletion. The former only changes the partition table, and the latter only modifies the file allocation table; neither actually deletes the data from the data area. This is why so many hard disk data recovery tools exist.
So how can deleted files be made unrecoverable? Many people suggest deleting the file and then writing new data over it, repeating this several times so that the original file can no longer be retrieved. However, this is tedious and not safe enough.
It is therefore better to use a professional deletion tool, which automatically overwrites the data N times so that nothing of the original data remains.
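
The deletion behaviour described above can be imitated with a toy model in C. This is not the real FAT on-disk format: the cluster count, cluster size, marker values, and all type and function names are invented for illustration. The recovery routine also assumes that the file's clusters were contiguous, since the chain itself is gone after deletion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CLUSTERS     8
#define CLUSTER_SIZE 4
#define FAT_FREE     0u       /* cluster is unallocated */
#define FAT_EOC      0xFFFFu  /* end of a cluster chain */

/* A toy disk: the allocation table plus the data area. */
typedef struct {
    uint16_t fat[CLUSTERS];                 /* next cluster of each cluster */
    uint8_t  data[CLUSTERS][CLUSTER_SIZE];  /* data area */
} ToyDisk;

/* A toy directory entry. */
typedef struct {
    char     name[12];
    uint16_t start;    /* starting cluster (index 0 is never used) */
    uint32_t size;     /* file size in bytes */
    int      deleted;  /* the "deleted" mark */
} ToyDirEntry;

/* Deletes a file: marks the entry and frees the chain.
   The data area is deliberately left untouched. */
void toy_delete(ToyDisk *d, ToyDirEntry *e)
{
    uint16_t c = e->start;
    while (c != FAT_EOC && c != FAT_FREE && c < CLUSTERS) {
        uint16_t next = d->fat[c];
        d->fat[c] = FAT_FREE;
        c = next;
    }
    e->deleted = 1;
}

/* Tries to recover a deleted file by reading consecutive free clusters
   from the recorded starting cluster. Returns the bytes recovered. */
uint32_t toy_undelete_contiguous(const ToyDisk *d, const ToyDirEntry *e,
                                 uint8_t *out)
{
    uint32_t copied = 0;
    uint16_t c = e->start;

    while (copied < e->size && c < CLUSTERS && d->fat[c] == FAT_FREE) {
        uint32_t left = e->size - copied;
        uint32_t chunk = left < CLUSTER_SIZE ? left : CLUSTER_SIZE;
        memcpy(out + copied, d->data[c], chunk);
        copied += chunk;
        c++;
    }
    return copied;
}
```

In this model, recovery stops as soon as it reaches a cluster that has been allocated again, which corresponds to the original data having been overwritten by a later write.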
 
What is the relationship between file management and databases? Is the data on the disk already stored as a database, or do you need to create a database first and then add the data?

Answers to your questions:
1. The difference between file management and database management:
File management refers to the part of the operating system that manages files in a unified way: the software that implements it, the files being managed, and the data structures needed to implement it (it is the general term for the facilities in the operating system that access and manage file information).
From the system's perspective, the file system organizes, allocates, and reclaims space on the storage device, and is responsible for storing, retrieving, sharing, and protecting files.
From the user's perspective, the file system mainly provides access by name: a user who knows the name of a file can access its contents without knowing where the file is stored.
As a unified information management mechanism, the file system should provide the following functions:
① Manage the file storage space (that is, external storage) in a unified way, allocating and reclaiming it.
② Determine the storage location and format of file information.
③ Implement the mapping from the file name space to the external storage address space, that is, access files by name.
④ Efficiently implement file control operations (such as creating, deleting, opening, and closing files) and access operations (such as reading, writing, modifying, copying, and dumping).

2. A database management system (DBMS) is the software used to create, use, and maintain databases. It provides functions for defining tables and for adding, modifying, deleting, and flexibly querying data. These functions can be called from high-level languages: using high-level languages and development tools together with the functions the DBMS provides, we can write programs that manage the large amounts of non-numerical data found in everyday work.
The layers, from bottom to top, are: hardware, operating system, DBMS (alongside compilers, diagnostic programs, and other system software), and application software.
 
