Database Learning Notes: Chapter 9, Storing Data: Disks and Files

Source: Internet
Author: User

Chapter 9

Data is stored on disk in units called disk blocks. Blocks are arranged in concentric rings called tracks on one or more platters. Tracks can be recorded on one or both surfaces of a platter.

The set of all tracks of the same diameter is called a cylinder.

The size of the disk block can be set to a multiple of the sector size when the disk is initialized.

There is an array of disk heads, one per recorded surface. To read or write a block, a head must be positioned over that block.

The main reason the heads do not read and write in parallel is that it is difficult to ensure that all heads are precisely aligned on their corresponding tracks.

A disk controller is an interface between a disk drive and a computer.

When data is written to a sector, a checksum is computed and stored with the sector. The checksum is computed again when the data on the sector is read back; a mismatch indicates that the sector is corrupted or the read was faulty.

Seek time is the time taken to move the head to the track on which the desired block is located. Rotational delay is the waiting time for the desired block to rotate under the head; on average it is the time for half a revolution, and it is usually less than the seek time. Transfer time is the time to actually read or write the data in the block once the head is positioned, that is, the time for the disk to rotate over the block.

RPM = revolutions per minute, i.e. the number of rotations the disk makes per minute.
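The three components above add up to the time needed to access one block. The following is a minimal sketch of that arithmetic; the function names and the sample figures (9 ms seek, 7200 RPM, 4096-byte block, 100 MB/s transfer rate) are assumptions chosen for illustration, not values from these notes.

```python
def rotational_delay_ms(rpm):
    """Average rotational delay: the time for half a revolution."""
    full_rotation_ms = 60_000 / rpm  # one minute is 60,000 ms
    return full_rotation_ms / 2


def access_time_ms(avg_seek_ms, rpm, block_bytes, transfer_bytes_per_s):
    """Access time = seek time + rotational delay + transfer time."""
    transfer_ms = block_bytes / transfer_bytes_per_s * 1000
    return avg_seek_ms + rotational_delay_ms(rpm) + transfer_ms


if __name__ == "__main__":
    # Hypothetical drive parameters, for illustration only.
    print(access_time_ms(9.0, 7200, 4096, 100_000_000))
```

With figures like these, the transfer time is tiny compared with the seek time and rotational delay, which is why the placement of blocks on disk matters so much.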

The unit of data transfer between disk and main memory is a block; even if only a single item on a block is needed, the entire block is transferred.

A disk array is an arrangement of several disks, organized to increase performance and improve the reliability of the storage system.

Redundancy is used to improve reliability: instead of keeping only a single copy of the data, redundant information is maintained.

Data striping (partitioning) is used to improve performance; it distributes data across multiple disks.

A disk array that combines data striping (partitioning) and redundancy is called a redundant array of independent disks, or RAID.

In data striping (partitioning), the data is divided into equal-sized segments that are distributed across multiple disks. The size of a segment is called the striping unit. Segments are typically distributed using a round-robin algorithm: if the disk array has D disks, then segment i is written to disk i mod D.
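The round-robin rule can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and Python lists stand in for the disks.

```python
def disk_for_segment(i, num_disks):
    """Round-robin placement: segment i goes to disk i mod D."""
    return i % num_disks


def stripe(data, unit_size, num_disks):
    """Split data into segments of unit_size bytes and distribute them."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), unit_size):
        segment_no = offset // unit_size
        target = disk_for_segment(segment_no, num_disks)
        disks[target].append(data[offset:offset + unit_size])
    return disks


if __name__ == "__main__":
    for disk_no, segments in enumerate(stripe(b"ABCDEFGH", 2, 3)):
        print(disk_no, segments)
```

Here the fourth segment wraps around to the first disk, so consecutive segments can be read from different disks in parallel.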

For a disk array with a striping unit of one bit, every request involves all of the disks, so the number of requests the array can process per unit of time and the average response time for each individual request are similar to those of a single disk.

Most disk arrays store check (parity) information. In the parity scheme, an additional check disk holds information that can be used to recover from the failure of any one disk in the array.

In a RAID system, the disk array is divided into reliability groups, where each reliability group consists of a set of data disks and a check disk.
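One common way to build such a check disk is bytewise XOR parity; these notes do not spell out the scheme, so treat XOR as an assumption here. The sketch below uses hypothetical function names and models each disk as a single block of bytes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover(surviving_blocks, parity_block):
    """Rebuild the one missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x0f\x00", b"\xff\x10"]  # three data disks
    parity = xor_blocks(data)                        # contents of the check disk
    # Pretend the second disk failed and rebuild it from the rest.
    print(recover([data[0], data[2]], parity) == data[1])
```

Because XOR is its own inverse, XOR-ing the surviving blocks with the parity block cancels them out and leaves the lost block. This works for one failed disk only.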

The disk space manager supports the concept of a page as a unit of data, and provides commands to allocate or deallocate a page and to read or write a page.

The disk space manager hides the details of the underlying hardware (and operating system), and allows higher levels of the software to treat the data as a collection of pages.

One way to keep track of block usage is to maintain a list of free blocks: when blocks are deallocated, they are added to the free list for future use. A second way is to maintain a bitmap in which each bit corresponds to one disk block and indicates whether that block is in use.

The database disk space manager can be built on top of OS files.
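The bitmap approach can be sketched as follows. The class and method names are hypothetical, and a real disk space manager would also persist the bitmap to disk, which this sketch does not do.

```python
class BitmapAllocator:
    """Tracks block usage with one bit per disk block (1 = in use)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Return the lowest-numbered free block and mark it as used."""
        for block in range(self.num_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise RuntimeError("no free blocks")

    def free(self, block):
        """Clear the bit so the block can be reused."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


if __name__ == "__main__":
    allocator = BitmapAllocator(10)
    first, second = allocator.allocate(), allocator.allocate()
    allocator.free(first)
    print(allocator.allocate() == first)
```

A bitmap makes it easy to find runs of contiguous free blocks, which a plain free list does not.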

The policy used to choose which page to replace is called the replacement policy.

The buffer manager is the software layer responsible for bringing pages from disk into main memory as needed. It manages the available main memory by partitioning it into a collection of pages, collectively referred to as the buffer pool. The main memory pages in the buffer pool are called frames; a frame is a slot that can hold a page.

In addition to the buffer pool itself, the buffer manager maintains some bookkeeping information and two variables for each frame: pin_count and dirty. pin_count records the number of times that the page currently in the frame has been requested but not released, that is, the number of current users of the page. The Boolean variable dirty indicates whether the page has been modified since it was brought into the buffer pool from disk.

Initially, the pin_count of every frame is 0 and dirty is false. When a page is requested, the buffer manager does the following:

(1) Check whether the buffer pool already contains the requested page. If it does, increment the pin_count of that frame. If it does not, the buffer manager brings the page into the buffer pool as follows:

(a) Choose a frame for replacement according to the replacement policy, and increment its pin_count.

(b) If the dirty bit of the replacement frame is set, write the page that the frame currently holds back to disk.

(c) Read the requested page into the replacement frame.

(2) Return the address of the frame containing the requested page to the requester.
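The steps above can be sketched as a small in-memory model. Everything here is illustrative: the class and method names (Frame, BufferManager, pin, unpin) are hypothetical, a dict stands in for the disk, and the replacement policy is a placeholder that simply takes the first unpinned frame.

```python
class Frame:
    def __init__(self):
        self.page_id = None   # page currently held, if any
        self.data = None
        self.pin_count = 0
        self.dirty = False


class BufferManager:
    def __init__(self, num_frames, disk):
        self.frames = [Frame() for _ in range(num_frames)]
        self.disk = disk        # dict: page_id -> page contents
        self.page_table = {}    # page_id -> index of the frame holding it

    def _choose_victim(self):
        # Placeholder policy (an assumption): first frame with pin_count 0.
        for index, frame in enumerate(self.frames):
            if frame.pin_count == 0:
                return index
        raise RuntimeError("all frames are pinned")

    def pin(self, page_id):
        # (1) Page already in the pool: just increment its pin_count.
        if page_id in self.page_table:
            frame = self.frames[self.page_table[page_id]]
            frame.pin_count += 1
            return frame
        # (a) Choose a replacement frame and increment its pin_count.
        index = self._choose_victim()
        frame = self.frames[index]
        frame.pin_count += 1
        # (b) Write the old page back to disk if it was modified.
        if frame.dirty:
            self.disk[frame.page_id] = frame.data
        if frame.page_id is not None:
            del self.page_table[frame.page_id]
        # (c) Read the requested page into the replacement frame.
        frame.page_id, frame.data, frame.dirty = page_id, self.disk[page_id], False
        self.page_table[page_id] = index
        # (2) Return the frame (standing in for its address).
        return frame

    def unpin(self, page_id, dirty=False):
        """Release one use of the page; the caller says if it modified it."""
        frame = self.frames[self.page_table[page_id]]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty
```

A caller would pin a page, work on the returned frame, and then unpin it, passing dirty=True if the page was modified, so that it is written back before the frame is reused.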

Buffer replacement policies:

Least recently used (LRU): this policy can be implemented in the buffer manager using a queue of pointers to frames whose pin_count is 0. When a frame becomes a candidate for replacement (its pin_count drops to 0), it is added to the end of the queue, and the frame at the head of the queue is always the one chosen for replacement.

Clock: a variant of LRU that uses a variable called current to choose a replacement frame by cycling through the frames in circular order. To approximate LRU behavior, each frame also has an associated referenced bit, which is turned on when the page's pin_count becomes 0.

When a replacement is needed, the frame that current points to is considered first; if that frame is not chosen, current is incremented and the next frame is considered. If current points to a frame whose pin_count is greater than 0, that frame is not a candidate for replacement, and current is incremented. If the referenced bit of the current frame is on, the clock algorithm turns the bit off and increments current, which makes a recently referenced page less likely to be replaced. If the current frame has a pin_count of 0 and its referenced bit is off, the page in it is chosen for replacement.
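A minimal sketch of the LRU queue follows, assuming frames are identified by integer frame numbers. The class and method names are hypothetical; frame_pinned covers the case where a queued frame is requested again before it is replaced, which the description above leaves implicit.

```python
from collections import deque


class LRUQueue:
    """Queue of replacement candidates (frames with pin_count 0)."""

    def __init__(self):
        self.queue = deque()  # oldest candidate at the left

    def frame_unpinned(self, frame_no):
        """Call when a frame's pin_count drops to 0."""
        self.queue.append(frame_no)

    def frame_pinned(self, frame_no):
        """Call when a queued frame is pinned again: no longer a candidate."""
        if frame_no in self.queue:
            self.queue.remove(frame_no)

    def choose_victim(self):
        """The frame at the head of the queue is chosen for replacement."""
        if not self.queue:
            raise RuntimeError("no replacement candidates")
        return self.queue.popleft()
```

The frame that has been unpinned the longest sits at the head of the queue, so it is the first to be replaced.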
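The clock sweep can be sketched as below. The class name and its methods are hypothetical, frames are numbered from 0, and the limit of two full sweeps before giving up is an assumption of this sketch (the first sweep may only clear referenced bits, and the second then finds a victim if any frame is unpinned).

```python
class ClockPolicy:
    def __init__(self, num_frames):
        self.pin_count = [0] * num_frames
        self.referenced = [False] * num_frames
        self.current = 0  # the clock hand

    def pin(self, frame_no):
        self.pin_count[frame_no] += 1

    def unpin(self, frame_no):
        self.pin_count[frame_no] -= 1
        if self.pin_count[frame_no] == 0:
            self.referenced[frame_no] = True  # turned on when pin_count hits 0

    def choose_victim(self):
        n = len(self.pin_count)
        for _ in range(2 * n):
            frame_no = self.current
            if self.pin_count[frame_no] == 0 and not self.referenced[frame_no]:
                return frame_no  # unpinned and not recently referenced
            if self.pin_count[frame_no] == 0:
                self.referenced[frame_no] = False  # give it a second chance
            self.current = (self.current + 1) % n  # advance the hand
        raise RuntimeError("all frames are pinned")
```

Compared with the LRU queue, this keeps only one bit per frame and a single counter, at the cost of approximating the true least-recently-used order.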

9.5 Files of Records

Linked list of pages: one approach is to maintain a heap file as a doubly linked list of pages. The DBMS remembers where the first page of each file is located by keeping a table of <heap file name, header page address> pairs in a known location on disk. The first page of the file is called the header page.

Disadvantage: when records are of variable length, virtually all pages in the file will be on the free-space list, because every page is likely to have at least a few free bytes.

Directory of pages (page directory): an alternative to a linked list of pages is to maintain a directory of pages.

The directory itself is a collection of pages, and each directory page can hold multiple directory entries, each of which points to a page of the heap file.

A heap file lets us reach its records in two ways: (1) retrieve a particular record by its RID, and (2) scan through all of the records sequentially.

Page formats: fixed-length records and variable-length records.
