How Relational Databases Work: The Buffer Cache (translated from a Coding-geek article)

Tags: oracle, documentation, prefetch

This article is translated from the Coding-geek article "How does a relational database work".

Original link: http://coding-geek.com/how-databases-work/#Buffer-replacement_strategies

The buffer cache section is translated first; there may be time to translate the other chapters later.

The translation follows the table of contents of the original article:

I. Data Manager


The query executor fetches data from tables by sending requests to the data manager. Two problems arise:

    1. Relational databases use a transactional model. A query cannot run against data that another transaction is in the middle of modifying; this avoids reading dirty data.

    2. Data retrieval is the slowest database operation, because the data must be read from disk.

      Therefore, the database needs a very efficient data caching system.

In this chapter we will look at how relational databases solve these two problems.

We are not going to discuss how the database actually loads data from disk; that is not the focus of this article (and space is limited).

II. The Buffer Cache


As mentioned earlier, the performance bottleneck of a database is I/O.

To improve performance, modern databases use a buffer cache.

The query executor obtains data from the cache manager instead of reading it directly from disk files.

The cache manager manages an area of memory called the buffer pool. Getting data directly from memory gives database access a tremendous leap in performance.

However, it is hard to quantify how much the cache helps; it depends on what kind of operation you are running:

    • Sequential access vs. random access.

    • Read operations vs. write operations.

And on what kind of disk the database uses:

    • 7,200/10,000/15,000 rpm HDDs
    • SSDs
    • RAID 1/5/...

Still, I would say that reading data from the in-memory cache is 100 to 100,000 times faster than reading it directly from disk without a cache.
This leads to another problem (every database has it): the cache needs to prefetch data before the query executor accesses it; otherwise the query has to pause and wait for the cache to load the data from disk into memory first.

III. Cache Prefetching

The core of the problem is "data prefetching".

The query executor knows what data it will need, because it understands the detailed requirements of each query and the storage structure of the tables. The basic logic of prefetching is this:

    1. While fetching the first batch of data, the query executor asks the cache manager to load the second batch into the buffer in advance.
    2. While fetching the second batch, it asks the cache manager to load the third batch in advance; the first batch can now be removed from the cache.

    3. ......

The cache manager stores all of this data in the buffer pool.

To know whether the data in the pool is still being used, the cache manager needs to maintain some extra information about it (a kind of lock).
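To make this handshake concrete, here is a minimal sketch of the prefetch logic described above, in Python. All names (`CacheManager`, `load_batch`, `pin`/`unpin`) are illustrative assumptions, not the API of any real database, and the disk read is faked; a real system would run the prefetch asynchronously on a background I/O thread.

```python
class CacheManager:
    """Illustrative buffer pool: batches stay pinned in memory while in use."""

    def __init__(self):
        self.pool = {}       # batch_id -> data
        self.pinned = set()  # batches currently in use (must not be evicted)

    def load_batch(self, batch_id):
        # In a real database this would be asynchronous disk I/O;
        # here it is synchronous just to show the bookkeeping.
        if batch_id not in self.pool:
            self.pool[batch_id] = f"rows of batch {batch_id}"  # fake disk read

    def pin(self, batch_id):
        self.load_batch(batch_id)      # slow path if the prefetch missed
        self.pinned.add(batch_id)
        return self.pool[batch_id]

    def unpin(self, batch_id):
        self.pinned.discard(batch_id)  # batch becomes evictable again


def scan(batch_ids, cache):
    """Process batch N while asking the cache to stage batch N+1."""
    for i, batch_id in enumerate(batch_ids):
        rows = cache.pin(batch_id)
        if i + 1 < len(batch_ids):
            cache.load_batch(batch_ids[i + 1])  # prefetch the next batch
        print("processing", rows)               # stand-in for real work
        cache.unpin(batch_id)                   # done: may be evicted now


scan([1, 2, 3], CacheManager())
```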

But sometimes the query executor does not know what data it will need next, or the database does not provide a way to specify which data to prefetch. Instead, the database offers speculative prefetching (for example, guessing from the data just read that items 7, 8, 9 will be needed soon, and loading 7, 8, 9 into the cache in advance) or sequential prefetching (after a query runs, the data adjacent on disk to the data just read is also loaded into the cache).

To evaluate how well the cache manager's prefetch mechanism works, modern database systems expose a metric: the cache hit ratio. The hit ratio describes how often the query executor finds the data it needs in the cache (without reading a disk file).

Note: a poor cache hit ratio does not always mean the cache is working badly. The Oracle documentation has more information on this.
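The metric itself is simple arithmetic: hits divided by total accesses. Here is a minimal counter, with illustrative names (no real database exposes exactly this interface):

```python
class HitRatioCounter:
    """Track cache hits vs. misses and report the hit ratio."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, found_in_cache):
        if found_in_cache:
            self.hits += 1
        else:
            self.misses += 1

    def ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


counter = HitRatioCounter()
for found in [True, True, False, True]:  # 3 hits, 1 miss
    counter.record(found)
print(counter.ratio())  # 0.75
```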

However, cache memory is limited, so cached content must constantly be replaced. Loading and evicting cached data consumes disk I/O and network I/O resources.

If a query runs frequently, it is inefficient to keep loading and evicting its data. To address this, modern databases use cache replacement strategies.

IV. Cache Replacement Strategies

Most modern databases use the LRU algorithm for cache replacement; at least SQL Server, MySQL, Oracle, and DB2 do.

1. LRU

LRU stands for Least Recently Used. The algorithm rests on the assumption that recently used data is very likely to be used again soon and should stay in the cache; conversely, data that has not been used recently can be evicted.


For simplicity, we assume that the data in the cache is not locked (and can therefore be evicted).

Here is an example to illustrate how it works. In this simple example, the buffer pool can hold 3 pieces of data (a runnable sketch follows the walkthrough).

    1. After the cache manager uses data 1, it puts 1 into the cache.

    2. After the cache manager uses data 4, it puts 4 into the cache.
    3. After the cache manager uses data 3, it puts 3 into the cache.

    4. After the cache manager uses data 9, it puts 9 into the cache.

      Because the cache is full, one piece of data must first be evicted. Which one?
      By the LRU rule, 1 is the least recently used, so 1 is evicted and 9 is added.

    5. After the cache manager uses data 4, 4 becomes the most recently used data; the order is adjusted.

    6. After the cache manager uses data 1, 1 becomes the most recently used data; 3 is evicted.
    7. ......
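Here is a runnable sketch of that walkthrough: a minimal LRU cache built on Python's `OrderedDict`, replayed with the same access sequence (1, 4, 3, 9, 4, 1) and a capacity of 3. It is a teaching aid, not how any production buffer pool is implemented.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU buffer pool: the least recently used entry is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from least to most recently used

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # now the most recently used
        else:
            if len(self.entries) >= self.capacity:
                evicted, _ = self.entries.popitem(last=False)  # drop LRU entry
                print("evict", evicted)
            self.entries[key] = f"page {key}"  # stand-in for a disk read
        print("after", key, "-> cache holds", list(self.entries))


cache = LRUCache(capacity=3)
for page in [1, 4, 3, 9, 4, 1]:  # the access sequence from the walkthrough
    cache.access(page)
# Accessing 9 evicts 1 (the least recently used); accessing 1 again evicts 3.
```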

The algorithm works well, but it has limitations. What if a query reads a large table? In other words, what if the table data being read exceeds the size of the cache? This algorithm would then evict everything previously in the cache, even though the newly loaded table data may be used only once and never again.

2. Algorithm improvements

To solve this problem, some database management systems add special rules. For example, from the Oracle documentation:

For very large table reads, the data is read directly from the disk file, bypassing the cache. For medium-sized tables, the database can either read directly from disk or use the cache; if the cache is used, the data read should be placed at the end of the LRU list (so that this table's data is evicted before newer cached data).
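One way to picture that rule is a small extension of the `LRUCache` sketch above: pages read by a large scan enter the list at the cold (least recently used) end, so they are evicted first. This is only an illustration of the idea, not Oracle's actual implementation.

```python
class ScanAwareLRUCache(LRUCache):
    """Extends the LRU sketch above: pages from a big scan enter the cold end."""

    def access(self, key, from_large_scan=False):
        if key in self.entries:
            self.entries.move_to_end(key)
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)       # evict the coldest entry
        self.entries[key] = f"page {key}"
        if from_large_scan:
            # Scanned pages go to the least-recently-used end of the list,
            # so they are evicted first and do not flush the hot working set.
            self.entries.move_to_end(key, last=False)
```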

The LRU algorithm has an advanced version called LRU-K. For example, SQL Server uses LRU-K with K = 2.

K is the number of recent accesses that the algorithm takes into account.
The previous example was the simplest case of LRU-K, considering only one access, i.e. K = 1. LRU-K works as follows (a sketch of the bookkeeping follows this list):

    1. Record the most recent accesses of each piece of data (up to K of them).
    2. Give each piece of data a weight based on its accesses; the more recent the accesses, the higher the weight.
    3. When a new batch of data is loaded into the cache, data with a high weight is not evicted, even if it entered the cache long ago.

    4. If data is not reused for a long time, its weight gradually decreases.

Computing these weights is resource-intensive, which is why SQL Server uses K = 2: this setting gives a good return for an acceptable overhead.
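Here is a minimal sketch of the LRU-K bookkeeping just described. Instead of an explicit decaying weight it uses the classic formulation, evicting the page whose K-th most recent access is furthest in the past, which captures the same intuition; it is illustrative, not SQL Server's implementation.

```python
import itertools
from collections import defaultdict, deque


class LRUKCache:
    """Illustrative LRU-K: evict the page whose K-th most recent access is
    oldest; pages with fewer than K recorded accesses are evicted first."""

    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.clock = itertools.count()     # logical timestamps for accesses
        self.history = defaultdict(deque)  # page -> up to K last access times
        self.pool = {}                     # page -> cached data

    def access(self, page):
        times = self.history[page]
        times.append(next(self.clock))
        if len(times) > self.k:
            times.popleft()                # remember only the last K accesses
        if page not in self.pool:
            if len(self.pool) >= self.capacity:
                self._evict()
            self.pool[page] = f"page {page}"  # stand-in for a disk read
        return self.pool[page]

    def _kth_recent_time(self, page):
        times = self.history[page]
        # Fewer than K accesses on record: treat as infinitely old.
        return times[0] if len(times) == self.k else float("-inf")

    def _evict(self):
        victim = min(self.pool, key=self._kth_recent_time)
        del self.pool[victim]
```

Note that the access history is kept even for evicted pages, so a page that comes back quickly already carries some weight; that retained history is part of what makes LRU-K resistant to one-off scans.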

For a deeper understanding of the LRU algorithm, you can read the research papers on it (easily found via Google).

3. Other algorithms

There are other algorithms for managing the cache:

    • 2Q (similar to LRU-K)
    • CLOCK (similar to LRU-K)
    • MRU (Most Recently Used; the same logic as LRU but with a different rule)
    • LRFU (Least Recently and Frequently Used)
    • ......

Some databases let you use an algorithm other than the default one; a variety of options are available.

V. Write Cache

So far we have mostly discussed the read cache, which loads data into memory before it is used. Databases also have a write cache, which accumulates changes in memory and writes them to disk files in one pass, reducing frequent disk I/O (remember, the database's bottleneck is I/O).

Keep in mind that the cache stores pages of data, not rows as one might intuitively assume. A page that has been modified in the cache but not yet saved to disk is called a "dirty page." Several strategies exist for deciding the best time to write dirty pages to disk; this is strongly related to transactions, which the next section covers.
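Here is a minimal sketch of the dirty-page idea: writes only mark pages dirty in memory, and a later `flush` writes each dirty page once, however many times it was modified. The names and the flush trigger are illustrative assumptions; real databases tie the flush decision to transaction handling, as the next section discusses.

```python
class WriteCache:
    """Illustrative write cache: changes accumulate in memory as dirty pages."""

    def __init__(self):
        self.pages = {}     # page_id -> in-memory page content
        self.dirty = set()  # pages modified since their last flush

    def write(self, page_id, data):
        self.pages[page_id] = data
        self.dirty.add(page_id)  # no disk I/O yet: just mark the page dirty

    def flush(self):
        """Write all dirty pages to disk in one pass."""
        for page_id in sorted(self.dirty):
            print("flush page", page_id)  # stand-in for real disk I/O
        self.dirty.clear()


cache = WriteCache()
cache.write(7, "row v1")
cache.write(7, "row v2")      # second update to the same page: still one flush
cache.write(8, "another row")
cache.flush()                 # two disk writes instead of three
```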

Translated "How does a relational" other chapters link:
1. How relational databases work-time complexity: http://blog.csdn.net/ylforever/article/details/51205332
2. How relational databases work-merge sort: http://blog.csdn.net/ylforever/article/details/51216916
3. Relational database working principle-data structure: http://blog.csdn.net/ylforever/article/details/51278954
4. How relational databases work-Fast cache: http://blog.csdn.net/ylforever/article/details/50990121
5. How relational databases work-transaction management (i): http://blog.csdn.net/ylforever/article/details/51048945
6. How relational databases work-transaction management (b): http://blog.csdn.net/ylforever/article/details/51082294

