Wu Fengguang: Readahead algorithm makes Linux easier to use

Hello everyone. In this report I will briefly review how improvements to the Linux readahead algorithm over the past one or two years have affected I/O performance.

As we all know, disk seeks are very inefficient and the seek overhead is large, so we want to minimize small I/O. Applications generally issue small reads: they use a small buffer and read, say, 4 KB, 4 KB, 8 KB at a time, and the kernel optimizes this by converting the small reads into one large readahead.

This readahead can be seen in the figure below: the top row shows the application's 4 KB reads, and below it the kernel issues 16 KB or larger readahead I/O. Readahead improves performance mainly in two ways. One is to improve throughput by converting small reads into large readahead.
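To make this concrete, here is a minimal userspace sketch (the file name and sizes are made up for illustration) of an application reading 4 KB at a time while hinting the kernel, via posix_fadvise() and the Linux readahead() syscall, that larger readahead is worthwhile:

```c
/* Minimal userspace sketch: an application reading 4 KB at a time,
 * with explicit hints that let the kernel batch the reads into larger
 * readahead I/O.  The file name and sizes are illustrative only. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);    /* hypothetical input file */
    if (fd < 0) { perror("open"); return 1; }

    /* Declare a sequential access pattern, so the kernel may enlarge
     * the readahead window (4 KB reads served by 16 KB+ disk I/O). */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    /* Or populate the page cache explicitly, ahead of the reads. */
    readahead(fd, 0, 16 * 4096);

    char buf[4096];
    while (read(fd, buf, sizeof buf) > 0)
        ;   /* consume 4 KB at a time; most reads now hit the cache */

    close(fd);
    return 0;
}
```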

The other is converting synchronous reads into asynchronous readahead, which hides the I/O wait. The basic principle of the readahead algorithm is to examine the application's sequence of read requests: if the application is reading sequentially, readahead can be performed. In practice, however, there are many I/O access patterns that are not simple sequential reads but take other, varied forms, so the algorithm must detect those variations as well, and it must also deal with readahead thrashing.
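As a rough illustration of that basic principle, here is a sketch of the simplest sequential check (not actual kernel code; the structure and field names are invented):

```c
/* Sketch of the basic check, not actual kernel code; names invented. */
struct ra_state {
    unsigned long prev_index;   /* index of the last page read */
};

/* A read is sequential if it starts right after the previous one. */
static int is_sequential(const struct ra_state *ra, unsigned long index)
{
    return index == ra->prev_index + 1;
}
```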
  
In the next slides I will mainly introduce two readahead algorithms. These improvements all start from the detection of sequential reads. The simplest sequential read advances one page at a time, so the test for it is very simple. Later, people found that in some cases pages are read repeatedly: when read requests are not aligned to page boundaries, the same page is read multiple times, yet this is still a sequential read, so we can extend the judgment condition to cope with it. Retried reads are more complex still. They occur in many network applications, such as FTP and HTTP servers, and also with kernel AIO: a relatively large read request is submitted and then retried after only part of it completes. The old kernel used the pages the application requested as the input to the readahead decision, so such retries confused it. This was improved in 2.6.23, which instead uses the pages actually read as the input to the readahead algorithm; the figure below shows the resulting clean read sequence.
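A hedged sketch of both improvements just described, reusing the illustrative ra_state from the previous sketch:

```c
/* Sketch of the relaxed check, reusing the illustrative ra_state:
 * unaligned requests re-read the last page, so "same page again"
 * also counts as sequential. */
static int is_sequential_relaxed(const struct ra_state *ra,
                                 unsigned long index)
{
    return index == ra->prev_index || index == ra->prev_index + 1;
}

/* In 2.6.23 the state is advanced by the pages actually read, not by
 * the request the application submitted, so a large request that is
 * retried halfway no longer confuses the detector. */
static void on_page_read(struct ra_state *ra, unsigned long index)
{
    ra->prev_index = index;
}
```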

With this improvement, some users reported very good performance gains. On a server with 16 GB of memory serving 1200 clients over HTTP, CPU iowait dropped by 17%, while network bandwidth, that is, the actual volume served, increased by 17%. On the disk side, disk utilization fell by 26% and disk bandwidth rose by 29%.

Next is another HTTP user report, which says iowait dropped from 80% to 20%. The following problem is readahead thrashing. This occurs when a readahead page is evicted from the cache before the reader actually uses it. Say there are three readers, each reading one page at a time; when thrashing occurs, all of the readahead pages are evicted from the cache. In the old kernel, I/O then proceeds one page at a time; the red marks in the figure indicate disk I/O, which is very inefficient. In the new kernel, a fresh readahead window is rebuilt, each I/O starting at 4 KB and growing from there, so efficiency recovers immediately. This figure shows the performance comparison after readahead thrashing occurs.
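The recovery behavior might be sketched like this (illustrative names and sizes, not the kernel's actual code):

```c
/* Illustrative sketch of thrashing recovery; names and sizes invented. */
#define RA_MIN_PAGES 1          /* one 4 KB page: restart small */

struct ra_window {
    unsigned long start;        /* first page of the current window */
    unsigned long size;         /* window size in pages */
};

/* A cache miss on a page we had already read ahead means the page was
 * evicted before use: rebuild a small window instead of falling back
 * to page-at-a-time I/O as the old kernel did. */
static void on_thrash(struct ra_window *w, unsigned long miss_index)
{
    w->start = miss_index;
    w->size  = RA_MIN_PAGES;
}

/* Each sequential hit grows the window again, restoring efficiency. */
static void on_window_hit(struct ra_window *w)
{
    w->size *= 2;
}
```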

Our test machine has 128 MB of memory and opens a new reader every second, each reader reading at 20 to 30 KB per second; as readers gradually accumulate, thrashing sets in. At that point the old kernel sustains 5 MB/s of network traffic while the new one sustains 15 MB/s, a threefold improvement; I/O performance improves eightfold.

This figure shows another, non-obvious kind of sequential read. Because of how the file structure works in Linux, each open file can track only one stream. Here there are two processes: when each opens its own file descriptor for reading, both are correctly detected as sequential reads, but when the whole file is shared by two streams, they interfere with each other. As the figure below shows, the readahead state in the kernel then changes constantly; in this case readahead ends up disabled, which causes serious performance degradation. This is the kernel's internal file structure, one instance per open FD.

In this case the old algorithm has only one readahead state, while the reads belong to two different streams, so consecutive requests show no correlation and no sequence is detected. The improved method exploits one property: once a page is read into memory, it stays cached for a period of time. So we can check whether the page just before the current one is in the cache; if it is, this is a sequential read. Once we know the read is sequential and readahead will be performed, we need to solve the problem of the readahead size. The size must be safe: if it is too large, readahead thrashing can occur. So a formula is used to estimate it, and the estimate is accurate provided the number of streams stays stable. Having solved these two main problems, we arrive at the context-based readahead algorithm.
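A self-contained sketch of the cache-probe idea, with the page cache simulated as a plain bitmap (in the kernel this would be a real page-cache lookup; everything here is illustrative):

```c
#include <stdbool.h>

#define N_PAGES 1024
static bool cached[N_PAGES];    /* simulated page cache */

/* If the page just before this one is still cached, some stream read
 * it recently, so treat the current read as sequential. */
static bool looks_sequential(unsigned long index)
{
    return index > 0 && cached[index - 1];
}
```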

Let's look at the figure below. It starts with a row of marks indicating pages the reader has already read; the hash sign (#) indicates the page the reader is reading now, and the bar above indicates the readahead window. In step 1 we check whether the page just before the current one is present: if it is cached, this is a sequential stream and we can perform readahead. For readahead we need to know where to start and where to end. So we look at the history pages, count how many of them are cached, and obtain a number H; mirroring H forward gives the end mark in step 4. With the start and the end, we can issue the readahead.
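Assuming the simulated cached[] bitmap from the previous sketch, the window sizing described in these steps might look like this:

```c
/* Count the cached history pages before `index` to obtain H, then
 * mirror H forward: readahead covers [index, index + H).  A real
 * implementation would also clamp H to safe minimum/maximum sizes. */
static unsigned long ra_end(unsigned long index)
{
    unsigned long h = 0;

    while (index > h && cached[index - 1 - h])
        h++;                    /* measure the cached history H */

    return index + h;           /* step 4: the mirrored end mark */
}
```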

The following compares the three readahead algorithms. The old kernel supports only one sequential stream per FD. The new algorithm supports roughly 32 streams per file; that number can be raised, but efficiency drops. Context-based readahead is driven by the cached regions themselves, so its efficiency does not depend on the number of streams, and the number of supported streams is effectively unlimited. This feature is very suitable when sequential and random reads are mixed together: each random read looks like the start of a new stream, so in this figure there are a great many of them, and the other algorithms cannot handle this because they track only 32 streams by default. Context readahead can also be applied to scientific computing. Scientific computing often slices up a large matrix, producing reads at equal intervals whose individual size cannot be increased, and if the I/O size cannot be increased, performance still suffers. Read in small pieces this way, there are a great many readahead streams; they are not obvious at first glance, but they exist. First four pages are read ahead, then eight, and so on, so efficiency comes back up.
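For instance, a job reading one column of a large row-major on-disk matrix produces exactly such a pattern of tiny reads at equal intervals (dimensions and file name are invented for the example):

```c
/* A job reading column 0 of a row-major on-disk matrix: each read is
 * tiny (8 bytes), but the offsets advance by a fixed stride, forming
 * the kind of sparse stream context readahead can still recognize. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define ROWS 4096
#define COLS 4096

int main(void)
{
    int fd = open("matrix.bin", O_RDONLY); /* hypothetical data file */
    if (fd < 0) { perror("open"); return 1; }

    double v;
    for (long row = 0; row < ROWS; row++)
        pread(fd, &v, sizeof v, (off_t)row * COLS * sizeof(double));

    close(fd);
    return 0;
}
```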

Next is reading from an NFS server. The client generally issues a relatively large readahead, but it is split into small requests, and the times at which those requests reach the server can be scrambled. The server may be an SMP machine with multiple CPUs running many nfsd threads, each actually receiving requests, which adds to the confusion; in effect the read requests execute out of order and concurrently. The new readahead algorithm in 2.6.23 is less sensitive to such scrambled reads and adapts better, so NFS read performance improves by 1.8 times; context readahead does even better, at about twice.

The next topic is sparse reads. A sparse read covers only part of a file, perhaps half of it, being read by the application. The sequence-detection conditions can be changed to support sparse reads as well.
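One way the detection condition could be relaxed, reusing the illustrative ra_state from earlier (the gap tolerance is an assumption made up for the sketch):

```c
/* Sketch of a relaxed condition for sparse streams; MAX_GAP is an
 * assumed tolerance, e.g. read one page, skip one page. */
#define MAX_GAP 2

static int is_sparse_sequential(const struct ra_state *ra,
                                unsigned long index)
{
    return index >= ra->prev_index &&
           index <= ra->prev_index + MAX_GAP;
}
```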

This is a user's server. Its workload is characterized by reading 8 KB, then skipping 8 KB, and so on; it is a backup job. The old kernel cannot detect this, so there is no readahead at all; with context readahead the pattern is recognized and performance improves by 40 to 50 times.
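That workload is easy to reproduce; here is a minimal loop that reads 8 KB and skips 8 KB (the file name is invented):

```c
/* Reproducing the described pattern: read 8 KB, skip 8 KB, repeat. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("archive.dat", O_RDONLY); /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    char buf[8192];
    while (read(fd, buf, sizeof buf) == (ssize_t)sizeof buf)
        lseek(fd, 8192, SEEK_CUR);          /* skip the next 8 KB */

    close(fd);
    return 0;
}
```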

Finally, it is generally assumed that random reads are not worth reading ahead, but in real life there are very hot areas that receive large numbers of random reads; that is, the reads are very dense, and then the readahead hit rate is quite high. In such cases readahead can still be performed, and the context-based, sparse-capable readahead algorithm applies here. This is another user test: the workload randomly reads a large file into memory. From the two curves we can see that while reads over an area remain relatively sparse, performance stays basically flat, with no major change or obvious improvement; but once the read density increases, performance improves by three times.

This algorithm for dense random reads can also be applied elsewhere, for example to certain database workloads.