Linux Hard Disk I/O Optimization: Collected Notes

Kernel-related parameters

The following kernel parameters are relevant to disk I/O; if conditions permit, modify them and validate the changes with testing.

1. /proc/sys/vm/dirty_ratio

This parameter specifies the percentage of system memory that dirty file-system cache pages may occupy (for example 10%) before the system must start processing them itself: because so many dirty pages have accumulated, the kernel flushes a portion of them out to disk to avoid data loss, and during this process many application processes doing file I/O may be blocked.

Increasing this parameter lets the system use more memory as a disk write buffer and can greatly improve write performance. However, for workloads with continuous, constant writes, the value should be lowered:

echo 1 > /proc/sys/vm/dirty_ratio

For systems with high data-consistency requirements and little long-term I/O pressure, lowering this value is recommended; the effect can be understood as being similar to direct I/O.
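
Changes made with echo take effect immediately but are lost on reboot. A minimal sketch of checking the current value and persisting a setting with sysctl (the value 10 is only an illustrative choice, not a recommendation from the text above):

cat /proc/sys/vm/dirty_ratio                      # read the current setting
sysctl -w vm.dirty_ratio=10                       # apply immediately (example value)
echo 'vm.dirty_ratio = 10' >> /etc/sysctl.conf    # persist across reboots
sysctl -p                                         # reload /etc/sysctl.conf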

2. /proc/sys/vm/dirty_background_ratio

This parameter specifies the percentage of system memory that dirty file-system cache pages may occupy (for example 5%) before the background writeback processes (pdflush/flush/kdmflush, depending on kernel version) are triggered to run and asynchronously flush the cached dirty pages out to disk.

Increasing this parameter lets the system use more memory as a disk write buffer and can greatly improve write performance. However, for workloads with continuous, constant writes, the value should be lowered:

echo 1 > /proc/sys/vm/dirty_background_ratio

dirty_ratio vs. dirty_background_ratio: normally the system reaches the dirty_background_ratio threshold first, which triggers the flush processes to perform asynchronous write-back; during this phase application processes can still write. If the applications dirty pages faster than the flush processes can clean them and the dirty_ratio threshold is reached, the operating system switches to processing dirty pages synchronously, blocking the application processes.
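
To observe this interplay in practice, the amount of dirty and in-flight writeback memory can be watched while a workload runs; a minimal sketch using standard tools:

watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'    # Dirty grows until a threshold is reached, then Writeback rises as pages are flushed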

3. /proc/sys/vm/dirty_writeback_centisecs

This parameter controls how often the kernel's dirty-data writeback process (pdflush) wakes up. The unit is 1/100 second. The default value is 500, i.e. 5 seconds. If your system writes continuously, it is actually better to lower this value so that spiky write operations are flattened into many smaller writes. The setting is made as follows (200, i.e. 2 seconds, is only an example of a lower value):

echo 200 > /proc/sys/vm/dirty_writeback_centisecs

If your system's writes are short-lived spikes, the amount of data written each time is small (tens of MB), and memory is plentiful, this value should instead be increased (1000, i.e. 10 seconds, is only an example of a higher value):

echo 1000 > /proc/sys/vm/dirty_writeback_centisecs

4. /proc/sys/vm/dirty_expire_centisecs

This parameter declares how long data may sit in the Linux kernel write buffer before it is considered "old" and the pdflush process starts considering writing it to disk. The unit is 1/100 second. The default is 3000, meaning data that has been dirty for 30 seconds is considered old and will be flushed to disk. For especially write-heavy workloads it is fine to shrink this value somewhat, but not by too much, because shrinking it too far makes I/O be submitted too frequently (1500, i.e. 15 seconds, is only an example of a lower value):

echo 1500 > /proc/sys/vm/dirty_expire_centisecs

Of course, if your system has ample memory, the write pattern is intermittent, and the amount written each time is small (for example tens of MB), then increasing this value works better.

5. /proc/sys/vm/vfs_cache_pressure

This parameter controls the kernel's tendency to reclaim the memory used by the directory (dentry) and inode caches. At the default value of 100, the kernel keeps the dentry and inode caches at a reasonable proportion relative to the page cache and swap cache. Lowering the value below 100 makes the kernel tend to retain the dentry and inode caches; raising it above 100 makes the kernel tend to reclaim them. Default: 100
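
To see how much memory the dentry and inode caches are actually using before tuning, and to lower the pressure for a metadata-heavy workload (50 is only an illustrative value):

slabtop -o | head -20                          # one-shot view of the largest slab caches
grep -E 'dentry|inode_cache' /proc/slabinfo    # raw counts for the dentry and inode slabs (needs root)
sysctl -w vm.vfs_cache_pressure=50             # example: prefer keeping dentry/inode caches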

6. /proc/sys/vm/min_free_kbytes

This file specifies the minimum amount of free memory (in kilobytes) that the Linux VM is forced to keep reserved.

Default setting: 724 (on a machine with 512 MB of physical memory)

7. /proc/sys/vm/nr_pdflush_threads

This file shows the number of pdflush processes currently running; under high I/O load the kernel automatically starts additional pdflush processes. Default setting: 2 (read-only)

8. /proc/sys/vm/swappiness

This file controls how aggressively the system swaps: the higher the value (0-100), the more readily memory is swapped out to disk. Tune the swappiness parameter to minimize how much of an application's memory gets swapped out to the swap partition. The default is 60.
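
A minimal sketch of lowering swappiness for a workload that should stay in RAM (10 is only an illustrative value):

cat /proc/sys/vm/swappiness                     # current value
sysctl -w vm.swappiness=10                      # apply immediately (example value)
echo 'vm.swappiness = 10' >> /etc/sysctl.conf   # persist across reboots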

I/O performance analysis tools:

vmstat

iostat

I/O performance tuning tool: sysctl
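
Typical invocations of the two analysis tools (the sample interval and count are arbitrary examples):

vmstat 1 5       # memory, swap, and block-I/O summary: 5 samples at 1-second intervals
iostat -x 1 5    # extended per-device statistics; watch the await and %util columns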

I/O Scheduler

i) The four I/O scheduling algorithms

1) CFQ (Completely Fair Queuing)

In recent kernel versions and distributions, CFQ is the default I/O scheduler and is also the best choice for a general-purpose server. CFQ tries to distribute access to the I/O bandwidth evenly, avoiding process starvation and achieving lower latency; it is a tradeoff between the deadline and anticipatory (AS) schedulers. CFQ is the best choice for multimedia applications (video, audio) and desktop systems.

2) NOOP (elevator scheduler)

NOOP implements a simple FIFO queue and organizes I/O requests the way an elevator works: when a new request arrives, it is merged with the most recent request where possible, so that requests to the same media stay together. NOOP is the best choice for flash devices, RAM disks, and embedded systems.

3) Deadline (deadline scheduler)

Deadline guarantees that requests are serviced within a deadline; the deadlines are tunable, and the default read deadline is shorter than the write deadline (see the example after this list). This prevents requests from being starved; in particular, read operations are not starved by writes. Deadline is the best choice for database environments (Oracle RAC, MySQL, etc.).

4) AS (anticipatory I/O scheduler)

Essentially the same as Deadline, but after the last read operation it waits up to 6 ms before scheduling other I/O requests, anticipating a new read request from the application; this can improve the performance of read operations, but at the expense of some writes. It fits new I/O operations into each 6 ms window and merges several small write streams into one large write stream, trading write latency for maximum write throughput.

AS is suitable for write-heavy environments such as file servers. AS performs poorly in database environments.
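
If the deadline scheduler is selected (see section ii below), its deadlines can be inspected and adjusted through sysfs. A sketch, assuming the device is sda and deadline is its active scheduler (the value 250 is only illustrative):

cat /sys/block/sda/queue/iosched/read_expire           # current read deadline, in milliseconds
cat /sys/block/sda/queue/iosched/write_expire          # current write deadline, in milliseconds
echo 250 > /sys/block/sda/queue/iosched/read_expire    # example: tighten the read deadline for latency-sensitive reads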

ii) Viewing and setting the I/O scheduling method

1) View the current system's I/O scheduler

[root@test1 tmp]# cat /sys/block/sda/queue/scheduler

noop anticipatory deadline [cfq]

2) Temporarily change the I/O scheduler

For example, to change to the noop elevator scheduling algorithm:

echo noop > /sys/block/sda/queue/scheduler

3) Permanently change the I/O scheduler

Modify the kernel boot parameters and add elevator=<scheduler name>:

[root@test1 tmp]# vi /boot/grub/menu.lst

Change the kernel line to the following:

kernel /boot/vmlinuz-2.6.18-8.el5 ro root=LABEL=/ elevator=deadline rhgb quiet

After rebooting, check the scheduler again:

[root@test1 ~]# cat /sys/block/sda/queue/scheduler

noop anticipatory [deadline] cfq
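
On newer distributions that use GRUB 2 rather than the menu.lst shown above, the equivalent change is made in /etc/default/grub; this is a sketch, and the exact regeneration command varies by distribution (grub2-mkconfig on RHEL-family systems, update-grub on Debian-family systems):

# in /etc/default/grub, append the option to the existing kernel command line
GRUB_CMDLINE_LINUX="... elevator=deadline"

# then regenerate the GRUB configuration, e.g.:
grub2-mkconfig -o /boot/grub2/grub.cfg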

File System Basic Properties

1. When doing I/O on large files (gigabytes and up), the performance gap between the common file systems is small, and the performance bottleneck is usually the disk itself. For small-file reads and writes the bottleneck is disk addressing (TPS); for large-file reads and writes the bottleneck is bandwidth.

If the file system mainly holds large files, setting a larger block size gives better performance (see the example below).
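
A minimal sketch of formatting a partition with an explicit 4 KB block size; /dev/sdX is a placeholder and the command destroys any existing data on it:

mkfs.ext4 -b 4096 /dev/sdX    # ext4 block sizes are typically 1024, 2048, or 4096 bytes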

2. The three journaling modes:

Journal
All changes to both the file system's data and its metadata are recorded in the journal. This mode reduces the chance that any given file needs repair after a crash, but it requires many additional disk accesses. It is the safest and slowest journaling mode.

Ordered
Only changes to the file system's metadata are recorded in the journal, and data blocks are written to disk before the related metadata is committed. This is the default journaling mode.

Writeback
Only metadata is journaled; no form of data journaling or ordering is performed. This is the fastest but least safe mode.
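
On ext3/ext4 these modes are selected with the data= mount option. A sketch, with the device and mount point as placeholders:

mount -o data=journal /dev/sdX /mnt/data    # full data journaling (safest, slowest)

# or make the choice permanent in /etc/fstab, for example:
# /dev/sdX  /mnt/data  ext4  defaults,data=writeback  0 2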

3. ext4

ext4 makes two main optimizations over ext3:

The first is inode pre-allocation. This gives inodes good locality: the inodes of files in the same directory are placed together as far as possible, which speeds up directory lookup and directory operations. Applications that handle many small files therefore also perform well.

The second is the extent, delayed-allocation, and multi-block allocation strategies. These policies let large files be stored in contiguous disk blocks, significantly reducing the number of addressing operations and noticeably improving I/O throughput.

ext4 is therefore a good choice for large sequential I/O. ext4 supports individual files of up to 16 TB (assuming 4 KB blocks).
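
One way to see the extent-based layout in practice is filefrag, which reports how many contiguous extents a file occupies (the path is a placeholder):

filefrag -v /path/to/largefile    # fewer extents means the file is stored more contiguously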
