RAID on Linux systems

Source: Internet
Author: User

I work as a DBA. Recently I ran into I/O jitter accompanied by thread_running jitter, as well as cases where, after a host went down and restarted, the binlog on the standby database and the binlog parsed by downstream consumers was corrupted. After consulting some experienced colleagues, the biggest suspect turned out to be the RAID card. Today I will take you into the world of RAID cards:

    1. What a RAID card is and why to use one
    2. The relationship between the cache of a RAID card and the disk's own cache
    3. Things to consider when using a RAID card

If you know MySQL, you will be familiar with the InnoDB transaction log. InnoDB uses the log to reduce the cost of committing transactions. Because transactions are recorded in the log, InnoDB does not need to flush dirty blocks from the buffer pool to disk at every commit. The data modified by a transaction usually maps to random locations in the tablespace, so flushing those changes requires a lot of random I/O; with the log, InnoDB turns random I/O into sequential I/O. Once the log is safely written to disk, the transaction is durable, even if the changes have not yet been written to the data files. If something bad happens (such as a power outage), InnoDB can replay the log and recover the committed transactions.

The log buffer must be flushed to persistent storage to ensure that committed transactions are fully durable. This brings us to the variable that controls how often the log is flushed, innodb_flush_log_at_trx_commit, whose setting has a significant impact on MySQL performance.

SET @@global.innodb_flush_log_at_trx_commit = xxx;

1. When the value is 0, the log buffer is written to the log file and flushed to disk once per second. Nothing is done at transaction commit; that is, writing the log buffer is unrelated to committing a transaction. MySQL performs best in this mode, but if the mysqld process crashes, the last second of log is usually lost.
2. When the value is 1, the log buffer is written to the log file and flushed to disk every time a transaction commits. This is the default value. It is the safest configuration, but also the slowest, because every transaction requires disk I/O.
3. When the value is 2, the log buffer is written to the log file at every transaction commit but is not flushed to disk immediately; the log file is flushed to disk once per second. If the mysqld process crashes, no data is lost, because the log is already in the operating system cache. If the operating system crashes, the last second of log is usually lost.
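To make the three settings concrete, here is a minimal Python sketch. It is a toy model, not how MySQL is implemented: the class `RedoLogModel`, its three lists (`log_buffer`, `os_cache`, `disk`) and the helper `run` are hypothetical names invented for this illustration. The model only tracks where a committed transaction sits at the moment of a crash.

```python
from dataclasses import dataclass, field


@dataclass
class RedoLogModel:
    """Toy model of the three innodb_flush_log_at_trx_commit settings.

    A committed transaction can sit in one of three places:
      log_buffer -> InnoDB memory (gone if mysqld crashes)
      os_cache   -> operating system cache (gone if the OS crashes)
      disk       -> persistent storage (survives both)
    """
    mode: int
    log_buffer: list = field(default_factory=list)
    os_cache: list = field(default_factory=list)
    disk: list = field(default_factory=list)

    def _write(self):
        # "write": move data from the InnoDB buffer to the OS cache
        self.os_cache.extend(self.log_buffer)
        self.log_buffer.clear()

    def _flush(self):
        # "flush": force the OS cache down to disk
        self.disk.extend(self.os_cache)
        self.os_cache.clear()

    def commit(self, trx):
        self.log_buffer.append(trx)
        if self.mode == 1:
            self._write()
            self._flush()
        elif self.mode == 2:
            self._write()
        # mode 0: nothing happens at commit time

    def tick(self):
        """The once-per-second background write and flush."""
        self._write()
        self._flush()

    def survivors(self, crash):
        """Transactions recoverable after a 'mysqld' or 'os' crash."""
        if crash == "mysqld":
            return self.disk + self.os_cache
        return list(self.disk)  # OS crash or power loss


def run(mode, crash):
    model = RedoLogModel(mode)
    model.commit("t1")
    model.tick()          # one second passes
    model.commit("t2")    # committed just before the crash
    return model.survivors(crash)


for mode in (0, 1, 2):
    print(mode, run(mode, "mysqld"), run(mode, "os"))
```

In this toy run, t2 is committed just before the crash. With a value of 0 it is lost in either kind of crash, with 2 it is lost only when the operating system goes down, and with 1 it survives both.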

"The last 1s" is not absolute, sometimes it loses more data, "said the above. Sometimes due to scheduling problems, brushing per second (Once-per-second flushing) does not guarantee 100% execution. For some applications where data consistency and integrity requirements are low, configuring 2 is sufficient and can be set to 0 for maximum performance. Some applications, such as payment services, require a high level of consistency and integrity, so even the slowest, preferably set to 1.

Everything above concerns the redo log flushing strategy, which is specific to InnoDB. An equally important parameter is sync_binlog, which controls how often the MySQL binary log is synchronized to disk.

SET @@global.sync_binlog = xxx;

1. If autocommit is on, the binary log is written once per statement; otherwise it is written once per transaction.
2. The default value is 0 (MySQL 5.7.7 and later default to 1): MySQL does not actively synchronize and relies on the operating system to flush the file contents to disk from time to time.
3. When set to N, MySQL server syncs the binary log to disk after it has been written N times.

Setting it to 1 is the safest: the binary log is synchronized once per statement or transaction, so even after a crash at most one statement or transaction is lost from the log, but it is also the slowest. In most cases data consistency requirements are not that strict, so sync_binlog is not set to 1; for high concurrency and performance it can be set to 100 or simply left at 0. As with innodb_flush_log_at_trx_commit, sync_binlog = 1 is recommended for applications such as payment services.
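As a rough illustration of what the value of sync_binlog means for exposure, the following Python sketch counts how many completed binary log writes are still only in the operating system cache when a crash happens. The function name `unsynced_after_crash` is hypothetical, and the model makes two simplifying assumptions: a sync happens exactly after every N completed writes, and with a value of 0 the operating system has not flushed anything on its own yet (the worst case). A write that is in flight at the moment of the crash is not counted.

```python
def unsynced_after_crash(sync_binlog, writes_before_crash):
    """Toy count of completed binlog writes not yet synced at crash time.

    Assumption: the server syncs after every `sync_binlog` writes; with
    sync_binlog = 0 nothing has been synced by the OS yet (worst case).
    """
    if sync_binlog < 0 or writes_before_crash < 0:
        raise ValueError("arguments must be non-negative")
    if sync_binlog == 0:
        return writes_before_crash
    # Everything up to the last multiple of sync_binlog has been synced.
    return writes_before_crash % sync_binlog


for n in (0, 1, 100):
    print(n, unsynced_after_crash(n, 250))
```

The trade-off is visible directly: larger values mean fewer syncs and therefore less blocking I/O, at the price of more writes exposed to an operating system crash or power loss.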

This may feel like a detour, but it is background that has to be explained.

It is important to understand the difference between "writing the log buffer to the log file" and "flushing the log to persistent storage". In most operating systems, writing the buffer to the log file simply moves the data from InnoDB's memory buffer to the operating system's cache, which is still in memory; nothing has really been persisted. Flushing the log to persistent storage, in contrast, means that InnoDB asks the operating system to push the data out of its cache and confirm that it has actually been written to disk. This is a blocking I/O call that does not return until the data is completely written. Because writing to disk is slow, blocking on every write is unacceptable for a database with high concurrency and high performance requirements. Seen from another angle, writes do not need to be as immediate as reads, as long as the data is guaranteed to be written.
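The difference between the two steps can be shown with the Python standard library. This is only a sketch of the general operating system mechanism, not of InnoDB's code: `append_record` and the file name `toy_redo.log` are hypothetical. `os.write` hands the data to the operating system, while `os.fsync` is the blocking call that asks the operating system to push the file's data down to the storage device.

```python
import os
import tempfile


def append_record(path, data, durable):
    """Append bytes to a log file.

    durable=False: os.write only hands the data to the OS cache.
    durable=True:  os.fsync then blocks until the OS reports that the
                   data has been handed to the storage device.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = os.write(fd, data)
        if durable:
            os.fsync(fd)
        return written
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "toy_redo.log")  # hypothetical file name
        append_record(log, b"trx-1\n", durable=False)  # write only
        append_record(log, b"trx-2\n", durable=True)   # write + flush
        with open(log, "rb") as f:
            return f.read()


print(demo())
```

Both records are readable afterwards because reads are served from the operating system cache. The difference only shows up after a crash or power loss, which is exactly why it is easy to overlook.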

The best configuration for high-performance transaction processing is to set innodb_flush_log_at_trx_commit to 1 and put the log files on a RAID volume with a battery-backed write cache, which brings us at last to today's topic. This configuration gives both safety and speed.

So a RAID volume sits in front of the persistent device, the disk. Log files, both the redo log and the binary log, pass through the RAID card on their way to disk. Once the RAID card has a problem, I/O performance suffers and data may be lost.

Now we get to the main topic.

What a RAID card is and why to use one

To simplify, you can think of the RAID card as a disk cache with two main functions: read-ahead and write-back.

A larger cache lets the card hold more read-ahead data to serve to the system from cache. For write-back, it lets the controller hold more data to be written to disk later. In particular, with elevator-style write-back, consecutively written-back sectors are closer together, which reduces the average write time and improves throughput.
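The benefit of elevator-style write-back can be sketched in a few lines of Python. This is a simplified illustration, not any controller's real algorithm: the sector numbers and starting head position are made up, and `head_travel` and `elevator_order` are hypothetical helper names. The cache holds several pending writes, so instead of issuing them in arrival order the controller can sweep across the disk once in each direction.

```python
def head_travel(start, sectors):
    """Total distance the head moves when visiting sectors in this order."""
    travel, pos = 0, start
    for sector in sectors:
        travel += abs(sector - pos)
        pos = sector
    return travel


def elevator_order(start, sectors):
    """One upward sweep from the head position, then one downward sweep."""
    up = sorted(s for s in sectors if s >= start)
    down = sorted((s for s in sectors if s < start), reverse=True)
    return up + down


pending = [98, 183, 37, 122, 14, 124, 65, 67]  # made-up sector numbers
start = 53                                      # made-up head position

fifo = head_travel(start, pending)
swept = head_travel(start, elevator_order(start, pending))
print("arrival order:", fifo, "elevator order:", swept)
```

The more writes the cache can hold, the more requests there are to reorder, which is one reason a larger cache helps write-back.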

Write strategies:
1. Write-through: all data is written directly to the hard disk before command completion is returned to the host.
2. Write-back: can greatly improve performance, but carries some risk to the data. In the event of a sudden power outage, data in the cache that has not yet been written to the hard disk is lost.

When the write policy is write-through, the cache size has little impact on the RAID card's performance; the cache only comes into play when the write policy is changed to write-back.
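The trade-off between the two write policies can be sketched with a toy controller model in Python. Everything here is an assumption made for illustration: the class `ToyRaidCache`, its method names, and especially the two latency constants, which are invented numbers and not measurements of any real card.

```python
class ToyRaidCache:
    """Toy controller cache. The latencies are invented for illustration."""
    CACHE_MS = 0.1  # assumed cost of landing a write in controller cache
    DISK_MS = 5.0   # assumed cost of a write reaching the disk

    def __init__(self, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError(policy)
        self.policy = policy
        self.cache = []  # acknowledged to the host but not yet on disk
        self.disk = []

    def write(self, block):
        """Return the latency the host sees for this write."""
        if self.policy == "write-through":
            self.disk.append(block)
            return self.DISK_MS
        self.cache.append(block)
        return self.CACHE_MS

    def destage(self):
        """Background flush of cached writes to disk."""
        self.disk.extend(self.cache)
        self.cache.clear()

    def power_loss_without_battery(self):
        """Return the acknowledged writes that are lost."""
        lost, self.cache = self.cache, []
        return lost


wt = ToyRaidCache("write-through")
wb = ToyRaidCache("write-back")
wt_ms = sum(wt.write(b) for b in range(100))
wb_ms = sum(wb.write(b) for b in range(100))
print("write-through ms:", wt_ms, "write-back ms:", wb_ms)
print("lost on power loss:", len(wt.power_loss_without_battery()),
      len(wb.power_loss_without_battery()))
```

In write-through mode the cache list is never used for writes, which matches the observation that cache size barely matters there. In write-back mode the host sees the low cache latency, but every write sitting in the cache is exposed until it is destaged, which is what the battery is for.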

The relationship between the cache of a RAID card and the disk's own cache

Whether the RAID card has a cache, and whether it is enabled, has a huge impact on "random read and write" performance. Mid-range and high-end RAID cards come with a cache (at a high price). So how should the RAID card's cache and the disk's own cache be configured?

The PERC H710 RAID card in Dell servers has a 512 MB cache and is battery-backed. When an array (RAID 5) is created, the default RAID card cache options are:

Read policy: Adaptive
Write policy: Write-back
Disk cache policy: Disabled

What the attributes mean:

Read policy: generally enabled. With a read-ahead policy it can improve "random read" performance, and the cache can be hit when the same data is read a second time.

Write policy: generally set to "write back". This operates on the RAID card's cache. A write is considered successful as soon as the data is in the cache; the RAID controller then merges multiple write I/Os into one and writes them to disk in a single operation, which improves "random write" performance. Because the RAID card has a battery, the cache can stay powered for 72 hours when the machine room loses power, so the data in the cache is not lost. In addition, if no battery is connected, the write cache is disabled by default (unless it is forcibly enabled without a battery).

Disk cache policy: this operates on the disk's built-in cache. When building RAID it is generally disabled, to prevent the data in the disk's own cache from being lost when the machine room loses power, since the disk has no battery. The RAID controller controls whether the disk's own cache is enabled. On a home desktop test machine (no RAID card), the Windows operating system has an option to control whether the disk's own cache is enabled (enabled by default).

Things to consider when using a RAID card

Many factors affect the performance of a RAID card. The key ones are the size of the RAID card cache, the write policy, the read policy, and the stripe size.

    • RAID policy
1. Turn on Write Back to increase write efficiency.
2. Turn on Write Back with a bad BBU. By default, if the RAID card battery is broken, the RAID card automatically switches from Write Back to Write Through, and I/O becomes slow. With bad-BBU Write Back turned on, I/O performance is still guaranteed if the battery is broken, but there is a risk of losing data: if the machine loses power at that point, the data in the cache is not protected by the battery (BBU) and is lost. RAID card battery charging and discharging happens far more often than machine power loss, so turn on bad-BBU Write Back to keep I/O fast.
3. Turn off the cache for reads. RAID card cache capacity is limited, so read caching is turned off to leave the cache for writes.
4. Turn off the disk's own cache. The RAID card cache is used instead, so the disk cache is turned off.
5. Turn on Adaptive ReadAhead. ReadAhead is read-ahead, which only improves performance for sequential disk I/O, so plain read-ahead is turned off. Adaptive ReadAhead decides automatically whether to read ahead.

Based on the above, the current cache policy of a MySQL server should be:

Adapter 0-VD 1 (target id:1): Cache policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU

The comma-separated fields, from left to right, correspond to settings 1, 5, 3 and 2 above.
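A cache policy line like the one quoted above can be checked mechanically. The Python sketch below parses such a line into its four comma-separated fields and compares them with the settings recommended in this section. The function names and dictionary keys are hypothetical, and the line format is taken from this article as-is; real tool output may differ between versions, so treat the parser as a starting point rather than a finished check.

```python
def parse_cache_policy(line):
    """Split a '... Cache policy: a, b, c, d' line into named settings.

    Field order follows the article: write policy, read-ahead policy,
    read cache (I/O) policy, bad-BBU policy.
    """
    marker = "cache policy:"
    idx = line.lower().find(marker)
    if idx < 0:
        raise ValueError("no 'Cache policy:' field in line")
    parts = [p.strip() for p in line[idx + len(marker):].split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got {len(parts)}")
    keys = ("write_policy", "read_ahead", "io_policy", "bad_bbu_policy")
    return dict(zip(keys, parts))


def matches_article_recommendation(settings):
    """True if the settings equal the policy recommended in this section."""
    return (
        settings["write_policy"].lower() == "writeback"
        and settings["read_ahead"].lower() == "readadaptive"
        and settings["io_policy"].lower() == "direct"
        and settings["bad_bbu_policy"].lower() == "write cache ok if bad bbu"
    )


line = ("Adapter 0-VD 1 (target id:1): Cache policy: "
        "WriteBack, ReadAdaptive, Direct, Write Cache OK if Bad BBU")
settings = parse_cache_policy(line)
print(settings)
print(matches_article_recommendation(settings))
```

Searching for the "Cache policy:" marker rather than splitting on the first colon matters here, because the adapter prefix of the line contains colons of its own.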
    • Discharge

The RAID card in Dell servers has a rechargeable battery. Even when not in use, this battery slowly self-discharges. When the charge has dropped to a certain level, the RAID controller fully discharges the battery, draining the remaining power, and then recharges it. This is a mechanism to protect the battery and also to safeguard the availability of the RAID card.

But the problem is in the process of discharging and charging:

By default, when the battery level of a RAID card falls below a certain threshold, the RAID card firmware considers the battery unavailable and, to keep data safe, disables the RAID "cache". This default is reasonable and there is nothing to "question" about it. The problem is that once the cache is disabled, the RAID card's I/O capability drops sharply. For I/O-heavy applications this drop can be fatal and may block system I/O. When the battery discharges, the cache is disabled (within one second) and the cache policy switches from WB to WT, which causes I/O jitter, and there is no way around it. My guess is that the cache relies on the battery rather than the machine's power supply precisely to avoid the risk of losing cached data. A poorly architected system can be dragged down by this "point of failure": an application brought down by a battery charger!

    • The BBU is broken.
  • No-battery write cache enabled (Write Cache OK if Bad BBU): if the battery is broken, I/O performance is still guaranteed, but at the risk of losing data. If the machine loses power, the data in the cache is not protected by the battery (BBU) and is lost. The group uses this policy, accepting the risk of data loss in exchange for avoiding the risk of an outage. It is not recommended when the data exists as a single copy; OB's multi-copy writes solve this problem.
  • No-battery write cache disabled: writes to the cache are disabled and the cache policy switches from WB to WT. At business peaks the I/O bottleneck is very likely to be reached, causing an outage.

During a power-off drill, not every RAID card was in WriteThrough mode; binlog files were corrupted and one machine also had corrupt blocks. The setting was later changed to NoCachedBadBBU, under which the card automatically switches to WT mode whenever the BBU is broken or its charge is insufficient, avoiding data loss caused by BBU failure. Initial experiments on the performance impact of write-through show an average RT increase of 5 ms, and load and iowait also rise noticeably: load 0.44 -> 2.4, CPU iowait 0.1 -> 4.
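The behaviour described for battery discharge and BBU failure boils down to a small decision table, sketched here in Python. The function name `effective_write_policy` and its parameters are hypothetical, and the logic is only a restatement of the behaviour described in this article, not of any particular firmware. `bbu_healthy=False` stands for both a broken battery and one that is discharging or insufficiently charged.

```python
def effective_write_policy(configured, bbu_healthy, write_cache_ok_if_bad_bbu):
    """Return (policy_in_effect, cached_writes_at_risk_on_power_loss).

    configured: "WB" (write-back) or "WT" (write-through)
    bbu_healthy: False when the battery is broken, discharging or low
    write_cache_ok_if_bad_bbu: the bad-BBU setting discussed above
    """
    if configured == "WT":
        return "WT", False
    if configured != "WB":
        raise ValueError(configured)
    if bbu_healthy:
        return "WB", False   # fast, and the battery protects the cache
    if write_cache_ok_if_bad_bbu:
        return "WB", True    # still fast, but the cache is unprotected
    return "WT", False       # safe, but I/O slows down (the jitter case)


for healthy in (True, False):
    for ok_if_bad in (True, False):
        print(healthy, ok_if_bad,
              effective_write_policy("WB", healthy, ok_if_bad))
```

The two rows with an unhealthy battery are the whole dilemma: one keeps performance and risks data, the other keeps data and risks an I/O bottleneck at peak load.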

    • System crash or power down

The BBU ensures that data in the cache is not lost: it preserves the data until the machine is powered back on, at which point it is flushed to disk immediately. It can generally preserve the data for several hours. This is the BBU's only role.

Power in the machine room reaches the machines through a UPS. When the machine room loses mains power, the UPS lets the machines keep working without noticing, but it cannot last more than 30 minutes. During that time staff need to start the diesel generators, which is why machine rooms tend to be close to gas stations.

Summary & Rhetorical questions
    1. Setting the right RAID card policy keeps data safe while maximizing performance, but can we really guarantee that no data is lost? Data that is not lost is not necessarily undamaged, nor necessarily usable. Take a commit: while the transaction's data is being flushed, the transaction must remain whole, but if the in-memory data is lost halfway through the flush, the transaction is not fully persisted and is lost. Likewise, when the binary log is being flushed, data that has already been shipped to the standby or downstream may not have been persisted locally in time, leaving a damaged file. At the very least, though, RAID guarantees that data already persisted is not lost.
    2. RAID card policies are not one-size-fits-all and can be chosen per business scenario. Take the no-battery write cache: enable it when performance matters more, and disable it when data safety matters more.
Case sharing:

1. Symptom: at 2015-12-07 21:28 an alert was received for thread_running = 357.

First, TPS and QPS on the DB showed no unusual change, and Lor was normal.

Checking I/O revealed an anomaly: await was high at 21:28.

The RAID logs were checked next.

From this it is basically certain that the cause of the thread_running spike was the RAID card battery charging and discharging, which disabled the RAID card cache and slowed I/O.

