(optimizations are listed from the bottom layer up)
First, the storage layer optimization:
Use RAID 1+0; do not use RAID 5 or RAID 6.
Build each RAID LUN group from at least 8 disks.
Do not use Linux software LVM on the database server (it has a severe impact on performance).
The array (RAID) card should be equipped with a cache and a BBU module as the basic prerequisite for enabling the write cache.
Note: The BBU is the backup power module for the array card's cache; it is a battery pack made up of multiple batteries.
Set the write policy to WB or Force WB; do not use the WT policy.
1. WB = Write Back:
   The OS submits IO write requests to the array card; the card first places each request in its cache and then writes it to disk asynchronously from the cache.
Note: The OS considers the IO commit complete although the data is still only in the cache; if the BBU's charge falls below 15%, the controller switches to WT mode and the IO load becomes very high.
2. Force WB: Regardless of the BBU's charge level, writes are always forced into the cache.
Note: If the array card's BBU module is damaged or has no charge and the write policy is set to Force WB, the OS's IO write requests are still written to the cache. If the machine then loses power, the data in the array card's cache is lost.
3. WT = Write Through: Do not use the cache; write directly to disk.
Turn off the array card's read-ahead; it is not needed, and the valuable cache should be reserved for the write cache (highly recommended).
Turn off the physical disks' own cache policy to prevent data loss.
Note: When the server loses power, the array card's cache is backed by the BBU module, so its data can still be written to disk; the disk's own cache has no backup power, so its data is lost as soon as the power goes out.
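For example, assuming an LSI RAID controller managed with the MegaCli utility (an assumption; other vendors' tools and other MegaCli versions use different flags), the policies above could be applied roughly like this:
MegaCli -AdpBbuCmd -GetBbuStatus -aAll        # check BBU state and charge level
MegaCli -LDSetProp WB -LAll -aAll             # write policy: Write Back for all logical drives
MegaCli -LDSetProp NORA -LAll -aAll           # disable array-card read-ahead, keep the cache for writes
MegaCli -LDSetProp -DisDskCache -LAll -aAll   # disable the physical disks' own cache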
Use high-speed drives rather than low-speed disks,
or use SSDs and PCIe SSDs.
Adjust system kernel parameters based on SSD device:
Adjust /sys/block/sdX/queue/read_ahead_kb to control the kernel's read-ahead size; 16KB is an empirical value (tune it to the average IO read size). Proper read-ahead improves overall performance.
Note: Read-ahead pulls the following blocks into memory ahead of time, which can effectively improve sequential IO read efficiency; adjust it according to the workload.
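A minimal sketch, assuming the SSD shows up as /dev/sdb (substitute the real device; the value resets on reboot unless persisted in rc.local or a udev rule):
cat /sys/block/sdb/queue/read_ahead_kb         # current read-ahead size, in KB
echo 16 > /sys/block/sdb/queue/read_ahead_kb   # set read-ahead to 16KB for the SSD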
A large number of IO requests inevitably produces a large number of interrupt requests, so the interrupt numbers should be bound to different cores (IRQ affinity) to avoid overloading a single CPU core.
Interrupts: When the kernel handles IO it goes through the device driver, and interrupts are raised when interacting with the driver (because the CPU and memory are much faster than the disk).
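A rough sketch of binding an interrupt to a specific core; the IRQ number 24 and the CPU mask below are hypothetical, so look up the real numbers in /proc/interrupts first, and note that the irqbalance service may rebalance the IRQ again unless it is stopped:
cat /proc/interrupts                 # per-CPU interrupt counts; find the storage controller's IRQ number here
echo 4 > /proc/irq/24/smp_affinity   # hex CPU mask: 4 = binary 100, i.e. bind IRQ 24 to CPU core 2
service irqbalance stop              # optional, on RHEL 6; otherwise irqbalance may overwrite the setting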
======================================================
Second, the BIOS optimization:
System Profile: select Performance Per Watt Optimized (DAPC) for maximum performance; do not use the energy-saving mode (a database is a compute-heavy workload, so do not trade performance for power savings).
Note: In energy-saving mode the server switches between low and high CPU frequencies, which is prone to problems and keeps the server from ramping up to full power.
Memory Frequency (RAM frequency): select Maximum Performance (best performance).
C1E: enables or disables the processor switching to its lowest performance state when idle; it is recommended to turn it off (it is enabled by default).
C States: enables or disables the processor operating in all available power states; it is recommended to turn it off (it is enabled by default).
For the CPU, prefer high clock frequency first (raw computing power), then a larger core count (more cores allow multi-threaded concurrent processing and multi-instance deployments).
Note: Since MySQL is a single-process, multi-threaded service, it depends more on a high-frequency CPU; for multi-instance deployments, more cores can be chosen to improve computational throughput.
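To verify from inside the OS that the power settings really deliver full frequency, one can check the reported core clock and, where the cpufreq driver is loaded, the scaling governor (a sketch; the sysfs path only exists when cpufreq is active):
grep "MHz" /proc/cpuinfo                                    # current frequency of each core, should sit near the rated maximum
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # "performance" avoids down-clocking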
If any disk is larger than 2TB, select UEFI (newer BIOS); do not use the legacy BIOS.
Use more memory to reduce the impact of the IO bottleneck (even more memory is needed when running multiple instances).
Note: The disks never work as fast as the memory or CPU, even with SSD or PCIe SSD, so more memory is needed to absorb the load and relieve the excess IOPS.
======================================================
Third, operating system optimization: (/proc/sys/vm/)
vm.swappiness
Reduces the probability of using swap. Setting vm.swappiness to 0 means swap as little as possible; 100 means swap out inactive memory pages as aggressively as possible.
Attention:
For RHEL below 7, it is recommended to set it to 0; for RHEL 7 and later, be careful and set it no higher than 5~10, to reduce the probability of using swap.
On RHEL 7 and later, setting it to 0 may trigger OOM, so be cautious.
Do not use this parameter together with huge pages (vm.hugepages), otherwise it can easily lead to OOM or other pitfalls. Avoid! (Pending further testing.)
vm.dirty_ratio = 5 or 10 (empirical value)
Note: This is a percentage value; the default is 20. When dirty data reaches this percentage of total system memory, the system begins flushing the dirty data to disk (pdflush).
Set it no higher than 30, and it must be larger than dirty_background_ratio, to avoid hanging the I/O subsystem.
vm.dirty_background_ratio = 5 or 10 (empirical value)
Note: This is a percentage value; the default is 10. When dirty data reaches this percentage of total system memory, the background flusher threads (pdflush) begin writing dirty data to disk. If dirty data keeps piling up past the limits, IO blocks on the flush to disk and a large amount of IO blocking is generated.
vm.overcommit_memory = 1
Note:
Determines under what conditions the kernel accepts large memory allocation requests. There are three possible values for this parameter:
0 - (default) heuristic overcommit: the kernel uses a heuristic to decide whether each memory request can be granted and refuses obvious overcommits.
1 - always overcommit: the kernel never refuses a memory request. This setting gives the best performance for a service such as a database server.
2 - never overcommit: a memory request is refused once committed memory equals or exceeds the total available swap plus overcommit_ratio percent of physical RAM. This is the best choice if you want to minimize the risk of overcommitting memory.
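The vm.* values above can be placed in /etc/sysctl.conf and loaded with sysctl -p; a sketch using the empirical values discussed in this section:
# append to /etc/sysctl.conf, then run: sysctl -p
vm.swappiness = 0              # use 1~10 instead on RHEL 7 and later
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
vm.overcommit_memory = 1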
IO Scheduler: prefer deadline; for SSDs, use noop.
Increase the scheduler request queue: echo 4096 > /sys/block/sdX/queue/nr_requests
Note: When there are many requests, the default request queue is too small and its depth needs to be increased. This is a bit like MySQL's back_log or thread_cache_size parameters: a larger queue pool lets requests wait inside the queue instead of being turned away, so they can be processed quickly in turn once dequeued.
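A sketch assuming the data disk is /dev/sdb (substitute the real device; like read_ahead_kb, these values reset on reboot unless persisted in rc.local or a udev rule):
cat /sys/block/sdb/queue/scheduler               # the scheduler currently in use is shown in brackets
echo deadline > /sys/block/sdb/queue/scheduler   # deadline for disks behind a RAID card; use noop for SSDs
echo 4096 > /sys/block/sdb/queue/nr_requests     # enlarge the request queue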
File system choice: prefer XFS, then ext4.
Note: Compared with ext3 and ext4, XFS gives better IOPS under high IO load. Under normal load the file systems are roughly comparable, and XFS may even be slightly weaker.
Mount parameters: noatime, nodiratime, nobarrier
Attention:
noatime, nodiratime: do not record the last access time of files/directories (information that is basically meaningless and useless for a database server).
nobarrier: Many file systems now force the underlying device to flush its cache whenever data is committed, to avoid data loss; this is called a write barrier.
However, the underlying storage of our database servers either uses a RAID card whose own battery provides power-loss protection, or a flash card that likewise has a self-protection mechanism to ensure data is not lost, so we can safely mount the file system with nobarrier.
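An example /etc/fstab entry for an XFS data volume with these options; the device and mount point are placeholders, and note that recent kernels have deprecated the nobarrier option:
# /etc/fstab (hypothetical device and mount point)
/dev/sdb1   /data   xfs   defaults,noatime,nodiratime,nobarrier   0 0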
This article is from the "Always on the Road" blog, please be sure to keep this source http://chenql.blog.51cto.com/8732050/1958938
DB-server standardized configuration template: hardware, file system, operating system