Many DBAs run into database I/O bottlenecks. When the hardware budget is limited, there are two ways to relieve the bottleneck without adding machines: add memory, or add disks. Which of the two is more cost-effective is often a headache.
The renowned database researcher Jim Gray thought this problem through for us more than 20 years ago and arrived at the "Five-Minute Rule": if a page is accessed at least once every five minutes, it should be kept in memory; otherwise it should stay on disk. With this rule, the database only needs to count how many pages are accessed more often than once every five minutes to know how much memory it needs.
Of course, five minutes is only a typical value, or rather an order of magnitude; the exact number depends on the hardware. Jim Gray gives the following formula:
RI = (data pages per MB × price of one disk) / (random I/Os per second per disk × price of memory per MB)
RI is the threshold access interval: a page accessed more often than once every RI seconds should be kept in memory.
The formula is easy to understand. Suppose a page is accessed once every I seconds, and you must decide whether to buy memory or disk to hold it. If you buy memory, the cost per page is simply "price of memory per MB / data pages per MB". If you buy disk, the page needs 1/I random I/Os per second, which is a fraction "1/I / (random I/Os per disk per second)" of one disk's I/O bandwidth, so the cost attributable to this page is "price of one disk / (random I/Os per disk per second × I)".
Setting the memory cost "price of memory per MB / data pages per MB" equal to the disk cost "price of one disk / (random I/Os per disk per second × I)" and solving for I gives the break-even access interval, which is exactly the RI in the formula above.
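To make the arithmetic concrete, here is a minimal sketch of the break-even calculation in Python; the function and parameter names are my own for illustration, not part of any existing tool:

```python
def break_even_interval(pages_per_mb, disk_price, disk_iops, ram_price_per_mb):
    """Jim Gray's break-even interval RI, in seconds.

    A page accessed more often than once every RI seconds is cheaper
    to keep in memory; otherwise it is cheaper to leave on disk.
    """
    # cost of holding one page in memory
    memory_cost_per_page = ram_price_per_mb / pages_per_mb
    # cost of one random I/O per second of disk bandwidth
    disk_cost_per_iops = disk_price / disk_iops
    # break-even point: memory_cost_per_page == disk_cost_per_iops / RI
    return disk_cost_per_iops / memory_cost_per_page
```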
When a database hits an I/O bottleneck, plug in the numbers for common hardware. With InnoDB's default 16 KB page size there are 64 pages per MB of data; a SAS disk costs about 3000 yuan and delivers about 200 random I/Os per second; 4 GB of memory also costs about 3000 yuan, i.e. roughly 0.75 yuan per MB. The resulting RI is 64 × 3000 / (200 × 0.75) = 1280 seconds, roughly 20 minutes.
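Running the sketch above with those (assumed) prices reproduces the figure in the text:

```python
ri = break_even_interval(
    pages_per_mb=64,        # 16 KB InnoDB pages -> 64 pages per MB
    disk_price=3000,        # one SAS disk, in yuan
    disk_iops=200,          # random I/Os per second per disk
    ram_price_per_mb=0.75,  # 3000 yuan / 4 GB of memory
)
print(ri)  # 1280.0 seconds, about 20 minutes
```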
Unfortunately, MySQL does not provide page-access frequency statistics. A crude substitute is to restart the database and check whether the buffer pool fills up within 20 minutes: if it does not fill, the memory is too large; if it does, the memory may be too small. But this experiment cannot be run during peak hours, and restarting during off-peak hours is inaccurate because the load is light. Even if MySQL happens to crash one day during peak hours, the recovery afterwards involves a lot of redo, so that is no good either.
In fact, this statistic can be implemented cheaply. Counting how often pages are accessed within 20 minutes can be reduced to counting how many distinct pages are accessed within a 20-minute window. Assuming the memory configured in the system is not far from the ideal value, the bookkeeping needs less than 1/1000 of the buffer memory, at the cost of one extra hash-table lookup per page access. Well, I plan to implement this in NTSE, haha.
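A minimal sketch of that idea, assuming we can hook every page access; the class and method names are illustrative only:

```python
import time

WINDOW_SECONDS = 20 * 60  # the 20-minute window from the calculation above

class DistinctPageCounter:
    """Counts distinct pages touched in the current time window.

    A hash set of page ids is tiny compared with the buffer pool:
    one small entry per distinct page versus 16 KB per cached page.
    """
    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.window_start = time.monotonic()
        self.pages = set()
        self.last_window_count = 0

    def on_page_access(self, page_id):
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # Window expired: remember the count and start a new window.
            self.last_window_count = len(self.pages)
            self.pages.clear()
            self.window_start = now
        self.pages.add(page_id)

    def hot_page_count(self):
        """Pages seen in the last full window, i.e. how many pages deserve RAM."""
        return self.last_window_count
```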
Of course, this calculation ignores many practical factors. If the RAID card in your machine is already full and cannot take another disk, you have no choice but to add memory; if the memory slots are full, you have to add disks; if neither can be added, you add a machine. The calculation above also assumes that page accesses are completely random. Sequential access is a very different story: the disk can read far more pages per second, so the RI becomes much smaller.