Database IO and Database Optimization

Source: Internet
Author: User

I. IO introduction

There are four types of IO: sequential read, random read, random write, and sequential write. Sequential reads and writes usually use large IO sizes (128 KB to 1 MB) and are measured mainly by throughput, while random reads and writes use small IO sizes (under 8 KB) and are measured mainly by IOPS and response time. In a database, a full table scan is sequential read IO, index access is typical random read IO, log file writes are sequential write IO, and data file writes are random write IO.

Database systems were designed around the access characteristics of traditional disks. The most important consequence is that log files are written sequentially. The log must be written to disk when a transaction commits, so its response-time requirements are very high; writing it sequentially reduces the time spent on disk seeks and lowers latency. Although log writes go to physically contiguous locations, they differ from typical sequential writes: each log IO is small (usually under 4 KB), each IO is independent (the head must seek again and wait for the platter to rotate into position), and the interval between IOs is very short. Databases therefore use a log buffer and group commit (batching several commits into one write) to make each IO larger and reduce the number of IOs, which lowers response latency. Sequential log writes can thus be thought of as "random writes to sequential locations", and IOPS matters more for them than throughput.
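The effect of group commit on the number of log IOs can be illustrated with a small simulation. This is a minimal sketch and not how any particular database implements it: the function name `count_log_writes`, the fixed flush window, and the sample commit times are all assumptions made for illustration.

```python
def count_log_writes(commit_times_ms, flush_window_ms):
    """Count physical log writes when commits that arrive within the same
    flush window are written together (a simplified model of group commit)."""
    if flush_window_ms <= 0:
        # No batching: every commit triggers its own log write.
        return len(commit_times_ms)
    writes = 0
    window_end = None
    for t in sorted(commit_times_ms):
        # A commit outside the current window starts a new write.
        if window_end is None or t >= window_end:
            writes += 1
            window_end = t + flush_window_ms
    return writes


# Hypothetical commit arrival times in milliseconds.
commits = [0, 1, 2, 3, 10, 11, 12, 25]

writes_unbatched = count_log_writes(commits, 0)
writes_grouped = count_log_writes(commits, 5)
print(writes_unbatched, writes_grouped)
```

With these sample times, the eight commits need eight writes without batching and three writes with a 5 ms window, which is the reduction in IO count the paragraph above describes.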

Data files are updated in place, meaning that changes are written back to their original location. Unlike the log file, data files are not written when a transaction commits. Only when the database finds too many dirty buffers, or needs to perform a checkpoint, does it flush the dirty buffers to their corresponding positions on disk. This is an asynchronous process, so random writes to data files usually do not place particularly high demands on IO, as long as the IO can keep up with checkpointing and dirty-buffer flushing.

II. Optimization

MySQL is a row-oriented database, and its IO is performed in units of pages (blocks). If we reduce the space each record occupies, more rows fit in each page, so each IO can access more rows. Put the other way around, processing the same number of rows requires access to fewer pages, which means fewer IO operations and directly better performance. In addition, memory is limited: storing more rows per page means each memory block caches more data, which raises the chance that a request is served from memory, that is, the cache hit rate.
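The relationship between row size and page count is simple arithmetic. The sketch below assumes a 16 KB page (the InnoDB default) and ignores page headers, per-row overhead, and fill factor, so the numbers are rough estimates; the function name and the row sizes are hypothetical.

```python
import math

PAGE_SIZE = 16 * 1024  # assumption: 16 KB pages, the InnoDB default


def pages_needed(row_count, row_bytes, page_size=PAGE_SIZE):
    """Estimate how many pages hold row_count rows of row_bytes each.
    Simplification: ignores page headers, row overhead and fill factor."""
    rows_per_page = page_size // row_bytes
    return math.ceil(row_count / rows_per_page)


# One million rows at 200 bytes per row versus 100 bytes per row.
pages_wide = pages_needed(1_000_000, 200)
pages_narrow = pages_needed(1_000_000, 100)
print(pages_wide, pages_narrow)
```

Under these assumptions, halving the row size roughly halves the number of pages a full scan has to read, and roughly doubles the number of rows that a fixed-size buffer cache can hold.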

1. Data type selection

The most time-consuming part of a database operation is IO; in most database operations, more than 90% of the time is spent on IO reads and writes. Minimizing the volume of IO can therefore greatly improve the performance of database operations.

We cannot change the data that needs to be stored in the database, but we can think carefully about how that data is stored. The following recommendations for column types apply mainly to scenarios with many records and a large data volume, because fine-grained data type choices raise maintenance costs, and over-optimization can cause other problems:

  Numeric type: Avoid DOUBLE unless absolutely necessary; the issue is not only storage length but also precision. Likewise, DECIMAL is not recommended for fixed-precision decimals; it is recommended to multiply by a fixed factor and store the value as an integer, which saves storage space with little additional maintenance cost. For integers, when the data volume is large, choose deliberately among TINYINT, INT, and BIGINT, because the storage they occupy differs considerably (1, 4, and 8 bytes respectively). For columns that will certainly never hold negative numbers, add the UNSIGNED attribute. Of course, for a database with a small amount of data, there is no need to distinguish strictly between the three integer types.

  Character type: Avoid the TEXT data type unless absolutely necessary; the way it is handled makes it slower than CHAR or VARCHAR. For fixed-length fields, use CHAR; for variable-length fields, use VARCHAR where possible, and set an appropriate maximum length rather than an arbitrarily large one, because MySQL handles storage differently depending on the length range.

  Time type: Use TIMESTAMP where possible, because it requires only half the storage of DATETIME (4 bytes versus 8 in MySQL versions before 5.6.4; from 5.6.4 on, DATETIME takes 5 bytes plus any fractional-seconds storage, so the gap is smaller). For data that only needs to be accurate to the day, use DATE, which requires only 3 bytes, even less than TIMESTAMP. Storing a Unix timestamp in an INT column is not recommended: it is not intuitive, makes maintenance unnecessarily inconvenient, and brings no benefit.

  ENUM and SET: For status fields, consider ENUM, which greatly reduces storage space; even if a new value is needed later, as long as it is appended at the end of the list, changing the definition does not require rebuilding the table data. For predefined attribute data, consider the SET type: even when a row has multiple attributes, it also saves a lot of storage space.

  LOB type: Storing LOB data in the database is strongly discouraged. Although the database provides this capability, it is not what a database is good at; each tool should be left to do what it does best. Storing LOB data in a database is like asking a marketing professional who learned a little Java in school years ago to write Java code.
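The numeric recommendations above can be sketched in a few lines. The helper names `smallest_int_type` and `to_scaled_int` are hypothetical and exist only for this illustration; the byte sizes are the documented storage sizes of the MySQL integer types. The first helper picks the narrowest integer type for a value range, and the second shows the "multiply by a fixed factor" approach for fixed-precision decimals.

```python
from decimal import Decimal

# Storage size in bytes of the MySQL integer types.
INT_TYPES = [("TINYINT", 1), ("SMALLINT", 2), ("MEDIUMINT", 3),
             ("INT", 4), ("BIGINT", 8)]


def smallest_int_type(min_value, max_value):
    """Return (type name, bytes) for the narrowest type covering the range.
    UNSIGNED is chosen whenever the range contains no negative numbers."""
    unsigned = min_value >= 0
    for name, size in INT_TYPES:
        bits = size * 8
        if unsigned:
            low, high = 0, 2 ** bits - 1
        else:
            low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        if low <= min_value and max_value <= high:
            return (name + " UNSIGNED" if unsigned else name), size
    raise ValueError("range does not fit in BIGINT")


def to_scaled_int(amount, scale=100):
    """Convert a decimal string such as '19.99' to an integer count of
    1/scale units (cents when scale is 100)."""
    return int(Decimal(amount) * scale)


print(smallest_int_type(0, 200))
print(smallest_int_type(-1, 200))
print(smallest_int_type(0, 4_000_000_000))

# Binary floating point cannot represent 0.1 or 0.2 exactly ...
print(0.1 + 0.2 == 0.3)
# ... while the scaled integers add up exactly.
print(to_scaled_int("0.10") + to_scaled_int("0.20") == to_scaled_int("0.30"))
```

The application has to remember the scale factor when reading the value back, which is the maintenance cost to weigh against the space and precision benefits.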

2. Character encoding

The character set directly determines how data is encoded for storage in MySQL. The same content can occupy very different amounts of space under different character sets, so choosing an appropriate character set helps reduce the amount of data and therefore the number of IO operations.

For content that consists purely of Latin characters, there is no need to choose a character encoding other than latin1, because latin1 saves a lot of storage space.

If we can be sure that multiple languages will never need to be stored, there is no need to use UTF8 or another Unicode character set, which would waste a lot of storage space.

MySQL character sets can be specified down to the individual column, so when a large database needs to hold multi-byte data, we can reduce the volume of stored data by using different character sets for different columns in different tables, which reduces the number of IO operations and raises the cache hit ratio.
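How much space an encoding costs depends on the text itself, which is easy to check. The sketch below uses Python's `latin-1` and `utf-8` codecs as stand-ins for the corresponding MySQL character sets (an approximation: MySQL's latin1 is not byte-for-byte identical to ISO-8859-1 for every code point); the helper name and sample strings are assumptions for illustration.

```python
def encoded_size(text, encoding):
    """Bytes needed for text in the given encoding, or None if the
    encoding cannot represent the text at all."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


samples = ["database", "café", "数据库"]
for text in samples:
    print(text, encoded_size(text, "latin-1"), encoded_size(text, "utf-8"))
```

Plain ASCII text takes the same number of bytes in both encodings, accented Latin characters take one byte in latin-1 and two in UTF-8, and text that latin-1 cannot represent needs a multi-byte character set. This is why choosing the character set per column, based on what the column actually holds, can be worthwhile.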

3. Proper splitting

There are times when we want one complete object to map to one database table. This is convenient for application development, but it can sometimes cause serious performance problems.

When a table contains a TEXT column or a very large VARCHAR column that most queries against the table do not need, we should not hesitate to split that column into a separate table, reducing the storage space used by the frequently accessed data. One obvious benefit is that each block can hold significantly more rows, which reduces the number of physical IO operations and significantly raises the cache hit rate in memory.

The optimizations above all reduce the storage space per record, so that each database block holds more records, which reduces the number of IO operations and raises the cache hit ratio. The next recommendation may be harder for many developers to accept, because it is a typical denormalized design and runs contrary to some of the recommendations above.

4. Moderate redundancy

Why do we need redundancy? Doesn't it increase the size of each record and reduce the number of records that each block can hold?

Indeed, redundancy increases the size of each record and reduces the number of records that fit in each block, but in some scenarios it is still worthwhile: small columns that are referenced frequently and can only be obtained by joining two (or more) large tables. In such cases every join is performed just to fetch one small column, yet the joined records are large, which causes a great deal of unnecessary IO. Trading space for time avoids this. However, redundancy must not compromise data consistency: whenever the source column is updated, the redundant copies must be updated as well.
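The trade-off can be shown with a runnable sketch. It uses Python's built-in `sqlite3` module in place of MySQL purely so that the example is self-contained; the `users` and `posts` tables, the redundant `author_name` column, and the `rename_user` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title TEXT NOT NULL,
    author_name TEXT NOT NULL  -- redundant copy of users.name
);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO posts VALUES (10, 1, 'IO tuning', 'alice')")

# Normalized read: a join is needed just to fetch one small column.
READ_WITH_JOIN = ("SELECT p.title, u.name FROM posts p "
                  "JOIN users u ON u.id = p.user_id")
# Read using the redundant column: only one table is touched.
READ_REDUNDANT = "SELECT title, author_name FROM posts"

joined = conn.execute(READ_WITH_JOIN).fetchall()
flat = conn.execute(READ_REDUNDANT).fetchall()


def rename_user(conn, user_id, new_name):
    """Update the source column and its redundant copy in one transaction,
    so that readers never see the two values disagree."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE users SET name = ? WHERE id = ?",
                     (new_name, user_id))
        conn.execute("UPDATE posts SET author_name = ? WHERE user_id = ?",
                     (new_name, user_id))


rename_user(conn, 1, "alice_w")
joined_after = conn.execute(READ_WITH_JOIN).fetchall()
flat_after = conn.execute(READ_REDUNDANT).fetchall()
print(joined, flat, joined_after, flat_after)
```

The read path gets cheaper, while every write to the source column now has to touch the redundant copies too, so this pays off mainly for columns that are read often and changed rarely.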

5. Use NOT NULL as much as possible

NULL is special and makes SQL difficult to optimize. Although NULL values in MySQL, unlike in Oracle, are included in indexes, a nullable column in a composite index can greatly reduce the efficiency of the whole index. In addition, NULL handling in indexes is special and takes up additional storage space.

Many people think NULL saves some space and therefore allow NULL wherever possible in order to save IO, but most of the time this backfires. Although there may be some space savings, it brings many other optimization problems and, rather than reducing IO, can increase the IO performed by SQL statements. So try to ensure that columns are defined as NOT NULL with a default value; this is also a good habit in table design.
