Notes and Reflections on Learning MySQL

Source: Internet
Author: User

This blog post summarizes what I have understood from my own reading about MySQL, and uses a diagram to explain it. If there are any mistakes, please point them out so that we can discuss them together.

First, the origin of the architecture diagram: I recently read some material on MySQL, combined it with my own experience, and converted the text into a diagram for ease of understanding.

[Image: architecture diagram, http://www.bkjia.com/uploads/allimg/131229/19424A224-0.png]

I will mainly cover three aspects: reads, writes, and the underlying disks:

1. Read operations:

To read data, it must first be loaded from disk into memory before it can be processed. When optimizing reads, the main targets are therefore the buffers and caches:

 
 
  key_buffer_size
  This is a very important parameter for MyISAM tables. It can generally be set to 30%-40% of memory, depending on the actual situation. MyISAM caches only indexes in the key buffer and relies on the operating system cache for the data itself, so you need to leave some memory for the operating system. In many cases the data is much larger than the indexes.

  innodb_buffer_pool_size
  This is an important parameter for InnoDB, which is more sensitive to buffering than MyISAM. MyISAM can run acceptably with the default key_buffer_size, but with the default innodb_buffer_pool_size InnoDB crawls like a snail. Because InnoDB caches both data and indexes, there is no need to leave much memory for the operating system cache. If the server uses only InnoDB, you can set this value to as much as 70%-80% of available memory. The same rule as for key_buffer_size applies: if your data volume is small and does not grow rapidly, there is no need to set innodb_buffer_pool_size very large.

  table_cache
  The table cache (named table_open_cache from MySQL 5.1.3 onward). It consumes system resources and memory. Because every table a connection uses, including temporary tables, has to be opened, this value can be increased when the number of connections is large.

  thread_cache_size
  The thread cache. Creating and destroying threads can be expensive, and without a cache this cost is paid on every connect and disconnect. If the application has many concurrent connections and the Threads_created status value is large, increase this value.

  query_cache_size
  If the application is read-heavy, you can set a larger value, but not too large, because maintaining the query cache also carries considerable overhead. A value between 32M and 512M is typical.

  sort_buffer_size
  This is a per-connection parameter. The first time a connection needs this buffer, the full configured size is allocated at once. Bigger is not better: with a large value and high concurrency, performance drops. On Linux, once sort_buffer_size exceeds 2 MB, mmap() is used for the allocation instead of malloc(), which is less efficient.

  MySQL temporary tables
  When working with a very large table, you may occasionally need to run many queries against a small subset of the data. Instead of running each query against the whole table, it may be faster to have MySQL select the few required records into a temporary table once and then run the queries against that table. The MySQL server also creates internal temporary tables automatically. Such a table can be an in-memory temporary table or a MyISAM temporary table stored on disk, and a table that starts in memory may be converted to an on-disk MyISAM table as it grows. The conversion threshold is the smaller of the max_heap_table_size and tmp_table_size system variables. Note that max_heap_table_size applies to all MEMORY engine tables, whether they are temporary tables, normal tables, or internal temporary tables. A program can also create a temporary table itself with CREATE TEMPORARY TABLE XX; in that case the program controls the table and is responsible for dropping it when it is no longer needed.
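
The sizing rules of thumb above can be turned into a small calculation. The following is a minimal Python sketch, not a tuning tool: the function names, the 16 GiB example machine, and the midpoint fractions (35% for key_buffer_size, 75% for innodb_buffer_pool_size) are assumptions chosen for illustration from the ranges quoted above. The second function only restates the rule that the smaller of max_heap_table_size and tmp_table_size decides when an in-memory temporary table moves to disk.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed midpoints of the ranges quoted in the text, for illustration only.
KEY_BUFFER_FRACTION = 0.35   # MyISAM: 30%-40% of memory
INNODB_POOL_FRACTION = 0.75  # InnoDB-only server: 70%-80% of memory


def suggest_buffer_bytes(total_memory_bytes, engine):
    """Return a rough starting size for the engine's main buffer."""
    if engine == "myisam":
        return int(total_memory_bytes * KEY_BUFFER_FRACTION)
    if engine == "innodb":
        return int(total_memory_bytes * INNODB_POOL_FRACTION)
    raise ValueError("engine must be 'myisam' or 'innodb'")


def in_memory_tmp_table_limit(max_heap_table_size, tmp_table_size):
    """Size past which an internal in-memory temporary table goes to disk:
    the smaller of the two system variables."""
    return min(max_heap_table_size, tmp_table_size)


if __name__ == "__main__":
    total = 16 * GIB  # hypothetical machine
    print("key_buffer_size         =", suggest_buffer_bytes(total, "myisam") // MIB, "M")
    print("innodb_buffer_pool_size =", suggest_buffer_bytes(total, "innodb") // MIB, "M")
    print("tmp table limit         =", in_memory_tmp_table_limit(64 * MIB, 16 * MIB) // MIB, "M")
```

Treat the printed values as starting points only; the text's own caveat still applies, and a small data set does not need a large buffer.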

That covers read operations; next come write operations.

2. Write operations:

Writes can be divided into hot data and ordinary data; in short, they are divided by frequency. Frequently modified data can then be separated from infrequently modified data.

For example:

 
 
  Suppose my website gets 10 million page views (PV) a day, and the PV statistics table receives one inserted row per visit: 10 million rows a day. Of course, the load is not spread evenly over 24 hours; if the peak period lasts 10 hours, that is about 1 million rows per hour. Suppose the other tables on the site receive 20,000 new rows a day. Compared with 10 million that is very little, but those 20,000 rows include important data: user registrations and customer orders matter more than PV records. So which is the hot data here, the PV statistics or the orders and registrations? The answer is self-evident: the PV statistics are hot, while the orders and registrations are the critical data. To keep PV inserts from affecting updates to the more important data, we can separate the two. If master-slave replication is added later, the replication load will also be much lower after the separation: only the 20,000 rows need to be replicated rather than the 10 million, and the load on the master drops as well.
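
The arithmetic behind this example can be checked with a short Python sketch. It reads the example's figures as 10 million PV rows and 20,000 business rows per day, with a 10-hour peak window; the variable and function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def rows_per_second(rows, hours):
    """Average insert rate when `rows` arrive evenly over `hours`."""
    return rows / (hours * SECONDS_PER_HOUR)


pv_rows_per_day = 10_000_000     # PV statistics rows per day (from the example)
peak_hours = 10                  # peak window assumed in the example
business_rows_per_day = 20_000   # registrations, orders, and similar rows

pv_peak_rate = rows_per_second(pv_rows_per_day, peak_hours)
business_rate = rows_per_second(business_rows_per_day, 24)

# Fraction of all new rows that still has to be replicated once the
# PV table is moved out of the replicated database.
replicated_share = business_rows_per_day / (pv_rows_per_day + business_rows_per_day)

print(f"PV inserts at peak: {pv_peak_rate:.0f} rows/s")
print(f"business inserts:   {business_rate:.2f} rows/s")
print(f"share of rows still replicated after the split: {replicated_share:.2%}")
```

The gap of roughly three orders of magnitude between the two rates is the reason for splitting them.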

Hot data is now separated from ordinary data, but for a highly concurrent database server, handling the concurrency itself becomes an important issue. High-end servers and clusters are one option, and NoSQL can also address the problem. If a queue mechanism can be built in at design time, even better. As an advertisement on the subway puts it: "Order keeps things flowing!"
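
One way to picture the queue idea is to buffer PV records in memory and write them in batches rather than one row per request. This is a minimal single-process Python sketch under stated assumptions: `drain_in_batches` and the `write_batch` callback are hypothetical names, and the callback stands in for a multi-row INSERT that a real system would issue against the statistics table.

```python
import queue


def drain_in_batches(pending, batch_size, write_batch):
    """Pull queued rows and pass them to write_batch in groups.

    Returns the number of batches written.
    """
    batches = 0
    batch = []
    while True:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            write_batch(batch)
            batches += 1
            batch = []  # start a fresh list; the written one is left untouched
    if batch:  # flush the final partial batch
        write_batch(batch)
        batches += 1
    return batches


if __name__ == "__main__":
    pending = queue.Queue()
    for page_id in range(2500):  # simulated page views
        pending.put(("page", page_id))

    written = []  # stand-in for the database
    count = drain_in_batches(pending, 1000, written.append)
    print(count, [len(b) for b in written])
```

Batching trades a short delay in the statistics for far fewer write operations on the database, which suits data such as PV counts where a small lag is acceptable.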

Of course, you may have better approaches; feel free to share them.

3. Underlying disk planning:

 
 
  RAID0:
  Data is split into consecutive bit- or byte-sized units and read and written in parallel across multiple disks (striping). The data transfer rate is therefore very high, but there is no redundancy, so strictly speaking it is not a true RAID structure. RAID0 only improves performance and does not guarantee data reliability; the failure of a single disk affects all of the data. RAID0 therefore cannot be used where data safety requirements are high.
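
The striping described above can be sketched as a simple address mapping. This Python sketch assumes round-robin striping with one block per stripe unit, which is a simplification; `locate_block` is a hypothetical name, not part of any RAID tool.

```python
def locate_block(logical_block, disk_count):
    """Map a logical block number to (disk index, block offset on that disk)
    under round-robin striping with one block per stripe unit."""
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    return logical_block % disk_count, logical_block // disk_count


if __name__ == "__main__":
    for block in range(8):
        disk, offset = locate_block(block, 4)
        print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Consecutive blocks land on different disks, which is why sequential transfers can proceed in parallel, and also why losing any one disk leaves holes throughout the data.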

 

[Image: http://www.bkjia.com/uploads/allimg/131229/1942464341-1.png]

 

  RAID1:
  It achieves redundancy through disk mirroring, keeping mutually backed-up copies of the data on pairs of independent disks. When the original disk is busy, data can be read directly from the mirror copy, so RAID1 can improve read performance. RAID1 has the highest cost per unit of capacity of any disk array, but it provides high data safety and availability. When a disk fails, the system automatically switches to the mirror disk for reads and writes, with no need to reconstruct the failed data.

 

  RAID0+1:
  Often referred to as RAID10, it is the product of combining RAID0 and RAID1 (strictly speaking, RAID 0+1 mirrors striped sets, while RAID 10 stripes across mirrored pairs). Data is striped in bit- or byte-sized units and read and written in parallel across multiple disks, while every disk is also mirrored for redundancy. Its advantage is that it has both the speed of RAID0 and the data reliability of RAID1, but CPU usage is higher and disk utilization is relatively low.

 

[Image: http://www.bkjia.com/uploads/allimg/131229/19424C1Q-2.png]

  RAID5:
  Instead of a dedicated parity disk, data and parity information are interleaved across all disks. On RAID5, reads and writes can operate on the array's devices at the same time, which provides higher throughput. RAID5 is better suited to small data blocks and random reads and writes. The main difference from RAID3 is that RAID3 involves every disk in the array for each data transfer, whereas on RAID5 most transfers touch only one disk and can proceed in parallel. RAID5 has a "write penalty": each write operation generates four actual read/write operations, two reads of the old data and old parity, and two writes of the new data and new parity.
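
The write penalty follows from how XOR parity is updated. The Python sketch below is an illustration under simplifying assumptions (one stripe, byte strings as blocks, hypothetical function names): the new parity is the old parity XOR the old data XOR the new data, which requires reading two blocks and writing two blocks.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data, old_parity, new_data):
    """Return (new_parity, io_count) for updating one data block.

    I/Os: read old data, read old parity, write new data, write new parity.
    """
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    return new_parity, 4


def rebuild(surviving_blocks, parity):
    """Recover a lost data block from the surviving blocks and the parity."""
    block = parity
    for other in surviving_blocks:
        block = xor_bytes(block, other)
    return block


if __name__ == "__main__":
    d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
    parity = xor_bytes(xor_bytes(d0, d1), d2)

    new_d1 = b"\xaa\x55"
    parity, ios = small_write(d1, parity, new_d1)

    # Losing the disk that holds d1 is still recoverable after the update.
    assert rebuild([d0, d2], parity) == new_d1
    print("I/O operations for one small write:", ios)
```

The same XOR relation is what allows a failed disk's contents to be rebuilt from the remaining disks.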

 

[Image: http://www.bkjia.com/uploads/allimg/131229/1942463636-3.png]

 

  RAID6:
  RAID6 builds on RAID5 and is designed to strengthen data protection further; it is in effect an extension of the RAID5 level. Unlike RAID5, in addition to the stripe-level XOR parity area on each disk, there is a second XOR parity area for each data block. The parity for a data block on a given disk cannot be stored on that same disk, so it is stored in a staggered manner; the specific layout is shown in the figure. As a result, every data block has two layers of parity protection (one stripe-level parity and one overall parity), so RAID6 offers very good redundancy. However, because of the additional parity, write efficiency is worse than RAID5 and the controller design is more complex. The second parity area also reduces the effective storage space.

 

[Image: http://www.bkjia.com/uploads/allimg/131229/1942463439-4.png]

RAID10 can be used for hot data, since it improves both performance and safety. RAID5 can be used for ordinary data, where the main goal is safety. RAID0 can be used for temporary tables, where its performance advantage can be fully exploited.
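
The capacity cost of each level mentioned in this section can be compared with a short calculation. This Python sketch assumes equally sized disks and two-way mirroring for RAID1 and RAID10; `usable_disks` is a hypothetical helper, and the minimum disk counts are the usual ones for each level.

```python
def usable_disks(level, disk_count):
    """Number of disks' worth of usable capacity, for equally sized disks."""
    if level == "raid0":
        if disk_count < 2:
            raise ValueError("RAID0 needs at least 2 disks")
        return disk_count
    if level in ("raid1", "raid10"):
        # Two-way mirroring assumed: half of the disks hold copies.
        minimum = 2 if level == "raid1" else 4
        if disk_count < minimum or disk_count % 2:
            raise ValueError(f"{level} needs an even number of disks, at least {minimum}")
        return disk_count // 2
    if level == "raid5":
        if disk_count < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return disk_count - 1  # one disk's worth of parity
    if level == "raid6":
        if disk_count < 4:
            raise ValueError("RAID6 needs at least 4 disks")
        return disk_count - 2  # two disks' worth of parity
    raise ValueError(f"unknown level: {level}")


if __name__ == "__main__":
    disks = 8
    for level in ("raid0", "raid10", "raid5", "raid6"):
        usable = usable_disks(level, disks)
        print(f"{level}: {usable} of {disks} disks usable ({usable / disks:.0%})")
```

This makes the trade-off concrete: RAID10 pays for its speed and safety with half of the raw capacity, while RAID5 and RAID6 give up only one or two disks' worth.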

All of the above are my personal views. If you have any questions, let's discuss them and learn together!

 

This article is from the "Ro blog" blog, please be sure to keep this source http://luoweiro.blog.51cto.com/2186161/949451
