In this blog post I try to explain, with the help of a diagram, what I have understood from the books I have read. If there are any mistakes, I hope you will point them out so that we can discuss them and learn from each other.
First, the source of the architecture diagram: I recently read some material on MySQL and, combining it with my own experience, turned the text into a picture to make it easier to understand.
I will mainly elaborate on three aspects: read operations, write operations, and the underlying disk layout.
1. Read operations:
When data is read, it must first be loaded from disk into memory before any operation can be performed on it. Optimizing read operations therefore mainly means tuning the buffers and caches, so that as many reads as possible are served from memory instead of from disk.
That is a brief introduction to read operations; next come write operations.
2. Write operations:
For write operations, data is divided into hot data and ordinary data; in short, it is divided by how frequently it is modified. Frequently modified data can then be separated from rarely modified data.
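One way to judge whether the buffer is doing its job is to look at how many reads never reach the disk. The sketch below is only an illustration and assumes the InnoDB storage engine, whose status counters `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that had to go to disk) can be viewed with `SHOW GLOBAL STATUS`. The function name and the sample numbers are made up for this example.

```python
def buffer_pool_hit_ratio(read_requests: int, disk_reads: int) -> float:
    """Fraction of logical reads served from memory rather than from disk.

    read_requests: value of Innodb_buffer_pool_read_requests
    disk_reads:    value of Innodb_buffer_pool_reads
    """
    if read_requests <= 0:
        return 0.0  # no reads yet, so there is nothing to measure
    return 1.0 - disk_reads / read_requests


# Illustrative counter values, not taken from a real server.
ratio = buffer_pool_hit_ratio(read_requests=1_000_000, disk_reads=5_000)
print(f"buffer pool hit ratio: {ratio:.2%}")
```

A ratio that stays low under a steady workload is a hint that the buffer is too small for the working set, although the right threshold depends on the application.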
For example:
Suppose my website gets 10 million PVs (page views) every day, and one row is inserted into the PV statistics table on every visit, which means 10 million rows per day. Of course, these are not spread evenly over 24 hours; if the peak lasts 10 hours, that is about 1 million rows per hour. Suppose the other tables on my website receive only 20,000 new rows every day. Compared with 10 million rows this is very little, but those 20,000 rows include the important data. A user registration or a customer's order is more important than a PV record. Now the question arises: what is my hot data? It is self-evident that the PV statistics are the hot data, while the orders and registered users are the key data. So that recording PV data does not affect updates to the more important data, we can separate the two. If master-slave replication is added later, the replication load will also be much lower after the separation: we only need to replicate the 20,000 rows rather than the 10 million, and the load on the primary database drops as well.
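The arithmetic behind this example can be written down in a few lines. All figures here are illustrative assumptions (10 million page-view rows per day, a 10-hour peak, 20,000 rows of key data per day), and the variable names are made up for this sketch.

```python
# All figures are illustrative assumptions, not measurements.
DAILY_PV_ROWS = 10_000_000   # page-view rows inserted per day
PEAK_HOURS = 10              # hours that carry most of the traffic
DAILY_KEY_ROWS = 20_000      # orders, registrations and similar rows per day

# Insert rate for the PV table if the traffic falls into the peak hours.
pv_rows_per_peak_hour = DAILY_PV_ROWS // PEAK_HOURS

# Share of all written rows that is key data. After separation, only this
# share has to be replicated to the slave.
key_share = DAILY_KEY_ROWS / (DAILY_PV_ROWS + DAILY_KEY_ROWS)

print(f"PV rows per peak hour: {pv_rows_per_peak_hour:,}")
print(f"key data share of all writes: {key_share:.2%}")
```

Under these assumptions the key data is a tiny fraction of the write volume, which is why moving the PV table elsewhere relieves both the primary and the replication stream.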
Now the hot data is separated from the ordinary data, but for a highly concurrent database server, how to handle the concurrency becomes an important issue. Of course, high-end servers and clusters can be used, and NoSQL can also solve this problem. If a queue mechanism can be built in at design time, even better! (As the slogan on the subway says: orderly means smooth!)
Of course, you may have a better approach; feel free to share it.
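The queue idea can be sketched as follows: requests put rows on a queue and return immediately, while a single writer drains the queue in order and writes in batches. This is a minimal in-process sketch using Python's standard `queue` and `threading` modules. The `run_writer` function, the `flush` callback and the batch size are hypothetical, and a real deployment would more likely use a dedicated message queue and a real database insert in place of the in-memory list.

```python
import queue
import threading


def run_writer(q, flush, batch_size=100):
    """Drain q in arrival order and hand rows to flush() in batches."""
    batch = []
    while True:
        row = q.get()
        if row is None:            # sentinel: the producers are finished
            break
        batch.append(row)
        if len(batch) >= batch_size:
            flush(batch)           # e.g. one multi-row INSERT
            batch = []
    if batch:                      # flush whatever is left over
        flush(batch)


# Usage: collect the flushed batches in memory instead of a real database.
flushed = []
q = queue.Queue()
writer = threading.Thread(target=run_writer, args=(q, flushed.append, 3))
writer.start()
for i in range(7):
    q.put({"page": "/home", "hit": i})
q.put(None)
writer.join()
print([len(b) for b in flushed])   # [3, 3, 1]
```

The database then sees a few ordered batch writes instead of many concurrent single-row inserts, at the cost of a short delay before the rows become visible.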
3. Underlying disk planning:
RAID6:
RAID6 is based on RAID5 and is designed to further strengthen data protection; it is in effect an extension of the RAID5 level. Unlike RAID5, in addition to the stripe-level XOR parity area on each hard disk, there is a second XOR parity area for each data block. Of course, the parity data for a block cannot be stored on the same disk as the block itself, so it is stored in a staggered manner. The specific layout is shown in the figure. In this way, each data block is protected by two parity barriers (one stripe-level parity and one overall parity), so the data redundancy of RAID6 is very good. However, because of the additional parity, write efficiency is worse than with RAID5 and the controller design is more complex. The second parity area also reduces the usable storage space.
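To make the XOR parity idea concrete, here is a small sketch of how a lost block can be rebuilt from the surviving blocks and their parity. It shows only a single XOR parity, the kind RAID5 and RAID6 have in common; the second, independent parity that lets RAID6 survive two failed disks is not modelled. The block contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks as they might sit on three different disks.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)          # stored on a fourth disk

# Simulate losing the second disk and rebuild its block
# from the surviving blocks plus the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])          # True
```

Every write to a data block also has to update the parity, which is where the write penalty comes from; with two parity areas that extra work is done twice.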
RAID10 can be used for hot data, as it improves both performance and safety. RAID5 can be used for ordinary data, where the main goal is safety. RAID0 can be used for temporary tables, where its large performance advantage can be put to full use.
All of the above is my personal opinion. If you have any questions, let's discuss them and learn together!
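The trade-off between these levels is easier to see with numbers. The sketch below uses the standard capacity rules for equally sized disks (RAID0 keeps all disks, RAID5 gives up one disk's worth of space to parity, RAID6 gives up two, RAID10 gives up half to mirroring) and applies them to the placement suggested above. The function name, the `PLACEMENT` table and the 8-disk array are assumptions made for this example.

```python
def usable_disks(level: str, disks: int) -> int:
    """Number of disks' worth of usable space for a RAID level."""
    if level == "RAID0":
        if disks < 2:
            raise ValueError("RAID0 needs at least 2 disks")
        return disks
    if level == "RAID5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return disks - 1           # one disk's worth of parity
    if level == "RAID6":
        if disks < 4:
            raise ValueError("RAID6 needs at least 4 disks")
        return disks - 2           # two disks' worth of parity
    if level == "RAID10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least 4")
        return disks // 2          # every block is mirrored
    raise ValueError(f"unknown RAID level: {level}")


# Placement as suggested in the text, on a hypothetical 8-disk array.
PLACEMENT = {"hot data": "RAID10", "ordinary data": "RAID5", "temporary tables": "RAID0"}
for kind, level in PLACEMENT.items():
    print(f"{kind}: {level}, {usable_disks(level, 8)} of 8 disks usable")
```

Note that RAID0 has no redundancy at all, which is acceptable for temporary tables only because their contents can be regenerated.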
This article is from the "Ro NLP blog". Please keep this attribution when reposting.