MySQL Performance Tuning and Architecture Design - Chapter 1: Basic Principles of Scalable Design
Chapter 1: Basic Principles of Scalable Design
Preface:
With the rapid growth of information, hardware alone can no longer keep up with the processing capability that application systems demand. How, then, do we meet the system's performance requirements? There is really only one way: transform the system architecture to improve its scalability, and combine multiple machines of modest processing capability into one system of high processing capability. In other words, we must carry out a scalable design. Scalable design is a complex piece of systems engineering: it touches many areas, involves intricate technology, and may introduce many new problems of its own. Yet however we design it, and whatever problems we run into, there are some principles that must always be guaranteed. This chapter gives a brief introduction to the principles that need to be upheld during the scalable design process.
12.1 What Is Scalability
Before discussing scalability, many people may ask: I often hear that some website's system is designed for scalability, or that its architecture is excellent, but what exactly is scalability? How does a system scale? What does scalability mean? In fact, these are the terms we constantly hear: Scale, Scalable, and Scalability.
From the database perspective, Scale means enabling our database to provide stronger service and processing capability. Scalable means that the database system can deliver greater processing capability after an upgrade (whether by increasing the capability of a single machine or by increasing the number of servers). In theory any database system is Scalable; the difference lies only in how the scaling is implemented. Scalability describes how hard it is to raise a database system's processing capability through such an upgrade: in theory any system's capability can be improved, but the cost (money and manpower) required by different systems to gain the same increase in capability differs, and that difference is what we call the Scalability of each database application system.
The different database application systems mentioned here do not refer to differences in the database software itself (although different database software can also differ in Scalability); they refer to different application architecture designs built on the same database software. That is exactly what this chapter and the following articles will focus on.
First, we need to know that the scalability of a database system is mainly reflected in two directions, horizontal and vertical, which is what we usually call Scale Out and Scale Up.
Scale Out means horizontal, outward scaling: increasing overall processing capability by adding processing nodes, which in practice means adding machines.
Scale Up means vertical, upward scaling: increasing overall processing capability by strengthening the current processing node. Bluntly, this means upgrading the configuration of the existing server, for example adding memory or CPUs, upgrading the storage hardware, or simply replacing it with a more powerful server and a higher-end storage system.
By comparing the two Scale modes, we can easily see their respective advantages and disadvantages.
◆ Advantages of Scale Out:
1. Low cost: it is easy to assemble a computing cluster with very strong processing capability out of inexpensive PC servers;
2. Bottlenecks are hard to hit, because processing capability can easily be increased by adding hosts;
3. A single node failure has little impact on the overall system.
◆ Disadvantages of Scale Out:
1. There are many processing nodes, which raises the overall complexity of the system architecture and of the applications, places higher architectural demands on the application than Scale Up does, and usually requires cluster management software;
2. Cluster maintenance is harder and the maintenance cost is higher;
◆ Advantages of Scale Up:
1. There are few processing nodes and maintenance is relatively simple;
2. All data is centralized, and the application system architecture is simple and development is relatively easy;
◆ Disadvantages of Scale Up:
1. High-end equipment is expensive, competition among vendors is limited, and it is easy to become locked in to a particular manufacturer;
2. Constrained by the pace of hardware development, the processing capability of a single host is always finite, so it is easy to run into a performance bottleneck that ultimately cannot be solved;
3. Equipment and data are centralized, so a failure has a far larger impact;
In the short term, Scale Up has the greater advantage: it keeps O&M simple, keeps the system architecture and application development simple, and demands less specialized technical skill.
In the long term, however, Scale Out has the greater advantage and becomes the inevitable trend once a system reaches a certain scale. The processing capability of a single machine is always bounded by hardware technology, hardware progresses only so fast, and in many cases it simply cannot keep up with the pace of business growth. Moreover, the more capable a high-end device is, the worse its cost-effectiveness tends to be. Building a powerful distributed cluster from many low-cost PC servers therefore remains a goal that companies pursue to cut costs and raise overall processing capability. Various technical problems will appear along the way, but the goal is always worth studying and practicing.
Later chapters will focus on analysis and design for Scale Out. To Scale Out well we need a distributed system design. For databases there are only two ways to Scale Out: one is to expand by continuously replicating the data into many identical data sources; the other is to split one centralized data source into several data sources.
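As a quick illustration of these two paths (the host names, shard count, and routing key are invented for this sketch, not taken from the book), the code below shows how an application-side routing layer might look: replication-based Scale Out spreads reads over identical copies of the data, while split-based Scale Out sends each request to the shard that owns its data.

```python
# A minimal sketch of the two Scale Out approaches at the data-access layer.
import random

MASTER = "db-master:3306"
READ_REPLICAS = ["db-replica-1:3306", "db-replica-2:3306"]      # replication-based expansion
SHARDS = ["db-shard-0:3306", "db-shard-1:3306",
          "db-shard-2:3306", "db-shard-3:3306"]                  # split-based expansion

def route_by_replication(is_write: bool) -> str:
    """Identical data everywhere: writes go to the master, reads spread over replicas."""
    return MASTER if is_write else random.choice(READ_REPLICAS)

def route_by_sharding(user_id: int) -> str:
    """Data split by key: every statement for one user goes to that user's shard."""
    return SHARDS[user_id % len(SHARDS)]

if __name__ == "__main__":
    print(route_by_replication(is_write=False))   # e.g. db-replica-2:3306
    print(route_by_sharding(user_id=42))          # 42 % 4 == 2 -> db-shard-2:3306
```

Which path fits depends on the workload: a read-heavy system benefits most from replication, while a write-heavy or data-volume-bound system usually needs splitting.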
Next, let's look at what principles should be followed to design a database application system with good Scalability.
12.2 Principle of Minimizing Transaction Relevance
When building a distributed database cluster, many people are concerned about transactions. After all, transactions are a core function in the database.
In a traditional centralized database architecture the transaction problem is solved very well and can be fully guaranteed by the database's own mature transaction mechanism. However, once the database is deployed as a distributed architecture, many transactions that used to complete inside a single database may have to span multiple database hosts, so what used to be single-node transactions may need to become distributed transactions.
As you probably know, distributed transactions are a very complex mechanism. Whether in large commercial database systems or in open source databases, most vendors do implement the feature, but always with various restrictions, and there are also bugs that can leave some transactions poorly guaranteed or unable to complete smoothly.
At this point we may need to look for alternative ways to solve the problem. After all, transactions cannot simply be dropped; the question is not whether to support them but how.
Currently, there are three main solutions:
First, during the Scale Out design, devise the data splitting rules rationally so that, as far as possible, all data touched by a transaction resides on the same MySQL Server, thereby avoiding distributed transactions.
If the splitting rules can be designed so that every transaction completes on a single MySQL Server, our business requirements can be implemented easily, the application needs only minimal adjustment to accommodate the architecture change, and the overall cost drops considerably. After all, a database architecture transformation is never just the DBA's affair; it needs plenty of cooperation and support from the surrounding systems. Even when designing a brand-new system we must weigh the overall investment in every part, both the cost of the database itself and the corresponding development cost. If the "interests" of the various parts conflict, we have to strike a balance between future scalability and overall cost and find the point that best suits the current stage.
However, even a well-designed set of sharding rules can rarely place all the data required by every transaction on the same MySQL Server. So although this solution costs the least, most of the time it can only cover the majority of the core transactions; it is not a perfect solution. A minimal sketch of such a splitting rule follows.
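The sketch below is a minimal illustration of this first solution, not code from the book: both an orders table and an account table are assumed to be sharded by a hypothetical user_id key, so everything a core transaction touches lands on one MySQL Server and an ordinary local transaction suffices. It uses the mysql-connector-python driver; connection parameters and table names are invented for illustration.

```python
import mysql.connector

# Hypothetical shard list; both tables below are sharded by user_id.
SHARD_DSNS = [
    {"host": "db-shard-0", "user": "app", "password": "secret", "database": "shop"},
    {"host": "db-shard-1", "user": "app", "password": "secret", "database": "shop"},
]

def shard_for(user_id: int) -> dict:
    # One splitting rule for every table a core transaction touches,
    # so all of its rows are guaranteed to live on the same server.
    return SHARD_DSNS[user_id % len(SHARD_DSNS)]

def place_order(user_id: int, order_id: int, amount: float) -> None:
    conn = mysql.connector.connect(**shard_for(user_id))
    try:
        conn.start_transaction()
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO orders (order_id, user_id, amount) VALUES (%s, %s, %s)",
            (order_id, user_id, amount))
        cur.execute(
            "UPDATE account SET balance = balance - %s WHERE user_id = %s",
            (amount, user_id))
        conn.commit()   # an ordinary local transaction; no distributed commit needed
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```

If the two tables were split by different keys (say, orders by order_id and accounts by user_id), the same operation would span two servers and fall back into distributed-transaction territory, which is exactly what this solution tries to avoid.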
Second, split large transactions into several small transactions; the database guarantees the integrity of each small transaction, and the application controls the overall integrity across them.
Compared with the previous scheme, this one requires more changes to the application and places stricter demands on it. The application not only has to break up many of its original large transactions, but also has to guarantee integrity across the resulting small transactions. In other words, the application itself needs a certain transaction-control capability, which undoubtedly raises its technical difficulty.
This solution also has its own advantages. First, the data splitting rules can be simpler and carry far fewer restrictions, and simpler rules mean lower maintenance costs. Second, with few constraints on the splitting rules the database gains better scalability and is not hemmed in by them; when a performance bottleneck appears, the existing databases can quickly be split further. Finally, the database stays further away from the actual business logic, which is an advantage for later architectural changes. A sketch of application-controlled transaction splitting follows.
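The sketch below is a minimal illustration of this second solution, again not from the book: a transfer that would have been one large transaction across two shards is split into two small local transactions, and the application restores overall integrity with a compensating update if the second step fails. The connections are assumed to be mysql-connector-python connections to two different shards; table and column names are hypothetical.

```python
def transfer(conn_src, conn_dst, user_from: int, user_to: int, amount: float) -> None:
    # Small transaction 1: debit on the source shard.
    cur = conn_src.cursor()
    conn_src.start_transaction()
    cur.execute("UPDATE account SET balance = balance - %s WHERE user_id = %s",
                (amount, user_from))
    conn_src.commit()

    try:
        # Small transaction 2: credit on the destination shard.
        cur = conn_dst.cursor()
        conn_dst.start_transaction()
        cur.execute("UPDATE account SET balance = balance + %s WHERE user_id = %s",
                    (amount, user_to))
        conn_dst.commit()
    except Exception:
        conn_dst.rollback()
        # The application, not the database, restores overall integrity by
        # compensating the already-committed debit. A real system would also
        # record the failure so a background job can retry the compensation.
        cur = conn_src.cursor()
        conn_src.start_transaction()
        cur.execute("UPDATE account SET balance = balance + %s WHERE user_id = %s",
                    (amount, user_from))
        conn_src.commit()
        raise
```

The price of this flexibility is visible here: the application now has to reason about partial failure, something a single database transaction used to handle for it.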
Third, combine the two solutions, integrating their advantages and avoiding their drawbacks.
The two preceding solutions have largely complementary strengths and weaknesses, so we can draw on both to balance the overall architecture design. For example, the data needed by a handful of core transactions can be kept on the same MySQL Server, while other, less critical transactions are split into small transactions whose overall integrity is guaranteed in combination with the application. And for some transactions that are not particularly important, we can also analyze in depth whether a transaction is really necessary at all.
With such a balanced design we avoid forcing the application to handle too many small transactions just to guarantee overall integrity, and at the same time we avoid splitting rules so complex that maintenance becomes difficult and scalability suffers.
Of course, not every application scenario has to combine the two solutions. For applications whose transaction requirements are not very strict, or whose transactions are inherently simple, slightly tuning the sharding rules may be enough to meet the requirements with the first solution alone, sparing the application from maintaining the overall integrity of many small transactions and greatly reducing its complexity.
Conversely, for applications with complex transaction relationships and highly correlated data, there is no need to strain to keep all transaction data centralized: no matter how hard we try, it is very difficult to satisfy every requirement, and the effort is mostly thankless. In such cases we should keep the database side as simple as possible and let the application make some sacrifices instead.
Examples of every one of these approaches can be found in large Internet applications. Ebay, for instance, largely follows the third, combined solution, with the second solution as the primary approach and the first as a supplement; besides the needs of its application scenarios, this choice also rests on strong technical strength that guarantees the development of a sufficiently capable application system. As another example, a large domestic BBS application system (whose real name cannot be disclosed) has no very complex transaction relevance and little data association between its functional modules, and it adopts the first solution entirely: by designing the data splitting rules rationally, it avoids having any transaction's data sources span multiple MySQL Servers.
Finally, we need to understand this: transactions are not something of which more is better; the fewer and the smaller, the better.
Whichever solution we use, when designing the application we should try to minimize the transaction relevance of the data, or even remove the need for transactions altogether. Of course this is only relative and can usually be achieved only for part of the data; but once some portion of the data no longer needs transactions, the overall complexity of the system drops sharply, and both the application and the database system pay a much smaller price.
12.3 Data Consistency Principles
Whether we Scale Up or Scale Out, and however we design our architecture, guaranteeing the eventual consistency of the data is a principle that absolutely must not be violated; I am sure every reader understands how important this principle is.
Moreover, data consistency, like transaction integrity, is likely to run into problems once we design for Scale Out; with Scale Up such troubles are comparatively rare.
Of course, in the eyes of many people, data consistency is also a category of transaction integrity to some extent. However, in order to highlight its importance and related features, I will analyze it separately.
So how do we guarantee data consistency while Scaling Out? This problem, much like transaction integrity, has drawn the attention of many architects, and after much practice the industry summarized it in the BASE model: Basically Available, Soft state, Eventually consistent. The words may sound complicated and profound, but you can simply think of it as a principle of non-real-time consistency.
That is, the application uses the relevant techniques so that, while user requirements are still met, the system is allowed to hold data in an inconsistent state for a short period, and consistency is then reached through follow-up processing. The theoretical model is simple to state, but in practice we will run into many problems.
The first question: does all the data need only eventual, non-real-time consistency? I think most readers will definitely vote against that. Then, if not all data can be handled non-real-time, how do we decide which data requires real-time consistency and which only needs eventual consistency? Essentially this comes down to a division by the business priority of each module: data belonging to higher-priority modules goes into the camp that guarantees real-time consistency, while data belonging to lower-priority modules can be considered for the camp that tolerates short-term inconsistency and reaches consistency eventually. This is a very delicate decision that cannot be made on a whim, because not all data can be allowed to appear inconsistent even for a short time, and not all data can be brought back to consistency by after-the-fact processing; such data must stay consistent in real time.
Distinguishing these two classes of data therefore requires a detailed analysis of the business scenario and a thorough evaluation before a conclusion can be drawn.
Second, how do we bring the data that is allowed to be inconsistent back to eventual consistency? In general, we should clearly separate the business modules that handle such data from the modules that require real-time consistency, and then use asynchronous mechanisms and the corresponding background processes to work through the inconsistent data, using intermediate tables, change logs, and similar information in the system, until it is fully consistent. Different modules can use different background processes, both to avoid mixing up data and to allow concurrent execution for better throughput. For example, user messages and notifications usually do not need strict real-time consistency: it is enough to record the messages that need processing and let a background process handle them in order, so the front-end service is not congested. A minimal sketch of such a background process follows.
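The sketch below shows one possible shape of such a background process; it is an assumption-laden example, not the book's design. The front end only inserts rows into a hypothetical pending_notifications table (columns id, user_id, body, processed), and this worker later delivers them in order and marks them processed. It uses the mysql-connector-python driver; the host, credentials, and delivery logic are placeholders.

```python
import time
import mysql.connector

def deliver(user_id: int, body: str) -> None:
    print(f"notify user {user_id}: {body}")   # stand-in for the real delivery logic

def run_worker() -> None:
    conn = mysql.connector.connect(host="db-main", user="app",
                                   password="secret", database="shop")
    while True:
        cur = conn.cursor()
        cur.execute("SELECT id, user_id, body FROM pending_notifications "
                    "WHERE processed = 0 ORDER BY id LIMIT 100")
        rows = cur.fetchall()
        for row_id, user_id, body in rows:
            deliver(user_id, body)
            # Each row is marked done in its own small transaction, so a crash
            # never loses work; at worst a notification is delivered twice.
            cur.execute("UPDATE pending_notifications SET processed = 1 WHERE id = %s",
                        (row_id,))
            conn.commit()
        if not rows:
            conn.commit()   # close the read snapshot so the next poll sees new rows
            time.sleep(1)   # idle briefly when the backlog is empty

if __name__ == "__main__":
    run_worker()
```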
Finally, avoid real-time online interaction between the two classes of data, the real-time-consistent and the eventually-consistent. Because the latter may temporarily be inconsistent, mixing the two in one interaction can cause disorder, so eventually-consistent data should be isolated from real-time-consistent data as much as possible inside the application. In some special scenarios it may even be necessary to isolate them physically on different MySQL Servers.
12.4 High Availability and Data Security Principles
In addition to the two principles above, I also want to stress system availability and data security. After a Scale Out redesign the overall scalability of the system is indeed much better and overall performance naturally improves, but keeping the whole system available becomes harder than before: the architecture is more complex, and both the application and the database environment are larger and more intricate. The most direct consequence is that maintenance and monitoring become more difficult.
If the result of such a redesign were frequent crashes and frequent downtime, I think nobody would accept it. We must therefore use whatever technical means are necessary to ensure that availability does not drop, and ideally improves overall.
This naturally leads to another principle of the Scale Out design process: the high availability principle. However the system architecture is adjusted, the overall availability of the system must not decrease.
In fact, discussing system availability naturally brings up another closely related principle: the data security principle. To achieve high availability, the data in the database must be safe enough. Security here does not mean protection against malicious attack or theft, but against abnormal loss: we must ensure that data is not lost even when software or hardware fails. Once data is lost there is no availability to speak of at all, and since data is the core resource of a database application system, the principle that it must not be lost is beyond question.
The best way to guarantee both high availability and data security is redundancy: eliminate every single point of failure (SPOF) in the hardware and software, and keep more than one copy of all data. Technically this can be achieved with MySQL Replication, MySQL Cluster, and similar technologies; a brief replication sketch follows.
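As a rough illustration only, the sketch below points a freshly loaded slave at its master using the classic MySQL 5.x replication statements (newer versions also offer CHANGE REPLICATION SOURCE TO / START REPLICA). It assumes the master already has log-bin enabled and a unique server-id, and that the slave's data was initialized from a backup; host names, credentials, and the binlog coordinates are placeholders.

```python
import mysql.connector

# Connect to the slave and configure it to replicate from the master.
slave = mysql.connector.connect(host="db-slave-1", user="root", password="secret")
cur = slave.cursor()
cur.execute(
    "CHANGE MASTER TO "
    "MASTER_HOST='db-master', MASTER_PORT=3306, "
    "MASTER_USER='repl', MASTER_PASSWORD='repl_password', "
    "MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4"
)
cur.execute("START SLAVE")

# Verify that both replication threads are running.
cur.execute("SHOW SLAVE STATUS")
print(cur.fetchall())
slave.close()
```

With every piece of data present on at least two servers, the loss of any single machine no longer means the loss of data, which is the minimum that the data security principle demands.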
12.5 Conclusion
No matter how we design the architecture or how the scalability requirements change, the principles discussed in this chapter are all very important: the principles for solving specific problems, the principle of guaranteeing availability, and the principle of guaranteeing data security should all be kept in mind throughout the design. MySQL is so popular in the Internet industry not only because it is open source and easy to use, but also, very importantly, because of its advantage in scalability: its different storage engines suit different application scenarios, and its Replication and Cluster features are effective means of improving scalability.
From: MySQL Performance Tuning and Architecture Design, by Jian Chaoyang
When reprinting, please indicate the source:
Author: JesseLZJ
Source: http://jesselzj.cnblogs.com