Discussion on MySQL-based high-availability and Scalable Architecture

Source: Internet
Author: User

From: http://www.programmer.com.cn/3567/

With the soaring amount of information, information storage has become a crucial technology in this era. How to ensure that the data storage technology can adapt to the increasing speed of information and our high dependence on information has become a very important topic. This article describes how to build a distributed data layer by using open-source data storage software at the database architecture level. We hope to implement a low-cost high-availability and scalable data layer architecture.


 

Traditional database architecture

Looking at the traditional commercial database software, most of them focus on the centralized architecture, and there are few architectures with the distributed design concept. The biggest feature of these traditional database software is to centralize all the data in one database and rely on large and high-end devices to provide high-end capability and scalability.

In terms of scalability, the centralized database architecture mainly depends on the scalability of hosts and storage devices that store data. That is to say, it is difficult to achieve better horizontal scalability by relying on the hardware's own vertical scalability. The reliability is also based on hardware devices, mainly through the share storage method. Oracle RAC, a well-known representative of traditional commercial databases, is a typical centralized architecture that shares everything.

We can use figure 1 to briefly describe the typical architecture of traditional databases: the traditional architecture mostly shares storage devices on the host end through two hosts, usually, one of the hosts uses storage for management through database software. In this architecture, only one database on the host (except RAC) can provide services, and the other can only be used as hot standby redundancy, and database instances cannot be started to provide services. Therefore, the processing capability depends entirely on the maximum scalability of the host. It is difficult to increase the processing capability by increasing the number of hosts. After all, the scalability of a single host is limited. Even some manufacturers' mainframes have the same scalability limitations. In addition, the dependence of traditional architectures on high-end devices will undoubtedly lead to a significant increase in system costs, and may even lead to system hijacking by host and hardware vendors ", we have to continuously increase investment costs.

 

Figure 1 Typical architecture of traditional databases

 

Mysql-based scalability and high reliability

MySQL, as a leader in open-source databases, is far different from the traditional commercial database software in terms of both the design philosophy of the software and the architecture ideas commonly used by users. MySQL discards the traditional share-Everything idea and adopts the share-nothing idea. MySQL replication implementation mechanism and MySQL cluster architecture design all reflect this idea. This also gives MySQL a flexible architecture design philosophy in terms of scalability and high reliability, and allows our database to get rid of high-end devices, use a lot of cost-effective PC servers.


 

Scalability

In terms of scalability improvement, the most common feature is to use the replication function of MySQL to asynchronously copy data from the same MySQL to another or multiple MySQL hosts at the same time, and allows these MySQL hosts to provide external query services at the same time. The query processing capability is also increased for each added replication node. The processing capability of the new node is the added processing capability of the entire system. Because MySQL replication is mainly a logical method, the same cluster can have hardware from multiple vendors or use different OS, so it can be completely free from any software/hardware platform restrictions, get rid of the dependency on a single platform.

People may be dissatisfied with the functional features of MySQL replication, and then use third-party open-source software, it is even possible to achieve identical or even better replication results by parsing the replication software developed by the open-source communication protocol for real-time (or asynchronous) data replication. Traditional databases may also have a function to replicate data. However, compared with MySQL, because the database only relies on its own characteristics to replicate data, there are some shortcomings in architecture flexibility and controllability.

Finally, the idea of Data splitting (horizontal/vertical), which has to be mentioned in terms of scalability improvement, can also be flexibly used in the MySQL database, whether vertical splitting is performed in the form of a function module or horizontal splitting by hash segments of a specific key (field, or through the database partition function, you can achieve a good job in MySQL. Of course, no matter whether the data is scattered before splitting, or the data routing and merging after splitting are inseparable from the collaboration and cooperation at the application layer, unless implemented through MySQL cluster. For a scalable architecture, see Figure 2 for a more intuitive presentation.

 

Figure 2 architecture that can improve scalability

The figure uses "master" as the core and copies data to the corresponding MySQL cluster in three different ways to provide external services. In the actual architecture, the master node is a Data Writing point, and the replicated cluster can provide corresponding query requests. In common web application systems, query requests are much larger than write requests. Therefore, it is very suitable for MySQL databases to use similar architecture ideas to solve practical scalability problems.


 

High reliability

In terms of high reliability, we can also design a variety of flexible and highly reliable architectures based on MySQL replication with simple application layer architecture or some open-source third-party ha management software.

For data writing points, you can use the replication function of MySQL to set the two MySQL hosts to the status of Double A, and use the HA management software to check the status of MySQL, to determine the status of MySQL and provide a single IP address for external services, so as to ensure that one MySQL will be switched to another MySQL immediately after a crash. This automatic method makes it very convenient for our writing points to have high reliability. If our application can also automatically determine that it will be better to automatically switch to another point immediately after a point becomes invalid, it can basically achieve completely transparent switching to the outside, there is little impact on availability, which is much better than professional ha management by some well-known vendors.

For data query points, it is easier to implement high reliability, you can build multiple MySQL nodes with identical data copies on any of the two master nodes of Dual-A through data replication, to ensure that multiple MySQL Databases can provide external data query services at any time. When a MySQL instance crashes, the system immediately removes the node from the node that can be served. With the help of application architecture, this is very easy to do.

Figure 3 shows an instance with high reliability. 1 depicts the basic architecture, 2, 3, and 4 respectively describe the high reliability implementation when the read node fails and any of the write nodes fails. In terms of high reliability, MySQL also provides a more advanced solution-mysql cluster, a fully distributed database cluster, in addition to manually copying data to multiple MySQL hosts, it is also a typical share-nothing distributed database architecture. The data layer is separated from the SQL layer. Each data copy is stored in two or more copies on different data nodes, all nodes on the entire data layer process the data of the entire database together with the idea of distributed computing, ensuring high concurrency processing capability while ensuring high reliability through data redundancy.

 

Figure 3 High-reliability instance Diagram


 

Build a data layer based on MySQL, cache, and search

As the load of Web application systems grows faster and faster, the system is often overwhelmed, especially the database system. With the increasing attention to user experience, the response speed and ease of use have become an inevitable topic. No matter how it is optimized, it is impossible to avoid the mismatch between the disk's physical I/O response speed and the I/O speed in the memory, therefore, we naturally think of improving the response speed through memory cache.

Data search is the most typical feature in terms of ease of use. The features of relational databases make it difficult to provide a full-text retrieval system similar to Google. Once again, we use the "External Force" method to improve the service, and use similar search engine systems such as Lucene or egothor, or software such as sphwheel to integrate with the database, adds the full-text retrieval capability to the database.

You can design a database (my-SQL) for persistence and regular data access, and use a distributed memory cache system to provide access to data with extremely high response speed and concurrency capabilities, A third-party or self-developed distributed full-text search system is used to provide a comprehensive distributed data service layer for full-text search. In the middle, a unified data access layer is implemented through the help of the application architecture, to control Data Reading, writing, retrieval, and other operations, as shown in Figure 4: There are many implementation methods for synchronizing or asynchronously writing data from MySQL to the cache and search systems. For example, the real-time requirement is not high, you do not want the data proxy layer to have too many control logics. You can use the queue method to implement them asynchronously. For example, you have high requirements on real-time performance, you can also use the data proxy layer application to control data updates in cache and search. You can also use the User-Defined Functions of MySQL, update data to the cache cluster in real time in trigger mode. You can come up with many similar methods. The key is to select the most appropriate and simple method to implement it based on the Application scenario.

 

Figure 4 Implementation of Data Writing to cache and search systems

The architecture does not matter the best, only the most suitable. Any architecture has applicable scenarios and corresponding lifecycles. Changes in application scenarios or business volume may result in insufficient architecture. Quote a line in a movie: "Come out and get it together, sooner or later !"

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.