Scalability of the service


When writing an application, we usually focus on how to implement specific business logic. However, as the number of users grows, such applications often expose a series of problems, such as difficulty adding capacity and poor fault tolerance. As a result, they cannot respond quickly to user needs as the market expands, and the business ultimately loses opportunities.

We usually call the properties that help avoid these problems non-functional requirements. The term is fairly self-explanatory: functional requirements support the business logic, while non-functional requirements are unrelated to the business logic but may affect how the product develops later. They typically include high availability, scalability, maintainability, testability, and so on.

Of these non-functional requirements, scalability is probably the most interesting. So in this article, we'll explain how to write a highly scalable application.

What is scalability

Suppose we write a web application and deploy it on a shared cloud to serve users. The app's idea is novel and attracts a large number of users in a short time. However, because we never expected it to handle so many requests when we wrote it, it runs more and more slowly and may even stop responding. If this happens often, users will not tolerate the frequent outages and will turn to similar applications instead.

An application's ability to scale its processing capacity with its load is its scalability, and it is measured by how easily capacity can be expanded. If your app runs better after you add more memory, or if an overloaded service instance can be relieved simply by adding another instance, we can say the application is highly scalable. If you have to rewrite the entire application to handle more load, its developers need to pay more attention to scalability.

Good scalability not only saves you the hassle of rewriting your application; more importantly, it helps you stay ahead of the competition. Imagine that your application is already short of processing power and there is no suitable way to increase the system's capacity. All you can do is rewrite an application with the same functionality and higher capacity. During the rewrite, your existing application becomes more and more overloaded. From the customers' point of view, it responds more and more slowly and sometimes does not work at all. Your app will keep losing customers until the new version goes live, and those customers are likely to become loyal users of competing software, costing your product its chance to compete. Conversely, if your application scales very well while your competitors cannot keep up with their user growth, your application has the chance to overtake or even crush them.

Of course, a successful application needs not only good scalability but also good results on a range of other non-functional requirements. For example, your app should not have too many bugs, and in particular no serious ones, because such bugs prevent users from using the app properly. At the same time, your app needs a good user experience so that users can quickly become familiar with it and keep coming back.

These non-functional requirements are not limited to the user's perspective, either. From the development team's perspective, for example, a piece of software's testability often determines the productivity of the test team. If each installation of an application requires deploying it across dozens of machines, every tester may need hours or even days to prepare each new version. The test team then naturally becomes the least efficient part of the development organization, and we have to hire many more testers, greatly increasing the overall cost of the application.

In general, an application has a large number of non-functional requirements, such as completeness, correctness, availability, reliability, security, scalability, performance, and more. Each of them places its own demands on analysis, design, and coding, and different non-functional requirements often conflict with each other. Which ones matter most depends on the type of application you are writing. For example, for a large web application, scalability, security, and usability are important, while for a real-time application, performance and reliability take priority. This article focuses mainly on scalability, so some of its recommendations may come at the expense of other non-functional requirements. Readers will need to weigh those trade-offs according to their own applications.

How to scale an application

OK, let's return to scalability. The most fundamental reason software needs to scale is the throughput it has to handle. When a user's request arrives, a service instance processes it and translates it into operations on data. Both the service instance and the database consume resources in this process. If the volume of requests exceeds what some component of the application can handle, we need to find ways to increase that component's processing capacity.

There are two main ways to increase processing capacity: vertical scaling and horizontal scaling, also called scaling up and scaling out.

Vertical scaling means handling additional load by increasing the processing power of a single system. The simplest case is to give the system more powerful hardware. For example, if the server hosting the database has only 2 GB of memory, which prevents the database from running efficiently, we can solve the problem by expanding the server's memory to 8 GB:

The figure shows scaling up by adding memory to address excessive IO on the database's service instance. When the server running the database does not have enough memory to hold the most frequently used data, the database must continually read pages that were evicted to disk back from the hard disk, which significantly degrades its performance. Once the server's memory is expanded to 8 GB, that frequently used data can stay in memory, and the disk IO on the database's service instance quickly returns to normal.
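The effect of the frequently used data fitting, or not fitting, in memory can be illustrated with a toy model. The Java snippet below is a minimal sketch under simplifying assumptions: the hypothetical `BufferPoolSim` treats the buffer pool as a fixed number of page slots with least-recently-used eviction and counts every miss as one disk read. Real database engines use more elaborate replacement policies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of a database buffer pool: a fixed number of page
// slots in memory with least-recently-used (LRU) eviction. Each miss stands in
// for one disk read.
class BufferPoolSim {
    static int countDiskReads(int capacityPages, int[] pageAccesses) {
        // accessOrder = true: iteration runs from least to most recently used,
        // so the "eldest" entry is the LRU page.
        Map<Integer, Boolean> pool = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacityPages; // evict once the pool is over capacity
            }
        };
        int diskReads = 0;
        for (int page : pageAccesses) {
            if (pool.get(page) == null) { // page not in memory: read it from disk
                diskReads++;
                pool.put(page, Boolean.TRUE);
            }
        }
        return diskReads;
    }
}
```

Cycling 40 times over 4 pages with room for only 2 makes every access miss (40 disk reads), while room for all 4 pages costs only the 4 initial reads, which mirrors the sudden improvement once the working set fits in memory.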

Besides improving a single service instance's performance through hardware, we can also scale an application vertically by making its software run more efficiently. The simplest example: if a service implementation processes data on a single thread and so cannot take advantage of the multiple CPU cores in the server, we can change the algorithm to use multiple threads and exploit the CPU's multicore computing power, multiplying the service's execution efficiency.
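As a rough illustration of the multithreading point, the sketch below assumes the work consists of independent items with no shared mutable state. The hypothetical `work` method stands in for a CPU-heavy computation; a parallel stream spreads the items over Java's common ForkJoinPool, which by default is sized to use the available cores.

```java
import java.util.stream.IntStream;

class ParallelWork {
    // Hypothetical stand-in for a CPU-heavy, independent per-item computation.
    static long work(int x) {
        long acc = 0;
        for (int i = 0; i < 1_000; i++) {
            acc += ((long) x * i) % 7;
        }
        return acc;
    }

    // Original single-threaded version: only one core does any work.
    static long sequential(int n) {
        return IntStream.range(0, n).mapToLong(ParallelWork::work).sum();
    }

    // Same computation split across the common ForkJoinPool so idle cores can
    // take part. This is only safe because work(x) touches no shared state.
    static long parallel(int n) {
        return IntStream.range(0, n).parallel().mapToLong(ParallelWork::work).sum();
    }
}
```

Both methods return the same total; only the parallel one keeps more than one core busy. Whether it actually runs faster depends on the number of items and the cost of each one.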

However, vertical scaling is not always the right choice. The most common limiting factor is hardware cost. The price of hardware usually depends on its market position. If a piece of hardware is a mainstream configuration that has shipped in large volumes, the research and development cost amortized over each unit is very small. Conversely, if it is a high-end product that has just reached the market, each unit carries a large share of the research and development cost. So the curve of performance against investment for scaling up typically looks like this:

In other words, once a single instance has been optimized to a certain point, spending a lot more time and money to improve it further is no longer worthwhile. At that point we need to consider scaling out, that is, using multiple service instances to provide the service together.

Take an online image processing service as an example. Because image processing is very resource-intensive, a single server often cannot keep up with requests from a large number of users:

As shown in the figure, although our server already has 4 CPUs, its CPU usage stays in the alert range when it is the only instance serving requests. If we add an identical server to share user requests, the load on each server drops to about half of the original, keeping CPU usage below the alert threshold.
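The arithmetic behind this decision can be written down directly. This is a minimal sketch that assumes load spreads evenly over identical instances; the class name `CapacityPlan` and the numbers in the example are illustrative, not measurements from the figure.

```java
class CapacityPlan {
    // Smallest number of identical instances that keeps each one at or below
    // targetUtilization, assuming load is spread evenly across instances.
    static int instancesNeeded(double totalLoad, double capacityPerInstance, double targetUtilization) {
        if (capacityPerInstance <= 0 || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("capacity must be positive and target in (0, 1]");
        }
        return (int) Math.ceil(totalLoad / (capacityPerInstance * targetUtilization));
    }

    // Utilization of each instance when the load is split evenly over `instances`.
    static double utilizationPerInstance(double totalLoad, double capacityPerInstance, int instances) {
        return totalLoad / (capacityPerInstance * instances);
    }
}
```

For example, a load of 90 units on instances that can each handle 100 units, with a 70% utilization target, gives `instancesNeeded(90, 100, 0.7) == 2`; a single instance would run at 90%, while two run at 45% each.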

However, this also scales every other function the service provides. For example, the capacity for saving processing results doubles as well. Since we do not currently need that extra capacity, this part of the expansion is effectively useless and even wastes service resources:

As the figure shows, before scaling out, the load on the orange component has reached 90%, close to the limit of a single service instance. To solve this, we add a second server instance to share the work. But this further lowers the utilization of the other components, which were already lightly used. A better approach is to scale out only the orange component:

As the explanation above shows, scaling out can take several forms. Accordingly, the book "The Art of Scalability" introduces the AKF scaling model (also known as the AKF Scale Cube) as a guide for scaling out. According to this model, scaling out has three dimensions, and any scale-out solution is a combination of work along these three dimensions:

The most common depiction of the AKF scaling model is shown in the figure. In this diagram, the origin represents a single application instance that cannot scale out at all and can only improve its capacity through vertical scaling. As your system moves along an axis, it gains some degree of scale-out capability. These three axes are not mutually exclusive, so your app may be able to scale along all three axes (x, y, and z) at the same time:

Now let's look at the meaning of each axis in the AKF scaling model, starting with the x-axis. The x-axis represents addressing scalability by deploying more service instances. Load that would otherwise be handled by a small number of instances is shared with the newly added instances, which increases system capacity and reduces the pressure on each instance.
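In its simplest form, x-axis scaling only needs something that spreads requests over interchangeable instances. The sketch below is a hypothetical round-robin dispatcher (`RoundRobinBalancer`, with made-up instance names), assuming every instance is an identical clone; real load balancers add health checks and smarter policies.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal x-axis dispatcher: all instances are identical clones, so any of
// them can serve any request; requests are handed out in turn (round-robin).
class RoundRobinBalancer {
    private final List<String> instances;            // hypothetical instance addresses
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> instances) {
        if (instances.isEmpty()) throw new IllegalArgumentException("no instances");
        this.instances = List.copyOf(instances);
    }

    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(i);
    }
}
```

With instances `a`, `b`, and `c`, successive calls to `pick()` return `a`, `b`, `c`, `a`, and so on. Adding capacity is then just a matter of adding another clone to the list.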

As mentioned earlier, a service's scalability can combine several axes at once, so x-axis scaling applies not only at the level of the whole service but also to individual sub-services and even groups of components:

Note the orange blocks in the figure. In this service, the orange block is a sub-service that provides specific functionality to the whole service. When we need to scale, we can relieve the excessive load on the orange service by adding new orange sub-service instances. So for the service as a whole, x-axis scaling is achieved not by redeploying the entire service but by scaling out an independent sub-service.

You may ask: if service capacity can be expanded simply by adding new service or sub-service instances, do we still need the other two axes?

The answer is yes. First, the most practical problem is the constraints of the environment the service runs in. For example, scaling a service along the x-axis usually requires a load-balancing service, and as described in the article "Introduction to Enterprise-class load balancing," load balancers have their own performance limits. So x-axis scaling cannot go on indefinitely. In addition, as we have just seen, scaling out is sometimes applied to sub-services, and splitting a large service into multiple sub-services is itself scaling along another axis.

Y-axis scaling means dividing the work by the type of data or business logic. For a web service, the main task of y-axis scaling is to split a monolithic service into a set of sub-services, so that each sub-service can work and scale independently. Because the requests once handled by a single service are now spread across several sub-services, you can scale out along the x-axis only the sub-service that actually becomes the bottleneck, avoiding the wasted resources of scaling out the entire service:

This way of organizing sub-services is called microservices. Organizing sub-services as microservices also helps with a range of other non-functional requirements, such as high availability and testability. See the article "Introduction to the Microservice Architecture model" for details.

By comparison, y-axis scaling is harder to achieve than x-axis scaling, but it often improves the quality of other non-functional requirements as well.
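To make the y-axis split concrete, the sketch below routes each request to the sub-service that owns its business function, identified here by URL path prefix. The class `FunctionalRouter` and the sub-service names are hypothetical; in practice this rule usually lives in a load balancer or API gateway configuration rather than in application code.

```java
import java.util.HashMap;
import java.util.Map;

// y-axis split: each business function, identified here by a URL path prefix,
// is owned by its own sub-service that can be deployed and scaled on its own.
class FunctionalRouter {
    private final Map<String, String> routes = new HashMap<>(); // prefix -> sub-service (hypothetical)

    void register(String pathPrefix, String subService) {
        routes.put(pathPrefix, subService);
    }

    // The longest matching prefix wins, so "/search/images" can override "/search".
    String route(String path) {
        String best = null;
        for (String prefix : routes.keySet()) {
            if (path.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best == null) throw new IllegalArgumentException("no sub-service for " + path);
        return routes.get(best);
    }
}
```

Once web search and image search are routed to separate sub-services this way, each can be scaled out along the x-axis according to its own load.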

Z-axis scaling is probably the least familiar. It means dividing user requests according to some characteristic of the user, for example by using DNS-based load balancing.
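Location, as used by DNS-based routing, is one characteristic for a z-axis split; another common one is the user ID. The sketch below is hypothetical (`UserShardRouter`, with made-up shard names) and assumes every shard runs the same code but holds only its own users' data.

```java
import java.util.List;

// z-axis split: every shard runs identical code, but each one serves only a
// subset of users, chosen here by a stable hash of the user ID.
class UserShardRouter {
    private final List<String> shards; // hypothetical shard or data-center names

    UserShardRouter(List<String> shards) {
        if (shards.isEmpty()) throw new IllegalArgumentException("no shards");
        this.shards = List.copyOf(shards);
    }

    String shardFor(String userId) {
        // String.hashCode is specified by the JDK, so the same user always maps
        // to the same shard, which can keep that user's data locally.
        return shards.get(Math.floorMod(userId.hashCode(), shards.size()));
    }
}
```

Plain modulo hashing like this remaps most users whenever the number of shards changes, so a real deployment also needs a plan for re-sharding.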

Of course, how far your service needs to go along the x, y, and z axes depends on the actual service. If an application will never grow large, x-axis scaling alone, or some y-axis scaling, may be enough. If an application grows very quickly and will eventually have high throughput requirements, we need to plan for scaling along all three axes from the start.

Scaling the service

With all this theory in place, you are probably eager to see how to scale an application well. Let's start with the scalability of service instances.

As described earlier, there are two main ways to scale a service: scaling up and scaling out. For service instances, scaling out is simple: split the service into a number of sub-services and add new service instances to the application with the help of technologies such as load balancing:

The figure shows how service instances scale out according to the AKF scaling model. At the top level, we use DNS-based load balancing. Because DNS can determine the service closest to a user based on the user's location, the IP address the user gets from a DNS lookup points to the nearest service. For example, a user in the western United States who accesses Google may receive the IP address of a server near that region. This corresponds to the z-axis of the AKF scaling model: users' requests are divided according to some characteristic of the user.

Next, the load balancer divides users' requests according to the URL being accessed. For example, when a user accesses the web search service, the cluster serves the user with the service instances in the dashed box on the left; when the user accesses the image search service, it uses the instances in the dashed box on the right. This is the y-axis of the AKF scaling model: requests are divided by type of data or business logic.

Finally, because web search is the most commonly used service and a single service instance has limited capacity, the cluster usually runs several instances that all provide web search. The load balancer distributes user requests according to each instance's capacity and current state. This is scaling along the x-axis of the AKF scaling model: the total load is shared by deploying service instances with identical functionality.
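Distributing requests by "capacity and current state" can be done in several ways; weighted least connections is one common policy. The sketch below is a hypothetical `WeightedLeastConnections` class, assuming each instance has a positive capacity weight and a health flag that some external health check would update.

```java
import java.util.ArrayList;
import java.util.List;

// Weighted least connections: send each request to the healthy instance with
// the fewest active connections relative to its capacity weight.
class WeightedLeastConnections {
    static final class Instance {
        final String name;
        final int weight;        // relative capacity (> 0), e.g. 2 = twice as powerful
        int activeConnections;   // current state tracked by the balancer
        boolean healthy = true;  // updated by health checks in a real deployment

        Instance(String name, int weight) {
            this.name = name;
            this.weight = weight;
        }
    }

    private final List<Instance> instances = new ArrayList<>();

    synchronized Instance add(String name, int weight) {
        Instance inst = new Instance(name, weight);
        instances.add(inst);
        return inst;
    }

    synchronized Instance pick() {
        Instance best = null;
        for (Instance inst : instances) {
            if (!inst.healthy) continue;
            // a/wa < b/wb  <=>  a*wb < b*wa (avoids floating-point division)
            if (best == null
                    || (long) inst.activeConnections * best.weight
                       < (long) best.activeConnections * inst.weight) {
                best = inst;
            }
        }
        if (best == null) throw new IllegalStateException("no healthy instance");
        best.activeConnections++;
        return best;
    }

    // Called when a request finishes.
    synchronized void release(Instance inst) {
        inst.activeConnections--;
    }
}
```

With a "small" instance of weight 1 and a "big" one of weight 2, four picks go small, big, big, small; and once "big" is marked unhealthy, all traffic goes to "small".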

As you can see, with the help of a load balancing server, it is very simple to scale out an application instance. If you're interested in load balancing features, check out my other blog post, "Introduction to Enterprise-class load balancing."

Compared with scaling out, vertical scaling of services is often overlooked by software developers. Scaling out can indeed provide almost unlimited capacity, but if a single service instance performs poorly, scaling it out without limit is largely a waste of money:

As shown in the figure, an app can of course serve users by deploying 4 servers with identical functionality. In this case, building the service costs $50,000. However, because the application itself is poorly implemented, none of the four servers is well utilized. If a thoughtful developer carefully analyzes and fixes the bottlenecks in the service instance, the company might need to buy only one server, and that developer's skill, and likely salary, will be recognized, perhaps with an extra commendation. If the developer also gives the app good vertical scalability, it will run well on more powerful servers. In other words, vertical scalability of a single service instance not only makes full use of existing hardware, reducing the cost of building the service, but also lets the service benefit from more powerful servers. We can then improve the entire service simply by adjusting server configuration, such as adding memory or using a faster network.

Now let's look at how to improve the scalability of a single service instance. In an application, the service instance is usually at the core: it accepts user requests and reads data from the database while processing them. It then computes a result from that data and returns it as the response. Along the way, the service instance may also retrieve the results of earlier computations from a server-side cache:

In other words, a service instance usually obtains the data it needs by sending requests to other components. Because these requests are often blocking calls, the calling thread in the service instance is blocked, which reduces the efficiency of each thread in the service:

As the figure shows, with blocking calls the caller's thread is blocked while it calls another component for data, and the whole process takes 3 units of time. With non-blocking calls, the caller can perform other tasks while waiting for the other component to respond, so it handles two tasks in 4 units of time instead of one task in 3, a 50% increase in throughput.
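The difference between the two call styles can be sketched with Java's `CompletableFuture`. This is a simplified model: the hypothetical `remoteCall` simulates latency with `Thread.sleep` on a separate executor, so it shows the caller-side effect (the two waits overlap) rather than a true non-blocking IO stack, where no thread would sit waiting at all.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

class NonBlockingCalls {
    // Hypothetical remote call: sleeps to simulate network latency, then returns a value.
    static int remoteCall(int value, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return value;
    }

    // Blocking style: the calling thread waits for each call in turn.
    static int blockingSum() {
        return remoteCall(1, 100) + remoteCall(2, 100);
    }

    // Asynchronous style: both calls are issued at once and the results are
    // combined when both arrive. join() at the end exists only so this example
    // can return a plain int; a real handler would return the future instead.
    static int nonBlockingSum(ExecutorService io) {
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> remoteCall(1, 100), io);
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> remoteCall(2, 100), io);
        return a.thenCombine(b, Integer::sum).join();
    }
}
```

Called with a two-thread pool, for example `Executors.newFixedThreadPool(2)`, `nonBlockingSum` returns the same result as `blockingSum`, but the two 100 ms waits overlap, so it finishes in roughly 100 ms instead of roughly 200 ms.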

So when writing a high-throughput service, the first thing to consider is whether to use the non-blocking IO that Java provides. Typically, an individual request in a service built on non-blocking IO is slower than in one built on blocking IO, but under high load its throughput is much higher than that of a service built on blocking IO. Tomcat's support for non-blocking IO is a good illustration.

In earlier versions, Tomcat assigns a dedicated thread to each incoming request, and that thread handles the request to completion. If a blocking call occurs while the request is being processed, the thread waits until the call returns. When the request is finished, the thread goes back to the thread pool to wait for the next request. In this model, the number of requests Tomcat can handle in parallel is bounded by the number of threads in its pool. Yet if the thread count is set too high, the operating system becomes busy managing and switching threads, which lowers efficiency. Newer versions of Tomcat let users use non-blocking IO. In that case, Tomcat has a small set of threads that accept requests; when a request arrives, these threads receive it and hand it to a worker thread that actually processes it. As a result, the newer Tomcat runs only a few dozen threads yet can handle thousands of requests at the same time. Of course, because non-blocking IO is asynchronous, follow-up processing does not start immediately when a call returns, so each individual request takes longer than with blocking IO.

Therefore, when serving a small number of users, Tomcat with non-blocking IO often takes more than twice as long to respond to a single request as Tomcat with blocking IO, but its throughput stays very stable even with thousands of users:

So if you want to improve the performance of an individual service, you first need to make sure that your web container, such as Tomcat, is correctly configured to use non-blocking mode:

<!-- NIO connector; maxThreads="200" is Tomcat's default, tune it for your load -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000"
           maxThreads="200"
           redirectPort="8443"/>

Of course, using non-blocking IO takes more than configuring Tomcat. Imagine one sub-service calling another: if the caller blocks when it invokes the other sub-service, one of the caller's threads is stuck there and cannot handle other pending requests. So when your application contains long blocking calls, consider organizing the service in a non-blocking way.
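One way to call another sub-service without tying up the request thread is to have the client return a future and attach the follow-up work as a continuation. The sketch below assumes a hypothetical `InventoryClient` whose calls return `CompletableFuture` immediately; `OrderHandler` and the response strings are likewise made up for illustration.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical client for another sub-service: calls return immediately with
// a future instead of blocking the caller.
interface InventoryClient {
    CompletableFuture<Integer> stockFor(String sku);
}

class OrderHandler {
    private final InventoryClient inventory;

    OrderHandler(InventoryClient inventory) {
        this.inventory = inventory;
    }

    // Returns at once. The continuation runs when the sub-service answers, so
    // the request thread is free to serve other requests in the meantime.
    CompletableFuture<String> handle(String sku) {
        return inventory.stockFor(sku)
                .thenApply(stock -> stock > 0 ? "available" : "out of stock")
                .exceptionally(err -> "inventory unavailable"); // degrade instead of failing
    }
}
```

The key design choice is that `handle` never calls `get()` or `join()`; only the outermost layer (for example, an asynchronous servlet or a reactive framework) completes the response when the future finishes.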

Before organizing your services in a non-blocking way, it is best to study Enterprise Integration Patterns in detail. The Spring Integration project is an implementation of Enterprise Integration Patterns in the Spring ecosystem. Because this is a very large topic, I will introduce it separately in another blog post.

After increasing the number of concurrent connections with non-blocking mode, we need to check whether other hardware becomes the bottleneck of a single service instance. First, higher concurrency means higher memory consumption. So if your app is sensitive to memory size, the first step is to add memory to the system. Memory management also becomes something you must consider when implementing a memory-sensitive application. Languages such as Java use garbage collection to eliminate problems such as dangling pointers and many kinds of memory leaks, but while the garbage collector runs, your service may be paused temporarily. So when implementing the service, consider techniques that reduce garbage collection as much as possible.

Another hardware-related topic is the CPU. A server often has multiple CPUs, each with multiple cores, so it can often run more than 10 or even dozens of threads at once. When implementing services, however, we often ignore this, and as a result some services are executed in parallel by only a few threads. This is usually because the service contends heavily for the same resources, for example through excessive use of locks and synchronized blocks, or because its threads are waiting on an underperforming database.
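A small example of removing a contention point: a counter that every request updates. The sketch below, with hypothetical class names, contrasts a `synchronized` counter, where all threads queue on one monitor, with `java.util.concurrent.atomic.LongAdder`, which spreads updates over internal cells under contention. It makes no timing claim; the benefit depends on how contended the counter actually is.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

class HitCounters {
    // Every increment takes the same monitor, so threads on different cores
    // serialize behind each other.
    static final class SynchronizedCounter {
        private long count;
        synchronized void increment() { count++; }
        synchronized long get() { return count; }
    }

    // LongAdder keeps several cells and only sums them when read, so
    // concurrent increments rarely collide.
    static final class StripedCounter {
        private final LongAdder count = new LongAdder();
        void increment() { count.increment(); }
        long get() { return count.sum(); }
    }

    static long runSynchronized(int n) {
        SynchronizedCounter c = new SynchronizedCounter();
        IntStream.range(0, n).parallel().forEach(i -> c.increment());
        return c.get();
    }

    static long runStriped(int n) {
        StripedCounter c = new StripedCounter();
        IntStream.range(0, n).parallel().forEach(i -> c.increment());
        return c.get();
    }
}
```

Both counters are correct; the difference is only in how much the threads block each other while updating.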

Another consideration is separating services. If an application needs to serve many static resources, a general-purpose servlet container may not be the best choice. Lightweight web servers such as Nginx serve static resources significantly more efficiently than servers geared toward dynamic content, such as Apache.

Since this article is not about writing high-performance services, I won't go into the performance techniques above in any more depth.

Besides improving a service instance's vertical scalability from within the service itself, we have another important tool for increasing its productivity: server-side caching. A cache records the results of earlier computations so they need not be computed again. In this way, server-side caching can greatly reduce the pressure on the database:

What does this have to do with scalability? If server-side caching reduces the load on each service in the system, it effectively increases the productivity of each service instance, reduces the need to scale other components, and thereby indirectly improves the scalability of every related component.
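A minimal in-process version of the idea is a memoizing lookup in front of the database. The sketch below is a hypothetical `CachedLookup` in which the loader function stands in for a database query. It deliberately omits expiry and size limits, which any real cache needs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal in-process cache: results of an expensive lookup are stored and
// reused, so repeated requests for the same key skip the database.
class CachedLookup<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // e.g. a database query (hypothetical)

    CachedLookup(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent calls the loader only on a miss and stores the result.
        return cache.computeIfAbsent(key, loader);
    }

    void invalidate(K key) {
        cache.remove(key); // call this whenever the underlying data changes
    }
}
```

Three lookups for keys 1, 1, and 2 reach the loader only twice; after `invalidate(1)`, the next lookup for key 1 reaches it again.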

There are two main types of server-side cache today: caches that run inside the service instance's own process, and caches that run independently of the service instances. The latter is now the more popular choice:

As the figure shows, because an in-process cache is bound to a specific application instance, each application instance can access only its own cache. This binding means, on the one hand, that a single service instance has only a small amount of cache capacity available, and on the other, that the same data may be duplicated across cache instances, reducing the overall efficiency of the caching system. By contrast, because standalone cache instances run independently of the application servers, any application instance can access any cache instance. This solves both problems: too little usable cache capacity and redundant data.
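When all application instances share several standalone cache nodes, each instance needs the same rule for deciding which node holds a given key. Consistent hashing is one widely used rule, supported by many memcached clients. The sketch below is a hypothetical `CacheRing` built on CRC32 and virtual nodes; it is not the API of any particular client library.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Consistent-hash ring: each cache node owns many points on the ring
// ("virtual nodes"); a key belongs to the first point at or after its hash.
class CacheRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    CacheRing(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) addNode(node);
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no cache nodes");
        // First point clockwise from the key's hash, wrapping around to the start.
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

Because a key only moves when the ring point that owned it disappears, removing a node reassigns just that node's keys; entries cached on the remaining nodes stay where they were.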

If you want to learn more about building a server-side cache, see my other blog post, "Introduction to memcached."

Besides server-side caching, a CDN is another technology that helps prevent service overload, although its primary purpose is to speed up access for users who are far from the service. CDN nodes are usually placed in different geographic regions based on factors such as request distribution and actual load. The CDN obtains the service's static data from the server and caches it. When a user far from the service uses it, the static resources are loaded from a nearby CDN node, which speeds up loading. The servers then no longer have to handle requests for static resources from all over the world, which reduces their load.

Scaling the database

Scaling a database is a more complicated topic than scaling service instances. Different services often use data in very different ways. For example, services often have very different read/write ratios, and some place more emphasis on scalability than others. So there is no single way to scale a database; it depends on the application's data requirements. In this section, we will therefore explain database scaling bottom-up.

A top-down explanation of a topic usually builds a more coherent body of knowledge. With that approach, we first pose the problem, then, centered on it, explain the sub-problems that make it up, solve each of them, and relate and compare their solutions. Readers then clearly understand the strengths and weaknesses of each solution and can choose one that fits their situation. This approach suits simple, well-defined problems.

When a problem is more complex and involves many cases, we instead need to split it into sub-problems, understand each one clearly, and then analyze how their solutions combine to solve the whole problem.

So how do we divide database scalability into sub-problems? The CAP theorem is often used as a criterion when deciding which properties a database should have. It states that a distributed database cannot guarantee consistency, availability, and partition tolerance all at the same time:

Therefore, many databases choose two of these properties as the focus of their implementation. For example, common relational databases mainly guarantee consistency and availability and do not emphasize partition tolerance, which is very important for scalability. This is one reason why scaling databases out has become a hard problem for the industry.

Of course, if your application's requirements for consistency or availability are not so strict, you can choose a database that emphasizes partition tolerance. There are many such databases; for example, most of today's popular NoSQL databases make partition tolerance a key design goal.

So in this section, we'll focus on relational databases. Because scaling a relational database horizontally is usually harder than scaling it vertically, we will first explain how to scale a relational database vertically.

First, the most common and simplest way to scale up is to improve the performance of the service instance hosting the relational database. A database loads its data into memory at run time, and whether the most frequently accessed data is in memory is key to how well the database runs. If the service instance hosting the database provides enough memory, for its actual load, to hold all of the most frequently accessed data, the database can deliver its full performance. So the first step in vertical scaling is to check that the service instance hosting your database has sufficient resources.

Of course, hardware alone is not enough. As described earlier, vertical scaling involves two aspects: hardware enhancements and software optimization. For the database itself, the most important guarantee of performance is the index. In modern databases, indexes are mainly divided into clustered and non-clustered indexes. Both speed up lookups of data with specific characteristics:

Index tuning is therefore arguably the most important part of database optimization. If a lookup can be answered through an index instead of by examining the records stored in the table, the database only needs to visit the few index nodes on the path to the target, rather than traverse thousands of records. This greatly improves the performance of database operations.
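To make the difference concrete, the following is a minimal sketch that counts how many entries a lookup has to examine. It assumes a simplified model in which the index is a sorted list searched by binary search, standing in for the balanced tree a real database uses; the function names are hypothetical.

```python
def full_scan(rows, target):
    """Examine rows one by one, as a table scan would; return (found, rows_examined)."""
    examined = 0
    for key in rows:
        examined += 1
        if key == target:
            return True, examined
    return False, examined


def index_lookup(sorted_keys, target):
    """Binary search over sorted keys, standing in for a descent through an index tree."""
    lo, hi = 0, len(sorted_keys) - 1
    examined = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        examined += 1
        if sorted_keys[mid] == target:
            return True, examined
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, examined


if __name__ == "__main__":
    keys = list(range(1_000_000))
    print(full_scan(keys, 999_999))     # (True, 1000000)
    print(index_lookup(keys, 999_999))  # (True, n) with n at most 20
```

On one million keys the scan examines every row before reaching the last key, while the binary search needs at most 20 probes. A real B-tree node holds many keys per page, so the number of pages a database actually reads is typically smaller still.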

However, if the index is not in memory, the database must first read it from disk into memory before it can use it, which is very slow. For indexes to be effective, you must therefore first make sure that the instance running the database has enough memory.

Besides making sure there is enough memory, we also need to make sure that the indexes themselves do not waste it. One of the most common causes of wasted index memory is index fragmentation: after a series of inserts, updates, and deletes, the data becomes less orderly in its physical storage. Fragmentation comes in two kinds. Internal fragmentation means the pages contain a large amount of unused space; external fragmentation means the physical order of the pages no longer matches their logical order. Internal fragmentation increases the number of nodes in the index. The index then needs more space and occupies more memory, and lookups have to traverse more nodes, which lowers performance. External fragmentation means that reading logically sequential data forces the disk to seek back and forth, which can significantly reduce performance. A further consideration for external fragmentation is whether our service shares its disk with other services. If it does, the other services' disk activity means external fragmentation cannot be fundamentally eliminated, and disk seeks will occur frequently.
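The two kinds of fragmentation can be quantified roughly as follows. This is a simplified, hypothetical model, not any database's actual statistics: each page records its physical position on disk and how many bytes it uses, the page size is assumed to be 8 KB, and the pages are listed in logical (key) order. Average page fullness reflects internal fragmentation, and the share of logically adjacent pages that are not physically adjacent reflects external fragmentation.

```python
from dataclasses import dataclass

PAGE_SIZE = 8192  # assumed page size in bytes (8 KB)


@dataclass
class Page:
    physical_no: int  # hypothetical: position of the page on disk
    bytes_used: int   # bytes of the page that actually hold data


def avg_page_fullness(pages):
    """Internal fragmentation indicator: average percentage of each page in use."""
    return 100.0 * sum(p.bytes_used for p in pages) / (PAGE_SIZE * len(pages))


def out_of_order_pct(pages):
    """External fragmentation indicator: percentage of logically adjacent page
    pairs whose second page is not the next physical page on disk."""
    if len(pages) < 2:
        return 0.0
    jumps = sum(
        1 for cur, nxt in zip(pages, pages[1:]) if nxt.physical_no != cur.physical_no + 1
    )
    return 100.0 * jumps / (len(pages) - 1)


if __name__ == "__main__":
    # Pages in logical (key) order: two are half empty, and the 2->7 and 7->3
    # steps force the disk to jump to a different location.
    pages = [Page(1, 8192), Page(2, 4096), Page(7, 4096), Page(3, 8192)]
    print(avg_page_fullness(pages))  # 75.0
    print(out_of_order_pct(pages))   # 66.66666666666667
```

Real databases expose comparable figures (SQL Server's sys.dm_db_index_physical_stats, for example, reports avg_page_space_used_in_percent and avg_fragmentation_in_percent), but their exact definitions differ from this sketch.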

Another common index optimization is to add specific columns to a non-clustered index with an INCLUDE clause, which speeds up certain queries. The main difference between clustered and non-clustered indexes is whether the index stores the data itself. When a lookup goes through a clustered index, the data we need is already available once we reach the matching node. When a lookup goes through a non-clustered index, we only get the location of the target data, and a further lookup is needed to fetch the actual row. If the INCLUDE clause already contains the columns the query needs, this extra lookup is avoided and the query runs faster.
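The effect of carrying extra columns in a non-clustered index can be observed with Python's built-in sqlite3 module. SQLite has no INCLUDE clause, so this sketch uses a composite index on (email, name) as a stand-in; in SQL Server the corresponding index would be declared as CREATE NONCLUSTERED INDEX idx_email_name ON users (email) INCLUDE (name). The table, column, and index names are hypothetical. The query plan shows whether SQLite must go back to the table for the name column or can answer from the index alone.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN output as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def covering_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
    query = "SELECT name FROM users WHERE email = ?"

    # Index on email only: the index locates the row, then SQLite must go
    # back to the table to fetch `name` (the extra lookup step).
    conn.execute("CREATE INDEX idx_email ON users(email)")
    without_name = plan(conn, query, ("a@example.com",))

    # Replace it with an index that also carries `name`; the query can now be
    # answered from the index alone (a covering index).
    conn.execute("DROP INDEX idx_email")
    conn.execute("CREATE INDEX idx_email_name ON users(email, name)")
    with_name = plan(conn, query, ("a@example.com",))

    conn.close()
    return without_name, with_name


if __name__ == "__main__":
    before, after = covering_demo()
    print(before)  # e.g. SEARCH users USING INDEX idx_email (email=?)
    print(after)   # e.g. SEARCH users USING COVERING INDEX idx_email_name (email=?)
```

A composite key and an INCLUDE column are not identical (an included column is stored only at the leaf level and is not part of the sort order), but both let the index cover the query.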

Keep in mind, however, that an index is an additional data structure the database maintains alongside its own data, so it also occupies memory. On every insert and delete, the database must also update its indexes to keep them consistent with the actual data, which slows down insert and delete operations.

Another consideration is to avoid page splits as much as possible by setting the fill factor correctly. In common databases, data is stored in pages of a fixed size. When we insert a row, the target page may not have enough free space for it. The database then allocates a new page and moves part of the rows (typically about half) from the full page into it, so that the data is spread across two pages. In this process the database must not only add and modify the data pages themselves but also update the IAM (Index Allocation Map) and other pages, which makes a page split a fairly expensive operation. FILLFACTOR controls what percentage of each leaf-level page is filled with data when an index is created or rebuilt; it can be set as a server-wide default and overridden per index. In addition, the PAD_INDEX option applies the same fill factor to the non-leaf (intermediate) pages of the index. A higher fill factor packs the data more densely and therefore gives better read performance, while a lower fill factor is friendlier to writes because the free space it leaves reduces page splits.
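The trade-off between a high and a low fill factor can be illustrated with a small simulation. It is a hypothetical model, not how any particular database allocates pages: page capacity is counted in rows rather than bytes, only leaf pages are modeled, and a split simply moves the upper half of a full page into a new page. The sketch bulk-loads 10,000 keys at a given fill factor, then inserts 2,000 random keys and counts the splits.

```python
import bisect
import random


def build_pages(keys, capacity, fill_factor):
    """Bulk-load sorted keys into leaf pages, filling each to fill_factor percent."""
    per_page = max(1, capacity * fill_factor // 100)
    keys = sorted(keys)
    return [keys[i:i + per_page] for i in range(0, len(keys), per_page)]


def insert_all(pages, new_keys, capacity):
    """Insert keys into the page covering them; return the number of page splits.

    Assumes capacity >= 2, so a split never leaves an empty page behind."""
    splits = 0
    for key in new_keys:
        firsts = [p[0] for p in pages]
        # Target the last page whose first key is <= key (page 0 for the smallest keys).
        idx = max(0, bisect.bisect_right(firsts, key) - 1)
        page = pages[idx]
        if len(page) >= capacity:
            # Page split: allocate a new page and move the upper half of the rows into it.
            mid = len(page) // 2
            pages.insert(idx + 1, page[mid:])
            del page[mid:]
            splits += 1
            if key >= pages[idx + 1][0]:
                page = pages[idx + 1]
        bisect.insort(page, key)
    return splits


def simulate(fill_factor, capacity=100, n=10_000, inserts=2_000, seed=42):
    """Return (page_splits, pages_after_bulk_load) for one fill factor."""
    pages = build_pages(range(0, n * 10, 10), capacity, fill_factor)
    initial_pages = len(pages)
    rng = random.Random(seed)
    new_keys = [rng.randrange(n * 10) for _ in range(inserts)]
    return insert_all(pages, new_keys, capacity), initial_pages


if __name__ == "__main__":
    for ff in (100, 70):
        splits, initial = simulate(ff)
        print(f"FILLFACTOR={ff}: {initial} pages after load, {splits} page splits")
```

With FILLFACTOR=100 the load produces 100 completely full pages, and nearly every page splits on its first insert. With FILLFACTOR=70 the same data needs 143 pages (more pages to read on a scan), but the free space absorbs the inserts, so splits become rare.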

Beyond the methods above, a range of other database features can help improve performance. The most important of these is the execution plan that every database provides. By examining a query's execution plan, you can see how the database actually executes the statement.
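Most databases expose the plan through an EXPLAIN-style statement. As a small, self-contained illustration, this sketch uses SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module to compare the plan for the same query before and after an index is added; other databases use different syntax and output. The table and index names are hypothetical, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3


def explain(conn, sql):
    """Return the human-readable steps of SQLite's EXPLAIN QUERY PLAN for `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def compare_plans():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    query = "SELECT total FROM orders WHERE customer_id = 42"

    before = explain(conn, query)  # no usable index yet: a full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
    after = explain(conn, query)   # the planner can now search the index instead
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print(before)  # e.g. ['SCAN orders'] (older versions print 'SCAN TABLE orders')
    print(after)   # e.g. ['SEARCH orders USING INDEX idx_orders_customer (customer_id=?)']
```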

Because improving the performance of a single database is a huge topic, and this article focuses on scalability, we will not go into database tuning any further here.

On the other hand, because the capacity of a single server is limited, we cannot scale a relational database up indefinitely. When necessary, we therefore need to consider scaling it out. When the AKF model is applied to a relational database, each of its axes takes on a specific meaning.

In the AKF model, the x-axis represents addressing scalability by deploying more service instances. Because a relational database manages reads and writes and must keep the data consistent, scaling along the x-axis cannot be done simply by deploying additional DB instances. Instead, the DB instances take on different responsibilities and form a specific topology. This is database replication.

The y-axis and z-axis of the AKF model are easier to understand for a database than the x-axis. The y-axis represents splitting the work by type of data or business function, while the z-axis represents splitting user requests by certain attributes of the user. Both kinds of split divide the database's data across multiple DB instances, so both correspond to database partitioning.

Let's start with database replication. Simply put, replication stores the same data on multiple DB instances. Read requests can be served by any instance, and once data is updated on one instance, the update is automatically replicated to the others. In replication, the source of the data is called the master and the target instance is called the slave. The two roles are not mutually exclusive: in more complex topologies, a DB instance may be both a master and a slave.

The most common replication topology for relational databases is the simple master-slave model. In this model, data can be read from any instance, but updates are written only to one designated instance, the master. Changes then flow in one direction only, from the master to the slaves.

In this model, reads are handled by both the master and the slave, so with one master and one slave the read load on each instance is roughly half of what it was. Writes, however, must be performed on both the master and the slave, so the write load on each DB instance is not reduced. If the read load keeps growing, we can add more slaves to share it.
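The routing rule described above can be sketched as a small helper, assuming the application itself decides which instance to use: writes always go to the master, and reads rotate round-robin over the master and all slaves. The instance names are hypothetical strings standing in for real connections, and the sketch ignores replication lag, so a read from a slave may not yet see a write just made on the master.

```python
import itertools
from collections import Counter


class ReplicatedRouter:
    """Hypothetical application-side router for a master-slave topology."""

    def __init__(self, master, slaves):
        self.master = master
        # Reads rotate over every instance, master included.
        self._readers = itertools.cycle([master, *slaves])

    def for_write(self):
        # Updates are only ever written to the master; replication copies them on.
        return self.master

    def for_read(self):
        return next(self._readers)


def read_share(router, num_reads):
    """Count how many of `num_reads` reads each instance would receive."""
    return Counter(router.for_read() for _ in range(num_reads))


if __name__ == "__main__":
    router = ReplicatedRouter("db-master", ["db-slave-1"])
    print(router.for_write())              # db-master
    print(dict(read_share(router, 1000)))  # {'db-master': 500, 'db-slave-1': 500}
```

With one master and one slave, 1,000 reads split 500/500, which is the halving of read load described above; adding slaves to the list spreads reads further, while every write still lands on the master.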

By now it should be clear that scaling out a relational database mainly means adding database instances to share the read load. Note, however, that replication from a master to each slave is handled by a dedicated thread on the master and on the slave. In other words, a master must maintain one thread for each of its slaves to keep them updated. Because a large application may have hundreds of slave instances, attaching all of them to the same master would severely degrade the master's performance.

One solution is to turn some of these slaves into masters for other slaves, organizing the instances into a tree-like structure.
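The effect of the tree layout on the master's workload can be sketched with a simple numbering scheme. This is a hypothetical model: node 0 is the root master, slaves are numbered breadth-first, and node i replicates from node (i - 1) // fan_out, so each instance feeds at most fan_out slaves and therefore runs at most that many replication threads.

```python
from collections import Counter


def build_replication_tree(num_slaves, fan_out):
    """Return {slave: parent}; node 0 is the root master.

    Nodes are numbered breadth-first, so node i replicates from node
    (i - 1) // fan_out and no instance feeds more than `fan_out` slaves."""
    return {node: (node - 1) // fan_out for node in range(1, num_slaves + 1)}


def max_fan_out(parents):
    """Largest number of slaves attached to any single instance."""
    return max(Counter(parents.values()).values())


def depth(parents, node):
    """Number of replication hops between the root master and `node`."""
    hops = 0
    while node != 0:
        node = parents[node]
        hops += 1
    return hops


if __name__ == "__main__":
    flat = build_replication_tree(300, fan_out=300)  # every slave on the root master
    tree = build_replication_tree(300, fan_out=10)
    print(max_fan_out(flat), max(depth(flat, n) for n in flat))  # 300 1
    print(max_fan_out(tree), max(depth(tree, n) for n in tree))  # 10 3
```

Attaching 300 slaves directly to one master gives it 300 replication streams to maintain; with a fan-out of 10, no instance feeds more than 10, at the cost of three hops from the root to the deepest slaves. Each extra hop adds delay before an update reaches the leaves.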

The master-slave model does have one drawback: the master is a single point of failure. If the master instance fails, the entire database system, or at least the subtree rooted at that master, can no longer function normally.

One way to solve this problem is the multi-master replication model. In this model, each master replicates its data not only to its own slaves but also to the other masters.

This avoids the single point of failure. However, if two DB instances update the same data, a data conflict results. We can prevent such conflicts by dividing the data into independent subsets and making each master responsible for updating only its own subset.
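Dividing the data into independent subsets can be sketched as a routing function that assigns every key to exactly one owning master. The master names are hypothetical, and hashing is only one way to define the subsets (key ranges or business attributes work too). zlib.crc32 is used instead of Python's built-in hash(), because string hashes are randomized per process and different application servers would otherwise disagree about the owner.

```python
import zlib

# Hypothetical names for two masters that replicate to each other.
MASTERS = ["master-a", "master-b"]


def owning_master(key: str) -> str:
    """Route every write for `key` to the one master that owns it."""
    return MASTERS[zlib.crc32(key.encode("utf-8")) % len(MASTERS)]


def split_writes(keys):
    """Group keys by owning master; the groups never overlap, so the two
    masters never update the same key and no write conflicts arise."""
    groups = {m: set() for m in MASTERS}
    for key in keys:
        groups[owning_master(key)].add(key)
    return groups


if __name__ == "__main__":
    groups = split_writes(f"user:{i}" for i in range(1000))
    print({m: len(keys) for m, keys in groups.items()})  # keys owned by each master
```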

In this setup, users' writes are routed to different DB instances according to specific criteria and then synchronized to the other instances to keep the data consistent. But if we can already split the data into independent subsets, why not go one step further and partition the database?

Simply put, database partitioning divides the data stored in a database into a series of subsets, each stored on a different DB instance. Both the read load and the write load are then split according to the instance on which the data resides. This is how a database scales horizontally along the y-axis and z-axis of the AKF model.

When a database is partitioned this way, its original data is split across different DB instances. Each instance holds only a few of the original database's tables, so access to the database as a whole is divided among the instances.
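Splitting by table can be sketched as a static mapping from table name to DB instance. The table and instance names here are hypothetical. The helper single_instance_for also shows the constraint discussed at the end of this section: a statement that touches several tables can run on a single instance only if all of those tables were placed together.

```python
# Hypothetical y-axis (functional) split: each table lives on exactly one instance,
# and tables that are often queried together are kept on the same instance.
TABLE_TO_INSTANCE = {
    "users": "db-accounts",
    "user_profiles": "db-accounts",
    "orders": "db-orders",
    "order_items": "db-orders",
    "products": "db-catalog",
}


def instance_for_table(table):
    """Return the DB instance that stores `table`."""
    try:
        return TABLE_TO_INSTANCE[table]
    except KeyError:
        raise ValueError(f"no instance configured for table {table!r}") from None


def single_instance_for(tables):
    """Return the one instance holding all `tables`, or None if they are spread
    across instances (in which case a single SQL statement cannot join them)."""
    instances = {instance_for_table(t) for t in tables}
    return instances.pop() if len(instances) == 1 else None


if __name__ == "__main__":
    print(single_instance_for(["orders", "order_items"]))  # db-orders
    print(single_instance_for(["users", "orders"]))        # None
```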

In some cases, however, splitting the tables across instances is not enough: one of the resulting instances may still carry too much load. We then need to split the database again, this time dividing the rows of a table across instances.

In this case, before operating on any data, we first need to compute which database instance holds it.
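The calculation can be as simple as taking the user ID modulo the number of instances. This is a minimal sketch, assuming integer user IDs and a hypothetical, fixed list of four instances; group_by_shard shows how a batch of IDs is split so that each instance is only queried for its own rows.

```python
# Hypothetical shard list; each instance holds the rows of a subset of users.
USER_SHARDS = ["db-users-0", "db-users-1", "db-users-2", "db-users-3"]


def shard_for(user_id: int) -> str:
    """Compute which instance holds this user's rows before reading or writing them."""
    return USER_SHARDS[user_id % len(USER_SHARDS)]


def group_by_shard(user_ids):
    """Split a batch of user IDs so each instance is queried only for its own rows."""
    batches = {}
    for uid in user_ids:
        batches.setdefault(shard_for(uid), []).append(uid)
    return batches


if __name__ == "__main__":
    print(shard_for(1234))               # db-users-2
    print(group_by_shard([1, 2, 5, 8]))  # {'db-users-1': [1, 5], 'db-users-2': [2], 'db-users-0': [8]}
```

Plain modulo is easy to compute but hard to change: going from four instances to five would send about 80% of user IDs to a different instance, so their rows would have to move. Schemes such as consistent hashing or a lookup directory are commonly used to reduce that cost.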

Partitioning is not without drawbacks, however. The most common problem is that a single SQL statement cannot operate on data stored in different DB instances; an ordinary JOIN, for example, cannot span two instances. Before you decide to split a database, carefully examine the relationships between its tables and make sure that tables placed in different databases are not frequently used together in the same operations.

That's it. So far we have explained how to build scalable service instances, caches, and databases, and I hope you now have a clearer understanding of how to create a highly scalable application. While writing this article, I also came across a number of topics worth covering further, such as Spring Integration and database replication and partitioning (sharding). In some areas, such as databases, I am not an expert, but I have done my best to explain each point in this article clearly.

If you repost this article, please credit the original address and mark it as a repost.

For commercial reposting, please contact me in advance: [email protected]
