Oracle RACVs SQL Server article 6: data dependent routing (also known as"Data splittingSolution ")
In the previousArticleWe have already talked a lot about SQL Server horizontal scaling. Today we will look at the last solution. In fact, there are many SQL Server scaling solutions, this series of articles only introduces several of them. In fact, many times I prefer to call these solutions"DatabaseThe horizontal scaling mode is similar to many concepts in our programming world.
If the data dependent routing (hereinafter referred to as DDR) is used, the data is split, put in different databases, and then appliedProgramRoute data operations to the correct database through related logic or middleware. After using DDR, its transparency to the application is obviously diminished because the application needs to know where the data it operates.
DDR is not directly supported by SQL Server, but can be implemented through some design. In addition, this method has been around for a long time, and is widely used in many large websites and applications. It is more common. This method is often related to what we call "split by business.
Next, let's take a simple example,
20120917102651. PNG (21.83 K) 10:34:14 AM
As you can see in this figure, you are basically aware of it. This split uses a lot of resources. After splitting, the query is distributed across different database servers. Data Update operations are also scattered. In this way, the performance can be improved to a great extent, but the complexity of data management is brought about: When querying, you need to know which database or databases to query the corresponding data; when performing updates and other operations, you also need to operate which database or which database.
In the preceding example, the customer ID is used as the separator key, andAlgorithmDistribute data of different IDs in different databases. Considering the use of the customer table in the application and related services, other information related to the customer, such as orders, is also distributed in the same database, so that cross-database is avoided, cross-network query.
We call the information related to the customer as the object set information of the customer. Therefore, when splitting data, you must note that the entity set information of the customer and the customer is distributed together, in this way, each data segment is a small complete body independently contained. As the number of data increases, the different data segments are separated by small and complete bodies.
Through the above method, we have split the data according to the customer ID. The next step is the key to the problem:How does an application know which one?IDCustomerWhere is the related information??
The above question is more common: for a customer, how do we know the server on which the data is stored.
One of the easiest and most commonly used methods is:Create a ing table. This is required.
There are many ways to create a ing table. Here I just mention a few:
1. if the ing table is placed in a shared memory, for example, if the distributed cache is used, the data access layer of the application will go to the shared memory to find the ing table, locate the corresponding database server and perform data operations.
2. Place the ing table in a database. The database should be separate. Of course, you can also split one of the databases before. There is no rigid rule.
Of course, the above methods are not "you don't have me" and can be used together. The basic structure of the ing table is as follows:
20120917102743. PNG (20.82 K) 10:34:14 AM
There are many ways to design a ing table, as long as the ing effect is achieved, generally in the form of a dictionary. If you use the customer ing table above, you can know the data of the customer ID <10000. You need to query database server a, and so on.
DDR is mainly used in applications with many write operations. It is very effective for highly concurrent update applications. DDR requires the application's data access layer to know how to send requests to the correct database. In addition, for DDR, it is best to adopt it at the beginning of application design. If it is used in existing applications, the application will undergo great changes, because even the logic at the data access layer must be rewritten. Of course, how to have a suitable middleware can reproduce the use of DDR in some programs.
In addition, let's look at a case. This case is used in MSN.
20120917103155. PNG (25.47 K) 10:41:19 AM
In the figure:
1. Web servers runs on IIS to process requests sent by client users
2. The data split ing table is located on the LPS server and saved in SQL Server 2000.
3. The small and complete data split bodies are stored on many different database servers.
4. The MSN extension management layer is used to determine the correct database for each data operation
In the MSN case, data is split by the ID of each user. This principle is very similar to the customer mentioned earlier. I will not talk about it here. You may have a lot of experience.
More:
|
SQL Server horizontal scaling: design, implementation, and maintenance (1)-issues that must be known |
|
SQL Server horizontal scaling: design, implementation and maintenance (2)-distributed partition View |
SQL Server horizontal scaling: design, implementation and maintenance (3)-distributed partition view implementation