I was responsible for a project (the China Telecom BDC project) whose database ran on a purchased P590 minicomputer. After stress testing, the system went live and the business grew rapidly. Memory and CPU utilization on the minicomputer hovered around 98% for long periods. Good as the hardware was, it could not keep up with the surge in business. Scaling out the application servers is relatively easy, but upgrading the database is quite expensive.
What to do? The first step, of course, was a round of parameter tuning and system tuning, but the metrics did not drop as much as we had hoped.
What next? Split the system sensibly.
Database splitting comes in two forms, vertical and horizontal, and vertical splitting is the smaller undertaking. Just as education should start in early childhood, optimization should start with the simple things.
Vertical segmentation can also be called vertical slicing. As I understand it, there are two kinds: one moves the tables required by different modules into different databases; the other takes a very large table and separates its frequently used fields from its infrequently used fields into two or more tables.
A database can also be thought of as a large collection of "data blocks" (tables). We cut vertically between these data blocks and spread them across multiple database hosts. This way of splitting is vertical data segmentation.
Splitting the database by module
Our application system consists of a number of functional modules, and the data each module needs corresponds to one or more tables in the database. The fewer and more uniform the interaction points between modules, the lower the coupling of the system and the better the maintainability and extensibility of each module. Such a system lends itself to vertical data segmentation.
Before slicing vertically, we have to sort out the functional modules clearly: the lower the coupling, the easier it is to define the rules for vertical segmentation. If the data can be divided entirely by functional module, with each module's data stored on a different database host, cross-database operations are easily avoided and the system architecture stays very clear.
In practice, however, a system that was designed as a single system will inevitably have relationships among its data; otherwise it would not be a system. In telecommunications, though, DBLinks between databases are not allowed, so in some cases using dedicated interfaces, or having the application read from multiple databases and then combine the data itself, is unavoidable.
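Having the application read from several databases and combine the results can be sketched as follows. This is a minimal illustration, not the project's actual code: two in-memory SQLite connections stand in for two separate database hosts, and the column names (`user_id`, `name`, `order_id`, `item`) and sample rows are assumptions made up for the example.

```python
import sqlite3

# Two independent connections stand in for two database hosts (assumption).
user_db = sqlite3.connect(":memory:")
order_db = sqlite3.connect(":memory:")

# Hypothetical schemas and sample rows, for illustration only.
user_db.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT)")
user_db.executemany("INSERT INTO user VALUES (?, ?)", [(1, "Zhang"), (2, "Li")])

order_db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)"
)
order_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "broadband"), (11, 2, "landline"), (12, 1, "mobile")],
)


def orders_with_user_names(order_db, user_db):
    """Join orders to users in the application, with no cross-database link."""
    orders = order_db.execute(
        "SELECT order_id, user_id, item FROM orders ORDER BY order_id"
    ).fetchall()
    user_ids = sorted({user_id for _, user_id, _ in orders})
    if not user_ids:
        return []
    # Fetch only the users that are actually referenced, in one query.
    placeholders = ",".join("?" for _ in user_ids)
    names = dict(
        user_db.execute(
            f"SELECT user_id, name FROM user WHERE user_id IN ({placeholders})",
            user_ids,
        ).fetchall()
    )
    # The "join" happens here, in program memory.
    return [(order_id, names.get(user_id), item) for order_id, user_id, item in orders]


result = orders_with_user_names(order_db, user_db)
```

The second query is batched with `IN (...)` so that the application makes one round trip per database rather than one per row.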
Moreover, the data itself is large: the number of main telephone users, for instance, has reached 120 million, and a single-table query consumes a great deal of resources. Vertical segmentation does not help much with that. But a meal is eaten one mouthful at a time, so let us solve the vertical segmentation problem first.
Let's analyze the system first and then design a segmentation rule for the vertical split.
At first glance, no module can be separated from the others, since every module is related to some other module. Does that mean the system cannot be split?
Let us first classify the main functional modules of the system, so that the problem becomes much clearer.
The system can be divided into 4 functional modules: the telephone user table, the address library, the external service library, and the data acquisition and processing library (editing dimension). They correspond to the following tables:
1. Telephone user table: Area, user, tel, ...
2. Address library: Adress, ...
3. External service library: Out_search, out_user, out_tel, ...
4. Data acquisition and processing library (editing dimension): Orders, ...
With a little deeper analysis, you can see that although the tables used by the modules are related, the relationships are clear.
Among them, the data acquisition and processing library (editing dimension) has no direct relationship with the other modules, so we can consider moving that block of services straight to a new database. This module also has a large number of users, while the other three modules are closely tied to one another. With the target chosen, split it out.
One characteristic of our business is that it is closely tied to regions: every module needs area information.
There are several ways to share area information:
DBLink is the simplest and lets every database share a single copy of the data, but it does not meet the company's management requirements;
A unified interface is easy to develop and not hard to call, but it requires changes to the existing system and must meet demanding performance requirements, so it is not the best choice;
The final choice was to place a copy of the area table in each of the split databases, since administrative divisions in China do not change very often. Interfaces are built between these area tables so that when the data changes, the copies are updated synchronously.
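Keeping a copy of the area table in each database and refreshing the copies on change might look like the sketch below. It is only one possible approach under stated assumptions: in-memory SQLite connections stand in for the separate databases, the columns `area_code` and `name` and the sample rows are hypothetical, and a full refresh is used because the table is small and rarely changes.

```python
import sqlite3


def make_db():
    """Create a stand-in database that holds its own copy of the area table."""
    db = sqlite3.connect(":memory:")
    # Hypothetical columns; the real area table is not described in the text.
    db.execute("CREATE TABLE area (area_code TEXT PRIMARY KEY, name TEXT NOT NULL)")
    return db


def sync_area(source, targets):
    """Copy the full area table from the source database to every target."""
    rows = source.execute("SELECT area_code, name FROM area").fetchall()
    for target in targets:
        # 'with' commits on success and rolls back if the refresh fails,
        # so a target never ends up with a half-replaced table.
        with target:
            target.execute("DELETE FROM area")
            target.executemany(
                "INSERT INTO area (area_code, name) VALUES (?, ?)", rows
            )


main_db = make_db()   # holds the authoritative copy (assumption)
edit_db = make_db()   # the database that was split out

with main_db:
    main_db.executemany(
        "INSERT INTO area VALUES (?, ?)", [("010", "Beijing"), ("021", "Shanghai")]
    )
sync_area(main_db, [edit_db])

# A later change to the authoritative copy is pushed out the same way.
with main_db:
    main_db.execute("UPDATE area SET name = ? WHERE area_code = ?", ("Beijing City", "010"))
sync_area(main_db, [edit_db])
```

A full refresh trades a little extra work per sync for simplicity; an incremental scheme would only be worth it if the table were large or changed often.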
After the split, the P590 minicomputer hosts three groups: the telephone user table, the address library and the external service library;
A newly purchased P570 minicomputer hosts the data acquisition and processing library (editing dimension).
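The layout after the split can be written down as a routing table that an application consults to find the database holding a given table. The sketch below is illustrative only: the module keys and the host labels `p590` and `p570` are names chosen for the example, and only the tables named in the module list are included (the elided ones are left out).

```python
# Tables per functional module, as named in the module list (elided tables omitted).
MODULE_TABLES = {
    "telephone_user": ["Area", "user", "tel"],
    "address": ["Adress"],
    "external_service": ["Out_search", "out_user", "out_tel"],
    "data_acquisition": ["Orders"],
}

# Which machine hosts each module after the split (labels are illustrative).
MODULE_HOST = {
    "telephone_user": "p590",
    "address": "p590",
    "external_service": "p590",
    "data_acquisition": "p570",
}


def build_table_routes(module_tables, module_host):
    """Flatten module -> tables and module -> host into table -> host."""
    routes = {}
    for module, tables in module_tables.items():
        for table in tables:
            key = table.lower()
            if key in routes:
                # A table owned by two modules would make the split ambiguous.
                raise ValueError(f"table {table!r} is assigned to more than one module")
            routes[key] = module_host[module]
    return routes


TABLE_ROUTES = build_table_routes(MODULE_TABLES, MODULE_HOST)


def host_for(table):
    """Return the host label for a table name (case-insensitive)."""
    return TABLE_ROUTES[table.lower()]
```

Keeping the rule in one explicit mapping reflects the point that a vertical split is easiest when each table belongs to exactly one module.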
After being split out, the data acquisition and processing library (editing dimension) became a separate project. Over time it grew into a large platform with a grand name: Unified Editing.
And we had taken our first step down the road of vertical segmentation.
Advantages of vertical slicing:
The splitting of the database is simple and clear, and the splitting rule is explicit;
The application module is clear and easy to integrate.
Disadvantages of vertical slicing:
The actual investment required (hardware and software) increases;
Some table joins can no longer be done at the database level and have to be done in the program;
It cannot resolve the performance bottleneck of tables with heavy access and large data volumes;
Transaction processing is relatively complex;
Once the segmentation reaches a certain point, scalability becomes limited;
Over-slicing can lead to systems that are too complex and difficult to maintain.
Splitting very large database tables
In early database designs, for various reasons, some very large tables ended up in the system. These tables not only have a great many rows but also a great many fields.
Take our old GPS data processing table: it has more than 220 fields and more than 60 million rows. With a table like this, if someone writes SQL that does a full table scan, the whole database instance is dragged down.
Even for normal data writes, maintaining the indexes is a significant overhead.
What to do? Vertical segmentation is in fact not the fundamental solution, but it is relatively simple. Before making a major decision, always start with the simple part, just as you warm up before a run so that you can run better and farther.
This table is very old and we had forgotten it existed, until one day a colleague ran a small operation against it and the minicomputer nearly went down. The problem had to be solved.
Step by step: first analyze the table's 220 fields. By attribute, they are mainly user information, address information, GPS information, auxiliary extension information (QQ number, MSN number, etc.), operation logs and so on.
The split is simple: user information goes to the user table, linked by user ID; address information and GPS information stay in the main table; the auxiliary extension attributes are mostly empty and go into a one-to-one extension table; operation logs go into a log table.
After that, the main GPS data processing table has only 30 fields. That is still a bit many, but much better than before. The remaining problem is the 60 million-plus rows, which requires other treatment.
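The field-level split described above can be sketched on a much smaller table. Everything here is an assumption for illustration: SQLite stands in for the real database, a handful of columns stand in for the 220 fields, and the table names (`gps_wide`, `gps_main`, `gps_ext`, `gps_log`, `user_info`) and sample rows are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- A reduced stand-in for the old wide table (hypothetical columns).
CREATE TABLE gps_wide (
    id INTEGER PRIMARY KEY,
    user_id INTEGER, user_name TEXT,          -- user information
    address TEXT, longitude REAL, latitude REAL,  -- address and GPS information
    qq TEXT, msn TEXT,                        -- mostly empty extension attributes
    op_log TEXT                               -- operation log
);
INSERT INTO gps_wide VALUES
    (1, 100, 'Zhang', 'Road A', 116.40, 39.90, NULL, NULL, 'created'),
    (2, 101, 'Li', 'Road B', 121.47, 31.23, '12345', NULL, 'created');

-- Target tables after the vertical split.
CREATE TABLE user_info (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE gps_main (
    id INTEGER PRIMARY KEY, user_id INTEGER,
    address TEXT, longitude REAL, latitude REAL
);
CREATE TABLE gps_ext (id INTEGER PRIMARY KEY REFERENCES gps_main(id), qq TEXT, msn TEXT);
CREATE TABLE gps_log (id INTEGER REFERENCES gps_main(id), op_log TEXT);

-- Move each group of fields to its own table.
-- Assumes each user_id maps to a single user_name in the wide table.
INSERT INTO user_info SELECT DISTINCT user_id, user_name FROM gps_wide;
INSERT INTO gps_main SELECT id, user_id, address, longitude, latitude FROM gps_wide;
-- Only rows that actually carry extension data get an extension row.
INSERT INTO gps_ext SELECT id, qq, msn FROM gps_wide
    WHERE qq IS NOT NULL OR msn IS NOT NULL;
INSERT INTO gps_log SELECT id, op_log FROM gps_wide WHERE op_log IS NOT NULL;
""")

# Everyday queries touch only the narrow main table; the rarely used fields
# are joined in only when they are needed.
row = db.execute(
    """
    SELECT m.id, u.user_name, e.qq
    FROM gps_main AS m
    JOIN user_info AS u ON u.user_id = m.user_id
    LEFT JOIN gps_ext AS e ON e.id = m.id
    WHERE m.id = ?
    """,
    (2,),
).fetchone()
ext_rows = db.execute("SELECT COUNT(*) FROM gps_ext").fetchone()[0]
```

Because the extension attributes are mostly empty, storing them only for rows that have values keeps the one-to-one extension table far smaller than the main table.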
Database optimization-Vertical segmentation and application in real projects