Database Optimization: Vertical Splitting and Its Application in Real Projects

Source: Internet
Author: User
Tags database sharding

I was in charge of a project (the China Telecom BDC project) whose database hardware was a purchased P590 minicomputer. The system went online after stress testing, and the business then grew rapidly. Memory and CPU usage on the minicomputer hovered around 98% for a long time. Good as the hardware was, it could not keep up with the business. Scaling the application servers horizontally is relatively easy, but upgrading the database is quite expensive.

What should I do? The first step, of course, was a round of parameter tuning and system optimization. However, the load metrics did not come down as much as we needed;

What next? Split the system sensibly.

Database splitting comes in two forms: vertical and horizontal. Vertical splitting is the less invasive of the two. Just as education should start with children, optimization should start with the simple things.

Vertical splitting is also called vertical partitioning. As I understand it, there are two kinds. The first separates the tables needed by different modules and puts each module's tables into a different database. The second splits a very large table into two or more tables, separating the commonly used fields from the rarely used ones.

 

You can also think of a database as made up of many large "data blocks" (tables). Cutting these blocks vertically and distributing them across multiple database hosts is what vertical data splitting means.

 

Splitting the database by module

Our application system consists of many functional modules, and the data each module needs corresponds to one or more tables in the database. The more self-contained each module is and the less the modules interact, the lower the coupling of the system, and the better the maintainability and scalability of each module. Such a system is easier to split vertically.

 

Before splitting vertically, we need to identify the functional modules clearly. The lower the coupling, the easier it is to define the rules for the split. Data can then be split by functional module, with each module's data stored on a different database host. This makes it easy to avoid cross-database operations and keeps the system architecture clear.

 

In reality, however, a system that was designed as a single system is bound to have associations between its data; otherwise it would not be a system. At China Telecom, moreover, DBLinks between databases are not allowed. In such cases, using dedicated interfaces, or letting the application read from multiple databases and then combine the data itself, becomes the unavoidable choice.
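As a rough illustration of letting the application read multiple databases and then process the data, the sketch below joins rows from two separate databases in application code. It uses two in-memory SQLite databases as stand-ins for two database hosts. The table names `user` and `address` come from the module list in this article; the column names, the sample rows, and the function `users_with_address` are assumptions made up for the example.

```python
import sqlite3

# Two independent in-memory databases stand in for two database hosts.
user_db = sqlite3.connect(":memory:")
address_db = sqlite3.connect(":memory:")

# Hypothetical schemas and sample rows, for illustration only.
user_db.execute(
    "CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT, address_id INTEGER)"
)
user_db.executemany(
    "INSERT INTO user VALUES (?, ?, ?)",
    [(1, "Zhang", 10), (2, "Li", 11), (3, "Wang", 10)],
)
address_db.execute(
    "CREATE TABLE address (address_id INTEGER PRIMARY KEY, street TEXT)"
)
address_db.executemany(
    "INSERT INTO address VALUES (?, ?)",
    [(10, "Renmin Road"), (11, "Jiefang Road")],
)


def users_with_address(user_conn, address_conn):
    """Join user and address rows in the application instead of in SQL."""
    users = user_conn.execute(
        "SELECT user_id, name, address_id FROM user ORDER BY user_id"
    ).fetchall()
    # Fetch only the addresses that are actually referenced, in one query.
    address_ids = sorted({row[2] for row in users})
    if not address_ids:
        return []
    placeholders = ",".join("?" for _ in address_ids)
    rows = address_conn.execute(
        f"SELECT address_id, street FROM address WHERE address_id IN ({placeholders})",
        address_ids,
    ).fetchall()
    street_by_id = dict(rows)
    # A user whose address is missing gets None, like a LEFT JOIN.
    return [(uid, name, street_by_id.get(aid)) for uid, name, aid in users]


joined = users_with_address(user_db, address_db)
```

The second query is batched with `IN (...)` so that the application makes one round trip per database rather than one per user row. With very large result sets the ID list would have to be chunked.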

As the data volume grows, for example when the number of primary phone users reaches 120 million, the resources needed to query a single table are already very high. At that point vertical splitting does not help as much, but a meal is eaten one bite at a time. Let's solve the vertical splitting problem first.

 

First we analyze the system, then we design a splitting rule and perform the vertical split.

 

At first glance, no module is independent of the others, and every module is related to some other module. Does that mean the system cannot be split?

We first classify the main functional modules of the system, so that the problem becomes clearer.

 

The system can be divided into four functional modules: the telephone user tables, the address database, the external service database, and the data collection and processing database (editing and maintenance). They correspond to the following tables:

 

1. Telephone user tables: area, user, tel, ...

2. Address database: address, ...

3. External service database: out_search, out_user, out_tel, ...

4. Data collection and processing database (editing and maintenance): orders, ...

 

 

On further analysis, we find that although the tables used by the modules are related, the relationships are clear.

 

The data collection and processing database (editing and maintenance) is not directly related to the other modules, so a new database can take over this service. This module also has a large number of users, with usage comparable to that of the other three modules. With the target chosen, we split it out.

 

Our business has one notable characteristic: it is closely tied to regions. Every module needs the area information.

There are several ways to share region information:

DBLink is the simplest option and shares a single copy of the data, but it does not meet the company's management requirements;

Developing an interface that every module calls in a uniform way is also straightforward. However, it requires changes to the existing system, and the performance requirements on that interface are not low, so it is not the best choice;

The method finally chosen was to place a copy of the area table in each split database, because China's administrative divisions do not change often. Interfaces are implemented between these area tables so that when the data changes, every copy is updated in sync.
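A minimal sketch of keeping a copy of the area table in each database might look like the following. It is not the interface used in the project, which the article does not describe. It assumes the area table is small enough to refresh in full, and the column names, sample rows, and the helpers `make_db` and `sync_area` are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with an empty area table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE area (area_code TEXT PRIMARY KEY, name TEXT)")
    return conn


def sync_area(source, targets):
    """Copy the full area table from the source to every target database.

    A full refresh is acceptable here only because the table is small and
    changes rarely; returns the number of rows copied.
    """
    rows = source.execute("SELECT area_code, name FROM area").fetchall()
    for target in targets:
        # One transaction per target: commit on success, roll back on error.
        with target:
            target.execute("DELETE FROM area")
            target.executemany("INSERT INTO area VALUES (?, ?)", rows)
    return len(rows)


# Sample data for illustration only.
main_db, edit_db = make_db(), make_db()
main_db.executemany(
    "INSERT INTO area VALUES (?, ?)",
    [("0571", "Hangzhou"), ("021", "Shanghai")],
)
main_db.commit()

copied = sync_area(main_db, [edit_db])
```

Calling `sync_area` again after a change on the source brings every copy back in line. A production version would more likely push only the changed rows and would need to handle a target that is temporarily unreachable.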

 

After the split, the P590 minicomputer hosts three groups: the telephone user tables, the address database, and the external service database;

A newly purchased P570 minicomputer hosts the data collection and processing database (editing and maintenance).
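The resulting layout can be written down as a small routing table that the application consults before it runs a query. The sketch below is only one way to express it: the module keys, the host labels `"P590"` and `"P570"`, and both helper functions are assumptions, while the table names come from the module list earlier in the article.

```python
# Which host serves each module after the split (labels are placeholders,
# not real connection strings).
MODULE_HOST = {
    "telephone_user": "P590",
    "address": "P590",
    "external_service": "P590",
    "collection_processing": "P570",
}

# Tables owned by each module, as listed in the article.
MODULE_TABLES = {
    "telephone_user": ["area", "user", "tel"],
    "address": ["address"],
    "external_service": ["out_search", "out_user", "out_tel"],
    "collection_processing": ["orders"],
}

# The area table is copied into every split database.
REPLICATED_TABLES = {"area"}


def hosts_for_table(table):
    """Return the sorted list of hosts that hold the given table."""
    if table in REPLICATED_TABLES:
        return sorted(set(MODULE_HOST.values()))
    hosts = {
        MODULE_HOST[module]
        for module, tables in MODULE_TABLES.items()
        if table in tables
    }
    if not hosts:
        raise KeyError(f"unknown table: {table}")
    return sorted(hosts)


def needs_application_join(tables):
    """True when no single host holds all of the given tables."""
    common = None
    for table in tables:
        hosts = set(hosts_for_table(table))
        common = hosts if common is None else common & hosts
    return common is not None and not common
```

For example, `needs_application_join(["orders", "area"])` is false because the area table exists on both hosts, while `needs_application_join(["orders", "user"])` is true and the join has to happen in the program.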

After the split, the data collection and processing database (editing and maintenance) became a separate project. It went on to grow into a large platform, and it also gained a grand-sounding name: Unified Editing and Maintenance.

 

We have taken the first step in vertical splitting.

 

Advantages of vertical splitting:

The database split is simple, and the splitting rules are clear;

Application modules are clearly separated and easy to integrate.

 

Disadvantages of vertical splitting:

The actual supporting investment (hardware and software) will increase;

Some table associations cannot be completed at the database level, but must be completed in the program;

It cannot resolve performance bottlenecks in tables with high access volume and large data volume;

Transaction processing is relatively complex;

When the splitting reaches a certain degree, the scalability will be limited;

Excessive splitting may make the system too complex and difficult to maintain.

 

Splitting an extra-large table

In the early database design, for various reasons, there may be some extra large tables in the system. These tables not only have a large number of data rows, but also have many fields.

For example, our old GPS data processing table has more than 220 fields and more than 60 million rows. With a table like this, a carelessly written SQL statement that triggers a full table scan can drag down the entire database instance.

Even routine data writes and index creation carry a huge overhead.

What should we do? Vertical splitting is not the fundamental solution to this problem, but it is relatively simple. Before making a major decision, we should always start with the simple part. It is like running: warm up first, and you will run better and farther.

The table was so old that we had forgotten it existed. One day a colleague ran some operations on it, and the minicomputer almost crashed. The problem had to be solved.

Step by step: first, analyze the 220 fields in the table. By attribute, they mainly cover user information, address information, GPS information, auxiliary extension information (QQ number, MSN number, and so on), and operation logs.

The simple split works as follows. User information goes into the user table and is linked by user ID. Address and GPS information stay in the main table. The auxiliary extension attributes, which are mostly empty, go into a one-to-one extension table. Operation logs go into a log table.

After this, the main GPS data processing table has only 30 fields. That is still a lot, but it is much better than before. The remaining problem, the more than 60 million rows, needs other treatment.
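The column split described above can be sketched on a toy table. The example below uses an in-memory SQLite database with a handful of columns standing in for the 220 fields. All table and column names (`gps_wide`, `gps_main`, `gps_ext`, `gps_log`, and so on) and the sample rows are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy stand-in for the original wide table.
CREATE TABLE gps_wide (
    id INTEGER PRIMARY KEY,
    user_id INTEGER, address TEXT, longitude REAL, latitude REAL,
    qq TEXT, msn TEXT, last_operation TEXT
);
INSERT INTO gps_wide VALUES
    (1, 100, 'Renmin Road 1', 120.15, 30.28, NULL, NULL, 'created'),
    (2, 101, 'Jiefang Road 8', 121.47, 31.23, '12345', NULL, 'updated');

-- Main table: user ID, address, and GPS columns only.
CREATE TABLE gps_main AS
    SELECT id, user_id, address, longitude, latitude FROM gps_wide;

-- One-to-one extension table: only rows that actually carry a value.
CREATE TABLE gps_ext AS
    SELECT id AS gps_id, qq, msn FROM gps_wide
    WHERE qq IS NOT NULL OR msn IS NOT NULL;

-- Operation log moves to its own table.
CREATE TABLE gps_log AS
    SELECT id AS gps_id, last_operation AS operation FROM gps_wide;
""")

# Column names of the slimmed-down main table.
main_cols = [row[1] for row in conn.execute("PRAGMA table_info(gps_main)")]

# Number of rows that needed an extension record at all.
ext_count = conn.execute("SELECT COUNT(*) FROM gps_ext").fetchone()[0]

# The rarely used columns are still reachable through a join on the key.
full = conn.execute(
    "SELECT m.id, e.qq FROM gps_main m "
    "LEFT JOIN gps_ext e ON e.gps_id = m.id ORDER BY m.id"
).fetchall()
```

Because the extension attributes are mostly empty, storing them only for rows that have a value keeps the extension table small. Note that `CREATE TABLE ... AS SELECT` does not carry over keys or indexes, so a real migration would declare those explicitly.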

 
