Multi-to-many, database horizontal splitting architecture, multi-to-many splitting

Last Update:2017-08-04 Source: Internet

Author: User

Tags database sharding

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Multi-to-many, database horizontal splitting architecture, multi-to-many splitting

Repost the public account's article "the path to architect"

In fact, I still don't understand some points after reading it. I will keep it for a while and then digest it slowly ~~

This article will take "friend Center" as an example to introduce "many-to-many" businesses. As the data volume increases, the database performance decreases significantly and the architecture practices related to horizontal database splitting.

1. What is multi-to-Multi-relationship?

The so-called "many-to-many" model comes from the "entity-relationship" ER Model in the database design to describe the associations between entities. A student can take multiple courses, A course can be selected by multiple students. The relationship between students and course time is multi-to-many.

2. Friend Center Business Analysis

There are two main types of friend relationships,Weak friend relationshipAndStrong friend relationshipBoth types have typical Internet product applications.

The establishment of weak friend relationships does not require mutual consent from both parties:

User A follows user B and does not require user B's consent. At this time, user A and user B are weak friends. For User A, it is understood as "follow" for the moment ";
User B pays attention to user A without the consent of user A. At this time, user A and user B are also weak friends. For User A, it is understood as "fans" for the moment ";

Weibo fans are a typical weak friend relationship application.

The establishment of strong friend relationship requires mutual consent from both parties:

User A requests to add user B as A friend, and user B agrees. In this case, user A and user B are mutual friends, that is, A is A friend of B and B is A friend of;

QQ friends are a typical application of strong friend relationships.

Friend Center is a typical multi-to-many service. A user can add multiple friends or multiple friends. Its typical architecture is as follows:

Friend-service: A friend center service that provides friendly RPC interfaces for callers.
Db: stores friend Data

3. Weak friend relationship-Metadata lite implementation

Through Business Analysis of weak friend relationships, it is easy to understand that its core metadata is:

Guanzhu (uid, guanzhu_uid );
Fensi (uid, fensi_uid );

Where:

Guanzhu table. The user records all uid followers guanzhu_uid.
Fensi table, used to record all uid fan users fensi_uid

It should be emphasized that the generation of a weak relationship will produce two records, one attention record and one fan record.

For example, user A (uid = 1) pays attention to user B (uid = 2), user A pays more attention to one user, and user B has another fan, so:

The record {1, 2} to be inserted in the guanzhu table. 1 follows 2.
The record {2, 1} to be inserted in the fensi table is 1 powdered for 2.

How can I query who a user cares about?

Answer: Create an index on the uid of guanzhu:

Select * from guanzhu where uid = 1;

The result is displayed. 1 is followed by 2.

How can I query who has a user fan?

Answer: Create an index on fensi uid:

Select * from fensi where uid = 2;

The result is as follows: 2 powders: 1.

4. Strong friend relationship-Metadata Implementation 1

Through strong friend relationship business analysis, it is easy to understand that its core metadata is:

Friend (uid1, uid2 );

Where:

Uid1, uid of one party in strong friend relationship
Uid2, uid of the other Party in strong friend relationship

Users with uid = 1 have added users with uid = 2, and both parties agree to add each other as friends.Strong friend relationship, should records {1, 2} be inserted in the database or records {2, 1?

Answer: Yes

To avoid ambiguity, we recommend that the value of uid1 be smaller than that of uid2.

For example, there are three users with uid = 1, 2, and 3. They are mutual friends. There may be three such records in the database.

{1, 2}

{2, 3}

{1, 3}

How can I query a user's friends?

Answer: To query all friends with uid = 2, you only need to create an index on uid1 and uid2, and then:

Select * from friend where uid1 = 2

Union

Select * from friend where uid2 = 2

The result is displayed.

Job, why not use:

Select * from friend uid1 = 2 or uid2 = 2

5. Strong friend relationship-Metadata implementation 2

A strong friend relationship is A special case of A weak friend relationship. A and B must be closely related to each other (or, they can also be said to be followers of each other), that is, they can also be achieved using A follow table and A fan table:

Guanzhu (uid, guanzhu_uid );
Fensi (uid, fensi_uid );

For example, user A (uid = 1) and user B (uid = 2) are strong friends, that is, mutual interest:

User A (uid = 1) pays attention to user B (uid = 2). User A pays more attention to one user and user B has more followers:

The record {1, 2} to be inserted in the guanzhu table
The record {2, 1} to be inserted in the fensi table

At the same time, user B (uid = 2) also pays attention to user A (uid = 1), B pays more attention to one user, and A has one more fan, so:

The record {2, 1} to be inserted in the guanzhu table
The record {1, 2} to be inserted in the fensi table

Vi. data redundancy is a common practice for achieving multi-to-Multi-link horizontal segmentation

Two types of implementation for strong friend relationships:

Friend (uid1, uid2) Table
Data redundancy guanzhu table and fensi table (hereinafter referred to as positive table T1 and inverse table T2)

When the data volume is small, there seems to be no difference, but when the data volume is large, the advantage of data redundancy is shown as follows:

In the friend table, when the data volume is large, if uid1 is used for database sharding, the query on uid2 needs to traverse multiple databases.
Positive table T1 and reverse table T2 achieve friend relationship through data redundancy. {1, 2} {} exist in two tables respectively. Therefore, both tables use uid for database sharding, you only need to perform one query to find the corresponding followers and fans without having to scan multiple databases.

Data redundancy is a common practice of Multi-to-Multi-link data splitting when the data volume is large.

7. How to implement data redundancy

The next question is how the friend center service achieves data redundancy. There are three common methods.

Method 1: Service synchronization Redundancy

As the name implies, the friend center service synchronously writes redundant data, such as the 1-4 Process:

The business party calls the service and adds data
Services insert T1 data first
Service inserts T2 data
The service returns the new data to the business side.

Advantages:

Not complex. The service layer is changed from one write to two write operations.
Relatively high data consistency (returned only when dual-write is successful)

Disadvantages:

The request processing time increases (to insert, the time is doubled)
Data may still be inconsistent. For example, if the service restarts after data is written to T1 in step 2, data is not written to T2.

If the system is sensitive to processing time, a common second solution is introduced.

Method 2: asynchronous service Redundancy

The dual-write of data is no longer completed by the friend center service. The service layer sends a message asynchronously and sends it to a dedicated data replication service through the message bus to write redundant data, such as the 1-6 process:

The business party calls the service and adds data
Services insert T1 data first
The Service sends an asynchronous message to the Message bus (it can be sent quickly without waiting for a return)
The service returns the new data to the business side.
The message bus delivers messages to the Data Synchronization Center.
Data Synchronization Center inserts T2 data

Advantages:

Short request processing time (insert only once)

Disadvantages:

The complexity of the system has increased. One more component (Message Bus) and one more service (dedicated data replication service) are introduced)
Because data is not necessarily inserted into T2. therefore, the data has an inconsistent time window (this window is short and eventually consistent)
When a message bus loses a message, the redundant table data is inconsistent.

If you want to remove the system coupling caused by "data redundancy", a common third solution is introduced.

Method 3: offline asynchronous Redundancy

Data dual-writing is not completed by the friend center service, but by an offline service or task, such as the 1-6 process:

The business party calls the service and adds data
Services insert T1 data first
The service returns the new data to the business side.
The data will be written into the database log.
Offline services or tasks read database logs
Insert T2 data for offline services or tasks

Advantages:

Data dual-write is completely decoupled from business
Short request processing time (insert only once)

Disadvantages:

When the business line data is successfully inserted, the data is not necessarily inserted into T2. therefore, the data has an inconsistent time window (this window is short and eventually consistent)
Data Consistency depends on the reliability of offline services or tasks.

The three solutions have their own advantages and disadvantages and can be selected based on the actual situation.

Data redundancy can solve the problem of multi-to-Multi-link horizontal database segmentation, but it brings new problems,How can we ensure data consistency between positive table T1 and inverse table T2?

8. How to ensure data consistency

As we can see from the discussion in the previous section, no matter which solution, because two-step operations cannot guarantee atomicity, there is always a possibility of data inconsistency. High-throughput distributed transactions are unsolved problems in the industry, at this timeArchitecture OptimizationInstead of completely guaranteeing data consistency, we can discover inconsistencies as soon as possible and fix them.

Eventual consistency is a common practice for High-throughput Internet Business consistency.More specifically, there are three solutions to ensure data consistency.

Method 1: all the data in the offline redundant table

As shown in, an offline scanning tool is started offline to constantly compare the positive table T1 and the reverse table T2. If the data is found to be inconsistent, make compensation and repair.

Advantages:

Relatively simple with low development cost
Online Services do not need to be modified, and repair tools are decoupled from online services

Disadvantages:

Low scanning efficiency: scans a large amount of data that can be consistent
Due to the large amount of data scanned, scanning takes a long time, that is, if the data is inconsistent, the time window for inconsistency is long.

Is there an optimization method that only scans data that may be inconsistent, instead of scanning all the data at a time to improve efficiency?

Method 2: offline incremental data scanning

Scanning Incremental log data at a time can greatly improve efficiency and shorten the time window for data inconsistency, as shown in the 1-4 Process:

Write to positive table T1
After step 1 is successful, write log log1
Write back table T2
After step 2 is successful, log log2 is written.

Of course, we still need an offline scanning tool to constantly compare log log1 and log log2. If the data is found to be inconsistent, make compensation and repair.

Advantages:

Although it is more complex than the method, it is still relatively simple
High data scanning efficiency: only incremental data is scanned

Disadvantages:

Slightly modified online services (two more logs are written at a low cost)
Although it is more real-time than method 1, the timeliness is still not high. Inconsistent Windows depend on the scan cycle.

Is there any way to detect and fix consistency in real time?

Method 3: real-time online "message pair" Detection

This time, instead of writing logs, it sends messages to the Message bus, as shown in the 1-4 Process:

Write to positive table T1
After step 1 is successful, send the message msg1
Write back table T2
After step 2 is successful, send the message msg2

This is not an offline tool that requires periodic scanning. Instead, a service that subscribes to messages in real time keeps receiving messages.

Under normal circumstances, the receiving time of msg1 and msg2 should be less than 3 S. If the detection service does not receive msg2 after receiving msg1, it will try to check the data consistency and compensate for the repair in case of inconsistency.

Advantages:

High Efficiency
High real-time performance

Disadvantages:

The solution is complicated. The message bus component is introduced online.
An offline inspection service with a subscription bus added

However, the technical solution itself is an input-output ratio compromise, which can be used based on the degree of business needs for consistency.

IX. Summary

There are many texts, so remember the following points as much as possible:

Friend business is a typical multi-to-many relationshipWhich can be divided into strong and weak friends.
Data redundancyIt is a common practice of Multi-to-Multi-Service Data horizontal segmentation.
There are three common solutions for redundant data
(1) Service synchronization Redundancy
(2) asynchronous service Redundancy
(3) offline asynchronous Redundancy
Data redundancy may cause consistency problems.For high-throughput Internet businesses, it is difficult to fully ensure transaction consistency,A common practice is eventual consistency.
The common practice of final consistency is to locate inconsistencies and repair data as soon as possible. There are three common solutions:
(1) Full offline Scanning
(2) offline incremental Scanning
(3) online real-time detection

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More