Multi-to-many, database horizontal splitting architecture, multi-to-many splitting
Repost the public account's article "the path to architect"
In fact, I still don't understand some points after reading it. I will keep it for a while and then digest it slowly ~~
This article will take "friend Center" as an example to introduce "many-to-many" businesses. As the data volume increases, the database performance decreases significantly and the architecture practices related to horizontal database splitting.
1. What is multi-to-Multi-relationship?
The so-called "many-to-many" model comes from the "entity-relationship" ER Model in the database design to describe the associations between entities. A student can take multiple courses, A course can be selected by multiple students. The relationship between students and course time is multi-to-many.
2. Friend Center Business Analysis
There are two main types of friend relationships,Weak friend relationshipAndStrong friend relationshipBoth types have typical Internet product applications.
The establishment of weak friend relationships does not require mutual consent from both parties:
User A follows user B and does not require user B's consent. At this time, user A and user B are weak friends. For User A, it is understood as "follow" for the moment ";
User B pays attention to user A without the consent of user A. At this time, user A and user B are also weak friends. For User A, it is understood as "fans" for the moment ";
Weibo fans are a typical weak friend relationship application.
The establishment of strong friend relationship requires mutual consent from both parties:
User A requests to add user B as A friend, and user B agrees. In this case, user A and user B are mutual friends, that is, A is A friend of B and B is A friend of;
QQ friends are a typical application of strong friend relationships.
Friend Center is a typical multi-to-many service. A user can add multiple friends or multiple friends. Its typical architecture is as follows:
3. Weak friend relationship-Metadata lite implementation
Through Business Analysis of weak friend relationships, it is easy to understand that its core metadata is:
Where:
Guanzhu table. The user records all uid followers guanzhu_uid.
Fensi table, used to record all uid fan users fensi_uid
It should be emphasized that the generation of a weak relationship will produce two records, one attention record and one fan record.
For example, user A (uid = 1) pays attention to user B (uid = 2), user A pays more attention to one user, and user B has another fan, so:
The record {1, 2} to be inserted in the guanzhu table. 1 follows 2.
The record {2, 1} to be inserted in the fensi table is 1 powdered for 2.
How can I query who a user cares about?
Answer: Create an index on the uid of guanzhu:
Select * from guanzhu where uid = 1;
The result is displayed. 1 is followed by 2.
How can I query who has a user fan?
Answer: Create an index on fensi uid:
Select * from fensi where uid = 2;
The result is as follows: 2 powders: 1.
4. Strong friend relationship-Metadata Implementation 1
Through strong friend relationship business analysis, it is easy to understand that its core metadata is:
Where:
Uid1, uid of one party in strong friend relationship
Uid2, uid of the other Party in strong friend relationship
Users with uid = 1 have added users with uid = 2, and both parties agree to add each other as friends.Strong friend relationship, should records {1, 2} be inserted in the database or records {2, 1?
Answer: Yes
To avoid ambiguity, we recommend that the value of uid1 be smaller than that of uid2.
For example, there are three users with uid = 1, 2, and 3. They are mutual friends. There may be three such records in the database.
{1, 2}
{2, 3}
{1, 3}
How can I query a user's friends?
Answer: To query all friends with uid = 2, you only need to create an index on uid1 and uid2, and then:
Select * from friend where uid1 = 2
Union
Select * from friend where uid2 = 2
The result is displayed.
Job, why not use:
Select * from friend uid1 = 2 or uid2 = 2
5. Strong friend relationship-Metadata implementation 2
A strong friend relationship is A special case of A weak friend relationship. A and B must be closely related to each other (or, they can also be said to be followers of each other), that is, they can also be achieved using A follow table and A fan table:
For example, user A (uid = 1) and user B (uid = 2) are strong friends, that is, mutual interest:
User A (uid = 1) pays attention to user B (uid = 2). User A pays more attention to one user and user B has more followers:
The record {1, 2} to be inserted in the guanzhu table
The record {2, 1} to be inserted in the fensi table
At the same time, user B (uid = 2) also pays attention to user A (uid = 1), B pays more attention to one user, and A has one more fan, so:
The record {2, 1} to be inserted in the guanzhu table
The record {1, 2} to be inserted in the fensi table
Vi. data redundancy is a common practice for achieving multi-to-Multi-link horizontal segmentation
Two types of implementation for strong friend relationships:
Friend (uid1, uid2) Table
Data redundancy guanzhu table and fensi table (hereinafter referred to as positive table T1 and inverse table T2)
When the data volume is small, there seems to be no difference, but when the data volume is large, the advantage of data redundancy is shown as follows:
In the friend table, when the data volume is large, if uid1 is used for database sharding, the query on uid2 needs to traverse multiple databases.
Positive table T1 and reverse table T2 achieve friend relationship through data redundancy. {1, 2} {} exist in two tables respectively. Therefore, both tables use uid for database sharding, you only need to perform one query to find the corresponding followers and fans without having to scan multiple databases.
Data redundancy is a common practice of Multi-to-Multi-link data splitting when the data volume is large.
7. How to implement data redundancy
The next question is how the friend center service achieves data redundancy. There are three common methods.
Method 1: Service synchronization Redundancy
As the name implies, the friend center service synchronously writes redundant data, such as the 1-4 Process:
The business party calls the service and adds data
Services insert T1 data first
Service inserts T2 data
The service returns the new data to the business side.
Advantages:
Disadvantages:
The request processing time increases (to insert, the time is doubled)
Data may still be inconsistent. For example, if the service restarts after data is written to T1 in step 2, data is not written to T2.
If the system is sensitive to processing time, a common second solution is introduced.
Method 2: asynchronous service Redundancy
The dual-write of data is no longer completed by the friend center service. The service layer sends a message asynchronously and sends it to a dedicated data replication service through the message bus to write redundant data, such as the 1-6 process:
The business party calls the service and adds data
Services insert T1 data first
The Service sends an asynchronous message to the Message bus (it can be sent quickly without waiting for a return)
The service returns the new data to the business side.
The message bus delivers messages to the Data Synchronization Center.
Data Synchronization Center inserts T2 data
Advantages:
Disadvantages:
The complexity of the system has increased. One more component (Message Bus) and one more service (dedicated data replication service) are introduced)
Because data is not necessarily inserted into T2. therefore, the data has an inconsistent time window (this window is short and eventually consistent)
When a message bus loses a message, the redundant table data is inconsistent.
If you want to remove the system coupling caused by "data redundancy", a common third solution is introduced.
Method 3: offline asynchronous Redundancy
Data dual-writing is not completed by the friend center service, but by an offline service or task, such as the 1-6 process:
The business party calls the service and adds data
Services insert T1 data first
The service returns the new data to the business side.
The data will be written into the database log.
Offline services or tasks read database logs
Insert T2 data for offline services or tasks
Advantages:
Disadvantages:
When the business line data is successfully inserted, the data is not necessarily inserted into T2. therefore, the data has an inconsistent time window (this window is short and eventually consistent)
Data Consistency depends on the reliability of offline services or tasks.
The three solutions have their own advantages and disadvantages and can be selected based on the actual situation.
Data redundancy can solve the problem of multi-to-Multi-link horizontal database segmentation, but it brings new problems,How can we ensure data consistency between positive table T1 and inverse table T2?
8. How to ensure data consistency
As we can see from the discussion in the previous section, no matter which solution, because two-step operations cannot guarantee atomicity, there is always a possibility of data inconsistency. High-throughput distributed transactions are unsolved problems in the industry, at this timeArchitecture OptimizationInstead of completely guaranteeing data consistency, we can discover inconsistencies as soon as possible and fix them.
Eventual consistency is a common practice for High-throughput Internet Business consistency.More specifically, there are three solutions to ensure data consistency.
Method 1: all the data in the offline redundant table
As shown in, an offline scanning tool is started offline to constantly compare the positive table T1 and the reverse table T2. If the data is found to be inconsistent, make compensation and repair.
Advantages:
Relatively simple with low development cost
Online Services do not need to be modified, and repair tools are decoupled from online services
Disadvantages:
Low scanning efficiency: scans a large amount of data that can be consistent
Due to the large amount of data scanned, scanning takes a long time, that is, if the data is inconsistent, the time window for inconsistency is long.
Is there an optimization method that only scans data that may be inconsistent, instead of scanning all the data at a time to improve efficiency?
Method 2: offline incremental data scanning
Scanning Incremental log data at a time can greatly improve efficiency and shorten the time window for data inconsistency, as shown in the 1-4 Process:
Write to positive table T1
After step 1 is successful, write log log1
Write back table T2
After step 2 is successful, log log2 is written.
Of course, we still need an offline scanning tool to constantly compare log log1 and log log2. If the data is found to be inconsistent, make compensation and repair.
Advantages:
Although it is more complex than the method, it is still relatively simple
High data scanning efficiency: only incremental data is scanned
Disadvantages:
Slightly modified online services (two more logs are written at a low cost)
Although it is more real-time than method 1, the timeliness is still not high. Inconsistent Windows depend on the scan cycle.
Is there any way to detect and fix consistency in real time?
Method 3: real-time online "message pair" Detection
This time, instead of writing logs, it sends messages to the Message bus, as shown in the 1-4 Process:
Write to positive table T1
After step 1 is successful, send the message msg1
Write back table T2
After step 2 is successful, send the message msg2
This is not an offline tool that requires periodic scanning. Instead, a service that subscribes to messages in real time keeps receiving messages.
Under normal circumstances, the receiving time of msg1 and msg2 should be less than 3 S. If the detection service does not receive msg2 after receiving msg1, it will try to check the data consistency and compensate for the repair in case of inconsistency.
Advantages:
Disadvantages:
However, the technical solution itself is an input-output ratio compromise, which can be used based on the degree of business needs for consistency.
IX. Summary
There are many texts, so remember the following points as much as possible:
Friend business is a typical multi-to-many relationshipWhich can be divided into strong and weak friends.
Data redundancyIt is a common practice of Multi-to-Multi-Service Data horizontal segmentation.
There are three common solutions for redundant data
(1) Service synchronization Redundancy
(2) asynchronous service Redundancy
(3) offline asynchronous Redundancy
Data redundancy may cause consistency problems.For high-throughput Internet businesses, it is difficult to fully ensure transaction consistency,A common practice is eventual consistency.
The common practice of final consistency is to locate inconsistencies and repair data as soon as possible. There are three common solutions:
(1) Full offline Scanning
(2) offline incremental Scanning
(3) online real-time detection