A Split Scheme for a Single MongoDB Table with Billion-Level Data

Last Update:2017-01-13 Source: Internet

Author: User

Tags mongodb redis

Splitting a table is a common way to relieve the bottleneck of a single-table database. In practice it can partially relieve the write and read pressure on a single table, but it also brings some more complicated consequences:

Aggregate queries become difficult
Once a split key has been chosen, changing it is very hard
The split must not affect online business, so the operation is highly complex

Therefore the split has to happen at the right time. Too early, and you pay a high price without any performance gain; too late, and the data volume is so large that the operation becomes difficult.

In the scenario behind this implementation, the data has the following characteristics:

A single table holds over a billion records
The business data is each user's collection of resources, and its structure is fairly uniform
Only one-way lookup (user -> data) is needed; reverse lookup (data -> user) is not required
The data changes in real time (adds, deletes, and queries all happen concurrently)

How to select a split key
In this case the split key is the user ID, and the amount of resource data associated with each user is not very large.

How do I guarantee that the process of splitting does not affect the user's actions?
In this case, essentially only the following operations are performed on the data in the table:

User adds a new piece of data
User deletes a piece of data
User queries data (with paging)

The basic principle of the split is to hash the selected key to derive the name of the table that holds that key's data. It follows that, at any moment, any user's data can be in only one of the following three states:

1. All in the old table
2. Partly in the new table, partly in the old table
3. All in the new table
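
The hashing step described above can be sketched as follows. This is only an illustration under assumptions: the article does not say which hash function, how many tables, or what naming pattern was used, so `NUM_TABLES`, the `user_resource_` prefix, and the choice of CRC32 are all hypothetical.

```python
import zlib

NUM_TABLES = 64  # assumed number of split tables; not stated in the article


def table_name_for(user_id: str, num_tables: int = NUM_TABLES) -> str:
    """Map a user ID (the split key) to the name of its split table."""
    # Use a stable hash rather than Python's built-in hash(), which is
    # salted per process for strings and would map the same user to
    # different tables in different processes.
    index = zlib.crc32(user_id.encode("utf-8")) % num_tables
    return f"user_resource_{index:02d}"  # hypothetical naming pattern
```

Because the table name depends only on the user ID, all of a user's data lands in one table, which is what the one-way (user -> data) lookup needs.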

The actual data splitting is done offline. To keep a user's data as consistent across these three states as it would be in a single table, the business layer must check, whenever it handles a user's data, whether that user is currently being migrated. Only the data of users in the migration state needs special handling.

How is the migration state judged?
While a user's data is being migrated, all of it must remain available so that paged queries keep working. This is handled specially: as soon as the migration begins, the user's entire data set is loaded into a special area and stored as a sorted set. Here we use Redis; the possible problems with this choice are discussed later.

Therefore, if a user is in the migration state, that user's data must be present in this area, which we name on_progress.
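
A minimal sketch of this check, assuming an in-memory dictionary as a stand-in for Redis so the example runs on its own. The function names and the `id` and `created_at` fields are hypothetical. With a real Redis server, the two operations would correspond roughly to the ZADD and EXISTS commands.

```python
# Hypothetical in-memory stand-in for Redis:
# user ID -> {record ID: score}, mimicking one sorted set per user.
on_progress = {}


def begin_migration(user_id, old_rows):
    """Load all of a user's rows into on_progress before copying starts."""
    # Assumption: the creation time is used as the sorted-set score.
    on_progress[user_id] = {row["id"]: row["created_at"] for row in old_rows}


def is_migrating(user_id):
    """A user is in the migration state exactly when its key exists."""
    return user_id in on_progress
```

One caveat when moving this to real Redis: Redis removes a sorted set's key once the set becomes empty, so a user whose last record is deleted during migration would no longer be detected by a key-existence check.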

How to ensure the correctness of user data in the migration state?
While a user is in the migration state, a record the user adds is written to both the new table and on_progress, and a record the user deletes is deleted from both the new table and on_progress.

What is the purpose of doing so?

For a user in the migration state, on_progress holds all of the user's data and can serve paged queries. The offline job writes all the data in on_progress to the new table. After the migration completes, the old table's data is deleted and the on_progress data is cleared, so that the user's data lives entirely in the new table and all subsequent operations touch only the new table.
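
The read path implied here might look like the sketch below. It assumes plain Python containers in place of Redis and MongoDB, and a newest-first ordering by score. `query_page`, `home_table`, and the field names are hypothetical.

```python
def query_page(user_id, page, page_size, on_progress, home_table):
    """Return one page of a user's record IDs, newest first (assumed order).

    on_progress: {user ID: {record ID: score}}, standing in for Redis.
    home_table:  list of row dicts from the single table that currently
                 holds the user's data when the user is not migrating.
    """
    if user_id in on_progress:
        # Migrating: on_progress holds the complete data set, so page from
        # it instead of stitching together the old and new tables.
        scores = on_progress[user_id]
        ordered = sorted(scores, key=scores.get, reverse=True)
    else:
        rows = [r for r in home_table if r["user_id"] == user_id]
        rows.sort(key=lambda r: r["created_at"], reverse=True)
        ordered = [r["id"] for r in rows]
    start = page * page_size
    return ordered[start:start + page_size]
```

Paging from a single complete copy avoids having to merge partial results from two tables while the offline copy is still running.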

The data in on_progress is written to the new table continuously, in a consistent order (either new -> old or old -> new). During this process:

User deletes a piece of data

1. The data may already have been written to the new table, so the deletion is performed in the new table.
2. The data in on_progress must always stay in sync with all of the user's operations to keep the data correct and consistent, so the deletion is also performed in on_progress.

User adds a new piece of data

1. The data in on_progress must always stay in sync with all of the user's operations to keep the data correct and consistent, so the new record is written to on_progress.
2. In some boundary cases (for example, when data is written to the new table in old -> new order), the offline job has finished copying the data in on_progress but has not yet deleted it. If the user writes a record at that moment, the user is still in the migration state, but the offline job already considers the copy finished and will not pick the record up. The new record therefore also has to be written to the new table so that the final data is correct.
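
The rules above can be illustrated with a small simulation. It is a sketch only: dictionaries stand in for Redis and the new MongoDB table, all function names are hypothetical, and writes to the new table are assumed to be upserts keyed by record ID so that writing the same record twice is harmless.

```python
def add_while_migrating(user_id, record_id, score, new_table, on_progress):
    """Apply a user's insert while that user is being migrated."""
    # Write to the new table directly, in case the offline copy is already done.
    new_table.setdefault(user_id, {})[record_id] = score
    # Keep on_progress in sync so paged reads see the new record.
    on_progress[user_id][record_id] = score


def delete_while_migrating(user_id, record_id, new_table, on_progress):
    """Apply a user's delete while that user is being migrated."""
    # The record may or may not have been copied yet; remove it if present.
    new_table.get(user_id, {}).pop(record_id, None)
    # Removing it here also stops the offline job from copying it later.
    on_progress[user_id].pop(record_id, None)


def offline_copy(user_id, new_table, on_progress):
    """Offline job: copy whatever on_progress currently holds."""
    target = new_table.setdefault(user_id, {})
    for record_id, score in list(on_progress[user_id].items()):
        target[record_id] = score  # upsert, so repeated writes are safe


# Usage: a delete before the copy and an add after the copy.
new_table = {}
on_progress = {"u1": {"a": 1, "b": 2}}
delete_while_migrating("u1", "a", new_table, on_progress)
offline_copy("u1", new_table, on_progress)
add_while_migrating("u1", "c", 3, new_table, on_progress)
```

After these three steps the new table and on_progress hold the same records for the user, which is the invariant the dual write is meant to preserve.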

The choice of on_progress
In this implementation, on_progress is a Redis sorted set keyed by the user ID. If a key exists for the user ID, the user is in the migration state.

After the migration is complete, the data in the old table is deleted before the sorted set in on_progress is deleted. So as long as a user is present in Redis, that user is either being migrated or has just finished migrating; if a user is not present in Redis, that user has either not started migrating or has migrated successfully.
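
The cleanup order and the states that can be inferred from it can be sketched as below. The state names and functions are hypothetical, dictionaries stand in for the old table and Redis, and the distinction between "not started" and "migrated" by looking at the old table is an inference from the described ordering rather than something the article spells out.

```python
def finish_migration(user_id, old_table, on_progress):
    """Clean up after the offline copy, in the order the scheme requires."""
    # 1. Delete the old table's data first ...
    old_table.pop(user_id, None)
    # 2. ... and only then drop the marker in on_progress, so the marker
    #    never disappears while old data is still lying around.
    on_progress.pop(user_id, None)


def migration_state(user_id, old_table, on_progress):
    """Infer a user's state from where its data can be found."""
    if user_id in on_progress:
        return "migrating"    # in progress, or copied and awaiting cleanup
    if user_id in old_table:
        return "not_started"  # no marker and old data still present
    return "migrated"         # also returned for users with no data at all
```

If the two cleanup steps were swapped and the job crashed between them, a user would have old data but no marker and would look as if migration had never started.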

Note the drawback of choosing Redis: if Redis goes down, the data of in-flight migrations may be unrecoverable (for example, because persistence is not enabled, or for any other reason the data cannot be restored). To guard against this, as long as a user still has data in the old table, that user's additions and deletions must also be applied to the old table. The migration can then be started again, but only after the data already written to the new table has been cleared.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
