Mosonic: attempts to improve the distributed storage and cache of subsonic (3)

Last Update: 2018-12-06 Source: Internet

Author: User

Although Cache Money removes the bottleneck in read performance, read pressure is far from the only problem a large website's database faces.

The first is capacity.

Tables with tens of millions of rows are not uncommon, and a single physical database server becomes a bottleneck even if it only has to bear the write load. Moreover, Cache Money can eliminate database reads entirely only under ideal conditions: updating the cache servers, adding new queries, and running complex queries all still put read pressure on the database.

A common practice is sharding: distributing the data across multiple database servers according to some partitioning rule.
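
As a minimal illustration of routing data by rule, the Python sketch below assigns each key to one of a fixed list of shards by hashing it. The connection strings in SHARDS and the hash-modulo rule are assumptions made for illustration; they are not Mosonic's actual distribution scheme, which this article leaves aside.

```python
import hashlib

# Hypothetical shard connection strings.
SHARDS = [
    "Server=db0;Database=app",
    "Server=db1;Database=app",
    "Server=db2;Database=app",
    "Server=db3;Database=app",
]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Return the connection string of the shard that owns `key`."""
    # Use a stable hash (SHA-256) rather than Python's built-in hash(),
    # which is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


for user_id in ["alice", "bob", "carol"]:
    print(user_id, "->", shard_for(user_id))
```

A plain modulo rule moves most keys when a shard is added; consistent hashing or a lookup directory reduces that churn at the cost of extra bookkeeping.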

The second is change.

Business needs are unpredictable. No matter how well the table structure is designed at the beginning, new requirements will always force changes to it.

Once a table grows beyond a million rows, every schema change on the production server, such as ALTER TABLE or CREATE INDEX, becomes a painful experience.

For both capacity and change, the schema-less database design proposed by FriendFeed offers a pretty good solution.

I strongly recommend reading FriendFeed's original article.

FriendFeed's solution works roughly as follows:

There is only one table structure, with just two columns: ID + BLOB/varbinary(max).
The ID is a UUID, which makes sharding easy.
The BLOB can be deserialized into any structure.
Queries are served by separate index tables. For example, suppose the structure deserialized from the BLOB column of the users table contains an int attribute, age, and you need to run select * from users where age = 18. You create another table, such as user_age, that contains only two columns, ID and age. You first query this table to get the matching IDs, then query the original users table to fetch the complete data.
Index tables can be built asynchronously, and each one is designed around the query it serves, so it can be sharded by that query's condition (by age, in the example above).
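
The ID + BLOB layout and the separate index table can be sketched with Python's built-in sqlite3 module. This is a hypothetical single-database illustration: JSON stands in for whatever serialization format is actually used, the table names users and user_age follow the example above, and the index row is written synchronously for simplicity, whereas the design above allows it to be built asynchronously.

```python
import json
import sqlite3
import uuid

# One in-memory database stands in for what would be several shards.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, body BLOB)")
db.execute("CREATE TABLE user_age (age INTEGER, id TEXT, PRIMARY KEY (age, id))")


def save_user(props: dict) -> str:
    """Store the whole object as a serialized blob under a fresh UUID."""
    user_id = uuid.uuid4().hex
    body = json.dumps(props).encode("utf-8")
    db.execute("INSERT INTO users (id, body) VALUES (?, ?)", (user_id, body))
    # The index table holds only the queried attribute and the ID.
    db.execute("INSERT INTO user_age (age, id) VALUES (?, ?)", (props["age"], user_id))
    return user_id


def find_users_by_age(age: int) -> list[dict]:
    """Two-step query: the index table first, then the entity table."""
    ids = [row[0] for row in db.execute("SELECT id FROM user_age WHERE age = ?", (age,))]
    users = []
    for user_id in ids:
        (body,) = db.execute("SELECT body FROM users WHERE id = ?", (user_id,)).fetchone()
        users.append(json.loads(body))
    return users


save_user({"user_name": "alice", "age": 18})
save_user({"user_name": "bob", "age": 30})
print(find_users_by_age(18))  # [{'user_name': 'alice', 'age': 18}]
```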

FriendFeed's solution is rather clever: the data structure is simple, sharding is easy to implement, and both read and write pressure are distributed in one go.

Because each object is serialized into the BLOB column (the data is even compressed before it is stored, since CPU is plentiful and disk I/O is the bottleneck), the structure can be changed at any time. You only need to make sure the serialization format stays compatible across versions.
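
A minimal sketch of the compress-then-store idea, assuming JSON as the serialization format and zlib for compression (both are stand-ins chosen for illustration). The reader fills in defaults for fields that older blobs lack, which is one simple way to keep versions compatible.

```python
import json
import zlib


def pack(obj: dict) -> bytes:
    """Serialize to compact JSON, then compress before writing the BLOB."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)


def unpack(blob: bytes, defaults: dict) -> dict:
    """Decompress and deserialize; fields missing from old blobs get defaults."""
    stored = json.loads(zlib.decompress(blob))
    return {**defaults, **stored}


# A blob written by an older version that had no "city" field.
old_blob = pack({"user_name": "alice", "age": 18})

# The current version added "city"; reading the old blob still works.
current = unpack(old_blob, defaults={"user_name": "", "age": 0, "city": "unknown"})
print(current)  # {'user_name': 'alice', 'age': 18, 'city': 'unknown'}
```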

Flexible, version-tolerant serialization is exactly the problem Facebook's Thrift solves!

(Remember that we already used Thrift for serialization when using memcached as an object cache?)

Setting the sharding scheme aside for now, in Mosonic a class is defined with a structure similar to the following:

ID (INT)
Properties (BLOB)
- User_name (varchar)
- Age (INT)
- ...
...

It can then be used directly, as in User.FetchByID(xxx).Properties.User_name. Because Thrift serialization code generation was implemented as a SubSonic template from the start, adding this extra layer of structure to the data is not difficult.

When the data structure needs to change later, you simply edit the Thrift definition file and regenerate the code. The data already stored in the Properties column may not match the latest structure, but Thrift does not require an exact match (BinaryFormatter does): it silently skips fields it does not recognize, and once the object is saved again it is re-serialized in the new, complete form.
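
Thrift can skip unknown fields because every field on the wire carries a numeric field ID and a type, so a reader always knows how many bytes to skip. The toy encoder below is loosely modelled on Thrift's binary protocol (a type byte, an i16 field ID, then the value; the codes 0, 8 and 11 match Thrift's STOP, I32 and STRING), but it is a hypothetical illustration that handles only int and str values, not a substitute for generated Thrift code.

```python
import struct

# Type codes borrowed from Thrift's TType values.
STOP, I32, STRING = 0, 8, 11


def encode(fields: dict[int, object]) -> bytes:
    """Encode {field_id: value}; values must be int (32-bit) or str."""
    out = bytearray()
    for field_id, value in fields.items():
        if isinstance(value, int):
            # type byte, i16 field ID, i32 value
            out += struct.pack(">bhi", I32, field_id, value)
        else:
            data = str(value).encode("utf-8")
            # type byte, i16 field ID, i32 length, then the UTF-8 bytes
            out += struct.pack(">bhi", STRING, field_id, len(data)) + data
    out += struct.pack(">b", STOP)
    return bytes(out)


def decode(blob: bytes, schema: dict[int, str]) -> dict[str, object]:
    """Decode the fields listed in schema (field_id -> name); skip the rest."""
    result = {}
    pos = 0
    while True:
        (ftype,) = struct.unpack_from(">b", blob, pos)
        pos += 1
        if ftype == STOP:
            return result
        field_id, n = struct.unpack_from(">hi", blob, pos)
        pos += 6
        if ftype == I32:
            value = n
        else:  # STRING: n is the payload length in bytes
            value = blob[pos:pos + n].decode("utf-8")
            pos += n
        # Unknown field IDs are skipped rather than treated as errors.
        if field_id in schema:
            result[schema[field_id]] = value


# Version 1 wrote user_name (1), age (2) and a field later removed (3).
old = encode({1: "alice", 2: 18, 3: "legacy"})
# Version 2 no longer knows field 3 and has added email (4).
v2_schema = {1: "user_name", 2: "age", 4: "email"}
print(decode(old, v2_schema))  # {'user_name': 'alice', 'age': 18}
```

Fields the reader expects but the old blob lacks (email here) are simply absent, so the caller supplies defaults; once the object is saved again with the new schema, the stored blob is complete.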

======================================

FriendFeed's distributed design requires each table's primary key to be a UUID, while Cache Money requires every table to have an auto-increment integer primary key.

This is actually not a conflict: simply treat database_name + table_name + ID as a globally unique ID.
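
One way to express the database_name + table_name + ID idea, sketched in Python: a readable composite key that can be parsed back for routing, plus an optional deterministic UUID derived from it with uuid.uuid5. The namespace value, the helper names and the dot separator are hypothetical choices for illustration.

```python
import uuid

# Hypothetical namespace for deriving stable UUIDs; any fixed UUID works.
MOSONIC_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "mosonic.example")


def global_key(database: str, table: str, row_id: int) -> str:
    """Globally unique string key built from the three parts."""
    return f"{database}.{table}.{row_id}"


def global_uuid(database: str, table: str, row_id: int) -> uuid.UUID:
    """Deterministic version-5 UUID for the same triple."""
    return uuid.uuid5(MOSONIC_NS, global_key(database, table, row_id))


def parse_global_key(key: str) -> tuple[str, str, int]:
    """Recover (database, table, id); assumes names contain no dots."""
    database, table, row_id = key.split(".")
    return database, table, int(row_id)


print(global_key("app0", "users", 42))    # app0.users.42
print(parse_global_key("app0.users.42"))  # ('app0', 'users', 42)
```

The readable key keeps enough information to route a request back to the right database, which a hashed UUID alone does not; the uuid5 form is useful only where a fixed-width UUID column is required.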

FriendFeed's distributed index tables are similar to the vector cache in Cache Money.

Both are organized around query conditions: the vector cache stores results per condition, and the index tables are sharded by condition.

When we added the vector cache to Mosonic earlier, we already had to work out the table name and query condition of each query: if they matched a cache rule, the query cache was applied. Now the same match also tells us which distributed index to query when the cache misses!

Take the query select ID from users where age = 18 order by ID DESC limit N as an example.

The logic becomes:

Check the vector cache; if the result is there, return it.
Otherwise, look up the distributed index table rules to obtain the connection string of the database holding the index.
Execute the query.
Write the result to the vector cache.
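
The four steps above can be sketched as follows. The vector cache is a plain dict, the index rule for users.age is a hypothetical lambda that picks one of two index databases, and run_query is a stub standing in for the real database call; none of these names come from Mosonic itself.

```python
from typing import Callable

# The vector cache: (table, column, value) -> list of IDs.
vector_cache: dict[tuple, list[int]] = {}

# Hypothetical index rules: (table, column) -> function returning the
# connection string of the index shard that holds the given value.
INDEX_RULES: dict[tuple[str, str], Callable[[object], str]] = {
    ("users", "age"): lambda age: f"Server=idx{int(age) % 2};Database=user_age",
}

queried: list[str] = []  # records which shard each real query went to


def run_query(conn_str: str, table: str, column: str, value: object) -> list[int]:
    """Stub for the database call; returns matching IDs, ordered by ID DESC."""
    queried.append(conn_str)
    fake_rows = {("users", "age", 18): [7, 3, 12]}
    return sorted(fake_rows.get((table, column, value), []), reverse=True)


def select_ids(table: str, column: str, value: object) -> list[int]:
    key = (table, column, value)
    # 1. Check the vector cache and return if it exists.
    if key in vector_cache:
        return vector_cache[key]
    # 2. Use the distributed index rules to obtain the connection string.
    rule = INDEX_RULES.get((table, column))
    if rule is None:
        raise LookupError(f"no index rule for {table}.{column}")
    conn_str = rule(value)
    # 3. Execute the query against that index shard.
    ids = run_query(conn_str, table, column, value)
    # 4. Write the vector cache.
    vector_cache[key] = ids
    return ids


print(select_ids("users", "age", 18))  # [12, 7, 3], queried from idx0
print(select_ids("users", "age", 18))  # [12, 7, 3], served from the cache
print(queried)                         # ['Server=idx0;Database=user_age']
```

On the second call the result comes from the cache, so no connection string is resolved and no query runs.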

On insert, we previously only had to update the vector cache; now there is one more step: inserting into the index table.

In practice, because the row is inserted into the data table first and the vector cache is updated synchronously, subsequent reads will hit the cache. The index table update is essentially a backup, so it can be performed asynchronously.
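
A sketch of the insert path described above: the data-table write and the vector-cache update happen synchronously, while the index-table insert is pushed onto a queue and applied by a background thread. Dicts stand in for the data table, the cache and the user_age index; prepending the new ID assumes auto-increment IDs, so the newest row has the largest ID and the cached list stays in ID DESC order.

```python
import queue
import threading

entities: dict[int, dict] = {}             # stands in for the users data table
vector_cache: dict[tuple, list[int]] = {}  # (table, column, value) -> IDs, newest first
index_table: dict[int, set[int]] = {}      # stands in for user_age: age -> IDs
index_jobs: queue.Queue = queue.Queue()    # pending index-table inserts


def index_worker() -> None:
    """Background thread that applies index-table inserts asynchronously."""
    while True:
        age, user_id = index_jobs.get()
        index_table.setdefault(age, set()).add(user_id)
        index_jobs.task_done()


def insert_user(user_id: int, props: dict) -> None:
    # 1. Insert into the data table first.
    entities[user_id] = props
    # 2. Update the vector cache synchronously so later reads hit it.
    #    Prepending assumes auto-increment IDs (new ID is the largest).
    key = ("users", "age", props["age"])
    if key in vector_cache:
        vector_cache[key].insert(0, user_id)
    # 3. Queue the index-table insert; it is only a backup, so it may lag.
    index_jobs.put((props["age"], user_id))


threading.Thread(target=index_worker, daemon=True).start()

vector_cache[("users", "age", 18)] = [12, 7]
insert_user(20, {"user_name": "dave", "age": 18})
print(vector_cache[("users", "age", 18)])  # [20, 12, 7], visible immediately
index_jobs.join()                          # wait for the background insert
print(index_table)                         # {18: {20}}
```

One trade-off is visible in the sketch: if the cache entry has been evicted, a read falls back to the index table, which may not contain the new row until the queued job runs.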

Thrift, Cache Money, and schema-less database design were built by three different teams to solve different technical problems, yet when combined in Mosonic they do not conflict at all; if anything, they fit together remarkably well.

In the next article, we will go into more detail.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
