Mosonic: An Attempt to Improve Distributed Storage and Caching in SubSonic (Part 3)

Source: Internet
Author: User


Although Cache Money removes the bottleneck in read performance, read pressure is far from the only problem a large website's database faces.

The first problem is capacity.

Tables with tens of millions of rows or more are not uncommon. Even if it only had to carry the write load, a single physical database server would be a bottleneck. Moreover, Cache Money eliminates database reads entirely only under ideal conditions. Refreshing the cache servers, adding new queries, and running complex queries all still put read load on the database.

A common practice is sharding: spreading the data across multiple database servers according to some rule.
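The article does not say which rule Mosonic uses. The following is a minimal sketch that assumes simple hash-modulo routing. It maps a key to one of a fixed set of servers; the shard names and the `shard_for` helper are hypothetical.

```python
import hashlib

# Hypothetical shard names; a real deployment would use connection strings from config.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Route a key to a shard; the same key always lands on the same shard."""
    # MD5 is used only for its even spread of values, not for security.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

`shard_for("user:42")` always returns the same one of the four names. Hash-modulo is the simplest rule, but changing the number of shards remaps most keys. Consistent hashing or an explicit lookup table avoids that, at the cost of extra complexity.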

The second problem is change.

Business needs are unpredictable. No matter how well the table structure is designed at the beginning, new requirements will always come along that force changes to it.

Once a table grows past a million rows, every change to the production server, such as ALTER TABLE or CREATE INDEX, becomes a painful experience.

For both capacity and change, the schema-less database design proposed by FriendFeed offers a pretty good solution.

We strongly recommend reading FriendFeed's original article.

FriendFeed's solution is roughly as follows:

  • There is only one table structure, with just two columns: ID + blob/binary(max)
  • The ID is a UUID, which makes sharding easy
  • The blob can be deserialized into any structure
  • Queries are implemented by creating additional tables. For example, suppose the structure deserialized from the blob column of the users table contains an int attribute, age, and you need to run select * from users where age = 18. You create another table, such as user_age, that contains only two columns, ID and age. You first query that table to get the IDs, then query the original users table to get the complete data.
  • Index tables can be built asynchronously, and each one is created for a specific query. They can be sharded by the query condition, such as age in the example above.
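Here is a minimal, single-database sketch of this pattern. It uses Python's built-in sqlite3 module, with JSON standing in for the blob serializer. The table names follow the example above, and the function names are hypothetical. Sharding is omitted, and the index table is updated synchronously for simplicity.

```python
import json
import sqlite3
import uuid

# One in-memory SQLite database stands in for what would be several shards.
conn = sqlite3.connect(":memory:")
# Entity table: only an ID and an opaque, serialized body.
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, body BLOB)")
# Index table built for a single query shape: WHERE age = ?
conn.execute("CREATE TABLE user_age (age INTEGER, id TEXT, PRIMARY KEY (age, id))")


def save_user(props: dict) -> str:
    """Store a user as ID + blob and maintain the user_age index table."""
    user_id = uuid.uuid4().hex
    body = json.dumps(props).encode("utf-8")
    conn.execute("INSERT INTO users (id, body) VALUES (?, ?)", (user_id, body))
    if "age" in props:
        # Done inline here; in a real system this could run asynchronously.
        conn.execute("INSERT INTO user_age (age, id) VALUES (?, ?)", (props["age"], user_id))
    return user_id


def find_users_by_age(age: int) -> list[dict]:
    """Two-step query: IDs from the index table, then full rows from users."""
    ids = [row[0] for row in conn.execute("SELECT id FROM user_age WHERE age = ?", (age,))]
    users = []
    for user_id in ids:
        (body,) = conn.execute("SELECT body FROM users WHERE id = ?", (user_id,)).fetchone()
        users.append(json.loads(body))
    return users
```

After `save_user({"user_name": "alice", "age": 18})`, `find_users_by_age(18)` returns `[{"user_name": "alice", "age": 18}]`. It does one lookup in user_age for the IDs, then one fetch per ID from users.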

FriendFeed's solution is quite clever. The data structure is simple, sharding is easy to implement, and both write and read load are distributed in one stroke.

Because the data is stored as a serialized blob, the structure can be changed at any time. FriendFeed even compresses the data before storing it, since CPU is plentiful and disk I/O is the bottleneck. You only need to make sure the serialization format stays compatible across versions.
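The following is a minimal sketch of compress-before-store. It assumes JSON as the serializer and zlib for compression; both are stand-ins, and the helper names are hypothetical.

```python
import json
import zlib


def to_blob(props: dict) -> bytes:
    """Serialize first, then compress: trade CPU time for less disk I/O."""
    raw = json.dumps(props, sort_keys=True).encode("utf-8")
    return zlib.compress(raw)


def from_blob(blob: bytes) -> dict:
    """Reverse the steps: decompress, then deserialize."""
    return json.loads(zlib.decompress(blob))
```

Repetitive data, such as long text fields, shrinks considerably, and `from_blob(to_blob(props))` returns an equal dictionary.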

Flexible serialization is exactly the problem Facebook's Thrift solves!

(Remember using Thrift for serialization when caching objects in memcached?)

Setting the sharding scheme aside for now, in Mosonic we define classes with a structure similar to the following:

  • ID (INT)
  • Properties (BLOB)
    • user_name (varchar)
    • age (INT)
    • ...
  • ...

It can then be used directly: User.FetchByID(XXX).Properties.user_name. Because Thrift serialization code generation was already implemented as a SubSonic template earlier, adding this extra layer of structure to the data is not difficult.
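To make the shape concrete, here is a hypothetical Python analogue of the ID + Properties layout. Mosonic generates this code from Thrift definitions through SubSonic templates. This sketch instead hand-writes the classes as dataclasses, and uses JSON and an in-memory dict in place of Thrift and the database.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical in-memory table: ID column -> Properties blob column.
_USERS_TABLE: dict[int, bytes] = {}


@dataclass
class UserProperties:
    """Fields serialized into the Properties blob (hand-written here, not Thrift-generated)."""
    user_name: str = ""
    age: int = 0


@dataclass
class User:
    id: int
    properties: UserProperties = field(default_factory=UserProperties)

    def save(self) -> None:
        # Only two columns are stored: the ID and the serialized properties.
        _USERS_TABLE[self.id] = json.dumps(asdict(self.properties)).encode("utf-8")

    @classmethod
    def fetch_by_id(cls, user_id: int) -> "User":
        props = json.loads(_USERS_TABLE[user_id])
        return cls(id=user_id, properties=UserProperties(**props))
```

`User(1, UserProperties("alice", 18)).save()` followed by `User.fetch_by_id(1).properties.user_name` yields `"alice"`, mirroring the access path described above.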

When the data structure needs to change later, just edit the Thrift definition file and regenerate the code. The data already stored in the Properties column may not match the latest structure. Thrift, however, does not require an exact match (BinaryFormatter does). Fields that do not match are ignored automatically, and once the object is saved again, its data is re-serialized in the complete, current form.
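The sketch below imitates this behavior without the real Thrift library. Each field carries a numeric ID, as in a Thrift IDL. The decoder fills in defaults for missing IDs and skips unknown ones. JSON, the schema dictionary, and the function names are all assumptions for illustration.

```python
import json

# Current (version 2) schema, written Thrift-style as field ID -> (name, default).
# A .thrift struct would declare roughly: 1: string user_name, 2: i32 age, 3: string email
FIELDS_V2 = {1: ("user_name", ""), 2: ("age", 0), 3: ("email", "")}


def decode(blob: bytes, fields: dict) -> dict:
    """Read a blob keyed by field ID, tolerating older or newer layouts."""
    stored = {int(k): v for k, v in json.loads(blob).items()}
    # Known fields missing from the blob get their defaults;
    # field IDs the current schema does not know are simply ignored.
    return {name: stored.get(fid, default) for fid, (name, default) in fields.items()}


def encode(values: dict, fields: dict) -> bytes:
    """Write using the current schema, so a re-saved blob becomes complete."""
    ids = {name: fid for fid, (name, _) in fields.items()}
    return json.dumps({ids[name]: value for name, value in values.items()}).encode("utf-8")
```

Consider a blob written by an older version with field IDs 1, 2, and a since-removed 9. It decodes to user_name, age, and a default email. Re-encoding it drops field 9 and adds field 3, so the stored data is complete again.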

======================================

FriendFeed's distributed solution requires each table's primary key to be a UUID, while Cache Money requires every table to have an auto-increment ID as its primary key.

This is not actually a conflict: just treat database_name + table_name + ID as the UUID.
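The following sketch shows one way to turn that triple into an actual UUID. It assumes a deterministic name-based (version 5) UUID is acceptable; the namespace and helper names are hypothetical.

```python
import uuid

# Hypothetical, fixed namespace for this application; any constant UUID would do.
MOSONIC_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "mosonic.example")


def composite_key(database_name: str, table_name: str, row_id: int) -> str:
    """database_name + table_name + auto-increment ID, as one string."""
    return f"{database_name}/{table_name}/{row_id}"


def global_uuid(database_name: str, table_name: str, row_id: int) -> uuid.UUID:
    """Deterministic name-based (version 5) UUID for the composite key."""
    return uuid.uuid5(MOSONIC_NAMESPACE, composite_key(database_name, table_name, row_id))
```

The same triple always yields the same UUID. A version 5 UUID is a one-way hash, though, so keep the composite key as well if routing later needs the table name or the original ID. The "/" separator assumes names never contain a slash.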

FriendFeed's distributed index is similar to the vector cache in Cache Money: both are processed and sharded based on the query conditions.

When we added the vector cache to Mosonic earlier, we already had to determine the table name and query conditions of each query; if they matched, the query cache was applied. Now, if they match, the query goes to the distributed index as well!

Suppose we execute select ID from users where age = 18 order by ID desc limit ...

The logic then becomes:

 

  1. Check the vector cache; if the result is there, return it.
  2. Check the distributed index table rules to obtain the connection string of the database to query.
  3. Execute the query.
  4. Write the result to the vector cache.
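The four steps can be sketched as follows. Everything here is a hypothetical stand-in:

- in-memory lists play the index shards;
- a lambda plays the distributed index rule, choosing a shard by age % 2;
- a dict plays the vector cache.

```python
from typing import Callable

# Hypothetical index shards: connection string -> rows of (id, age) in user_age.
INDEX_SHARDS: dict[str, list[tuple[int, int]]] = {
    "index-db-0": [(3, 18), (7, 18), (4, 20)],
    "index-db-1": [(5, 19)],
}

# Hypothetical distributed index rules: (table, column) -> shard chooser.
INDEX_RULES: dict[tuple[str, str], Callable[[int], str]] = {
    ("users", "age"): lambda age: f"index-db-{age % 2}",
}

vector_cache: dict[tuple[str, str, int], list[int]] = {}
stats = {"queries": 0}


def execute(connection_string: str, value: int) -> list[int]:
    """Stand-in for: SELECT id FROM user_age WHERE age = ? ORDER BY id DESC."""
    stats["queries"] += 1
    rows = INDEX_SHARDS[connection_string]
    return sorted((row_id for row_id, age in rows if age == value), reverse=True)


def select_ids(table: str, column: str, value: int) -> list[int]:
    key = (table, column, value)
    # 1. Check the vector cache and return if it exists.
    if key in vector_cache:
        return vector_cache[key]
    # 2. Check the distributed index rules to obtain the connection string.
    rule = INDEX_RULES.get((table, column))
    if rule is None:
        # A real implementation would fall back to querying the main table.
        raise LookupError(f"no distributed index for {table}.{column}")
    connection_string = rule(value)
    # 3. Execute the query on that index shard.
    ids = execute(connection_string, value)
    # 4. Write the result to the vector cache.
    vector_cache[key] = ids
    return ids
```

The first `select_ids("users", "age", 18)` queries index-db-0 and returns `[7, 3]`. A second identical call is answered from the vector cache without running another query.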

 

When inserting data, we used to update only the vector cache; now there is one more step: inserting into the index table.

In practice, the data table is written first and the vector cache is updated synchronously, so subsequent queries will hit the cache. The index table update is essentially a backup, so it can be done asynchronously.
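Here is a minimal sketch of this write path. In-memory stand-ins play the data table, vector cache, and index table. A background thread drains a queue to apply the asynchronous index update. All names are hypothetical.

```python
import queue
import threading

# In-memory stand-ins (hypothetical) for the three stores involved in a write.
users_table: dict[int, dict] = {}                    # main data table
vector_cache: dict[tuple[str, int], list[int]] = {}  # (column, value) -> IDs, newest first
user_age_index: dict[int, set[int]] = {}             # index table: age -> IDs
index_jobs: queue.Queue = queue.Queue()              # pending index-table updates


def insert_user(user_id: int, props: dict) -> None:
    # 1. Write the data table first.
    users_table[user_id] = props
    age = props.get("age")
    if age is None:
        return
    # 2. Update the vector cache synchronously. Only vectors that are already cached
    #    are touched; creating a new one here would leave out older matching IDs.
    key = ("age", age)
    if key in vector_cache:
        vector_cache[key] = sorted(vector_cache[key] + [user_id], reverse=True)
    # 3. The index table is only a backup, so its update is queued for later.
    index_jobs.put((age, user_id))


def index_worker() -> None:
    """Background thread that applies queued index-table updates."""
    while True:
        age, user_id = index_jobs.get()
        user_age_index.setdefault(age, set()).add(user_id)
        index_jobs.task_done()


threading.Thread(target=index_worker, daemon=True).start()
```

Suppose `[3]` is cached for age 18. Then `insert_user(7, {"user_name": "alice", "age": 18})` immediately makes the cached vector `[7, 3]`. The user_age entry appears only once the worker has processed the job.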

Thrift, Cache Money, and schema-less database design are solutions created by three different teams for different technical problems. Combined in Mosonic, however, they feel less like a conflict than a wonderful coincidence.

In the next article, we will continue to discuss more details.

 
