Business System Design Considerations (I): Distributed Data Storage Design

Last Update: 2018-12-08 Source: Internet

Author: User


Recently, the company launched a nationwide business system that covers all business processes. Its users include 32 subsidiaries across the country, so it is fair to say that this is a huge system that puts all the eggs in one basket. I will not dwell on how hard the launch was; instead, I want to share some lessons for business system design that I drew from problems that appeared after the launch.

I. Consider a distributed data storage design

Enterprise production systems often give too little thought to performance at the outset. The design usually relies on a single database to support the business. As the business grows and the demands on database performance increase, the enterprise buys more expensive hardware and software to handle the higher load. When that no longer helps, it turns to partitioning and similar techniques to spread the load across hardware; however, the database itself is still a single logical instance.

Because all data services are concentrated on a few key pieces of hardware, several problems arise. First, high-performance hardware is too expensive. Second, all the eggs are in one basket, so a service failure is disastrous. Third, hardware upgrades cannot keep up with the growth in data volume and traffic, so the whole system is always slow.

One way to solve this problem is to divide the system into several systems by business domain and to integrate services, data, and processes among them. This idea is already widely used, so I will not discuss its advantages and disadvantages here. This article discusses another, bolder and more radical idea: dispersing customer data.

Social websites such as Facebook, Twitter, and Weibo face far more traffic than enterprise applications. Unlike enterprise applications, they must also handle unexpected performance demands such as "What if a large number of users suddenly arrive?" and "What if an ordinary user suddenly becomes popular?"

Their solution is to support distributed storage in the software logic, that is, to allow any number of databases to be created. Data is stored on different database hardware according to a policy, so that when the system accesses a piece of data it can use the same policy to determine which database to query. Because the databases are independent, performance can be scaled up smoothly just by adding low-end hardware.

To distinguish this from table partitioning, which distributes data physically inside a single logical database, I use the word "dispersed" for solutions that split storage across multiple logical databases.

What are the policies for data dispersion? There are three main types:

First, divide by business (product) domain. This is really no different from splitting the system into multiple subsystems. However, the main business usually has the largest data volume and the most traffic, so the main problem may remain unsolved.

Second, divide by business time. This works for businesses that mostly deal with recent data, such as stock trading, but it is impractical for businesses that rely on long-term information. In long-term insurance, for example, the longer a policy has been in force, the more likely claims and payments become, and processing them requires all the information about the business object. If the data is stored by time period, the logic of the software system becomes more complex.

Third, divide by the "community" of the main business object. For example, customers are assigned to groups: all business information related to one customer (such as their blog posts and the related comments) is kept together, and customers are placed in different physical data sources by group. When the system accesses a customer's information, it finds the physical database that holds the customer through the community, and the subsequent logic is much the same as with a centralized database.

How do we determine a customer's community? For systems that do not experience sudden growth in traffic, you can simply group customers evenly and derive the group number with an agreed algorithm, for example from the first few digits of the customer's ID number. However, once fixed, such a grouping is hard to change. For example, if the first digit of the ID card number is used at the outset, customers are split across 10 databases; if the business later outgrows those 10 databases, modifying the grouping is very difficult.
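The fixed rule, and why it is hard to change later, can be sketched roughly as follows. This is only an illustration: the function names, the 20-group replacement rule, and the sample IDs are my own assumptions, not something the system described here actually uses.

```python
def group_by_first_digit(customer_id: str) -> int:
    """Original fixed rule: the first digit of the ID picks one of 10 databases."""
    return int(customer_id[0])


def group_by_first_two_digits(customer_id: str, groups: int = 20) -> int:
    """Hypothetical replacement rule: the first two digits, folded into `groups` databases."""
    return int(customer_id[:2]) % groups


def moved_fraction(customer_ids, old_rule, new_rule) -> float:
    """Fraction of customers whose database number changes when the rule changes."""
    moved = sum(1 for cid in customer_ids if old_rule(cid) != new_rule(cid))
    return moved / len(customer_ids)


# Made-up six-digit IDs, used only to exercise the two rules.
sample_ids = [f"{n:06d}" for n in range(100000, 1000000, 997)]

print(group_by_first_digit("310104"))
print(moved_fraction(sample_ids, group_by_first_digit, group_by_first_two_digits))
```

Every customer counted by `moved_fraction` would have to be physically copied to a different database, which is the cost that makes a purely fixed grouping hard to change.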

Another way is to use a central database to record each user's community. This central database needs only one table with a few fields: user ID, community number, physical data source address, and accessible status. When the system starts to process a user's business, it first queries this database; once it has the group number and data source, it switches to that data source for business processing.
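A minimal sketch of such a central table, using Python's built-in sqlite3 module as a stand-in for the real central database. The table name, column names, and the data source address are assumptions made for illustration.

```python
import sqlite3


def create_directory() -> sqlite3.Connection:
    """Create the central database with its single lookup table (in memory here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE user_community (
               user_id      TEXT PRIMARY KEY,
               community_no INTEGER NOT NULL,
               data_source  TEXT NOT NULL,
               accessible   INTEGER NOT NULL DEFAULT 1
           )"""
    )
    return conn


def register_user(conn, user_id: str, community_no: int, data_source: str) -> None:
    """Record which community and physical data source hold this user."""
    conn.execute(
        "INSERT INTO user_community (user_id, community_no, data_source) "
        "VALUES (?, ?, ?)",
        (user_id, community_no, data_source),
    )


def find_data_source(conn, user_id: str) -> str:
    """Ask the central database where to process this user's business."""
    row = conn.execute(
        "SELECT data_source, accessible FROM user_community WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        raise KeyError(user_id)
    data_source, accessible = row
    if not accessible:
        raise RuntimeError(f"user {user_id} is temporarily inaccessible")
    return data_source


directory = create_directory()
register_user(directory, "u1001", 3, "mysql://blade-03/community_3")
print(find_data_source(directory, "u1001"))
```

The primary key on `user_id` is what keeps this lookup cheap: every business request needs exactly one indexed read before switching to the real data source.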

This approach has great advantages. On a social network such as Weibo, for example, people often become popular overnight. One day a user is an unknown with just over a hundred followers and two posts a day, sitting inconspicuously in MySQL on a blade server alongside 500,000 other users. The next day, by good (or bad) luck, they get caught up in a trending topic, hundreds of thousands of people suddenly follow them, and the machine immediately bogs down. In this case, we use a data migration program to move all of this person's information to a separate minicomputer and serve it from there. Once the migration is complete, we update the central database, and the load is rebalanced immediately.

This kind of migration can even be automated: it starts automatically when a user becomes popular and moves the user's data to idle computing resources according to a policy.
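The migration and its automatic trigger might look roughly like the sketch below. Plain dictionaries stand in for the physical databases and the central database, and the names, the follower threshold, and the target machine are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """One row of the central database: where a user lives and whether it is usable."""
    data_source: str
    accessible: bool = True


def migrate_user(user_id, target, directory, stores):
    """Move all of one user's records to `target`, then repoint the central entry."""
    entry = directory[user_id]
    source = entry.data_source
    if source == target:
        return
    entry.accessible = False  # pause business processing for this user
    try:
        stores[target][user_id] = list(stores[source][user_id])  # copy the data
        entry.data_source = target                               # update central database
        del stores[source][user_id]                              # free the old machine
    finally:
        entry.accessible = True


def rebalance_if_hot(user_id, followers, directory, stores,
                     threshold=100_000, hot_target="minicomputer-1"):
    """Assumed policy: migrate once the follower count crosses a threshold."""
    if followers >= threshold:
        migrate_user(user_id, hot_target, directory, stores)
        return True
    return False


stores = {"blade-7": {"u42": ["post 1", "post 2"]}, "minicomputer-1": {}}
directory = {"u42": DirectoryEntry("blade-7")}
rebalance_if_hot("u42", 300_000, directory, stores)
print(directory["u42"].data_source)
```

The central entry is switched only after the copy succeeds, so if the copy fails the user is still served from the original machine.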

In such an architecture, the component under the most pressure is the central database. But first, this database holds no business data. Its data structure is very simple, so indexing and performance are easy to optimize, and even if more hardware must be bought, the investment required is limited. Second, the user grouping policy can combine fixed rules with specially assigned groupings. The central database then needs to store records only for specially assigned users; if no record exists, the fixed rule applies (the first digit of the ID number, say). Third, the application server can cache the group information of users it has already accessed. It then needs to query the central database for the latest community location only when the user is not cached (or the cached location fails) and the data source inferred from the fixed rule also fails.
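The lookup order described above (cache, then fixed rule, then central database) can be sketched as follows. Sets of user IDs stand in for the physical databases, and the class name, shard names, and counter are assumptions for illustration.

```python
class CommunityLocator:
    """Resolve a user's data source while touching the central database rarely."""

    def __init__(self, shards, central_directory):
        self.shards = shards                        # shard name -> set of user IDs it holds
        self.central_directory = central_directory  # only specially assigned users
        self.cache = {}                             # application-server cache
        self.central_lookups = 0                    # how often the central database was hit

    @staticmethod
    def fixed_rule(user_id: str) -> str:
        """Fixed rule: the first digit of the ID names the database."""
        return f"db{user_id[0]}"

    def _holds(self, shard_name, user_id) -> bool:
        return shard_name is not None and user_id in self.shards.get(shard_name, set())

    def locate(self, user_id: str) -> str:
        # Try the cached location first, then the location the fixed rule predicts.
        for candidate in (self.cache.get(user_id), self.fixed_rule(user_id)):
            if self._holds(candidate, user_id):
                self.cache[user_id] = candidate
                return candidate
        # Both failed: only now ask the central database for the latest location.
        self.central_lookups += 1
        shard_name = self.central_directory[user_id]
        self.cache[user_id] = shard_name
        return shard_name


shards = {"db1": {"1001"}, "db2": {"2001"}, "hot1": {"1002"}}
central_directory = {"1002": "hot1"}  # user 1002 was moved off db1
locator = CommunityLocator(shards, central_directory)
print(locator.locate("1001"), locator.locate("1002"), locator.central_lookups)
```

Users who still sit where the fixed rule puts them never cause a central lookup, and a migrated user causes one lookup per application server until the cache entry goes stale.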

Admittedly, this model makes global data access technically harder. For example, to compute statistics over the entire system, we must coordinate all the databases to run the operation separately and then merge the results, which requires new development and O&M architecture. However, for systems with massive data volumes, keeping the system running properly already takes priority over the absolute accuracy of statistical results.
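A whole-system statistic computed this way could be sketched as below: each database runs the operation on its own data, and the partial results are then merged. The in-memory dictionaries and function names are stand-ins I made up; a real system would run a query against each physical database.

```python
from concurrent.futures import ThreadPoolExecutor


def count_posts(shard: dict) -> int:
    """Per-database step: count the posts stored in one database."""
    return sum(len(posts) for posts in shard.values())


def total_posts(shards: dict) -> int:
    """Run the count on every database in parallel, then summarize the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partial_counts = list(pool.map(count_posts, shards.values()))
    return sum(partial_counts)


# Hypothetical data: database name -> {user ID -> list of posts}.
shards = {
    "db1": {"1001": ["a", "b"], "1003": ["c"]},
    "db2": {"2001": ["d", "e", "f"]},
    "hot1": {"1002": ["g"]},
}
print(total_posts(shards))
```

Because each database is counted at a slightly different moment, the merged total is a snapshot that may not be exact, which is the trade-off the paragraph above accepts.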

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
