Large-Scale, High-Concurrency, High-Load Web Application System Architecture: Database Architecture Strategy


As a web site grows from small to large, the access pressure on its database keeps increasing, and the database architecture needs to be expanded dynamically as well. The expansion process basically consists of the steps below, and each step can deliver roughly an order-of-magnitude performance improvement over the previous deployment method.

1. Web applications and databases are deployed on the same server

Small web sites generally start this way: user volume, data volume, and concurrent access are all small enough that a single server can cope, while upgrading hardware in the face of a performance bottleneck is very expensive. As traffic grows, the application and the database compete for the limited system resources, and performance problems appear quickly.

2. Web applications and databases are deployed on their own separate servers

The Web application and the database are deployed on separate servers, with the Web application server and the database server each doing its own job, and each can be upgraded separately as system traffic grows. This is the typical deployment for small web sites. With performance optimization in the application and a database object-caching policy, it can host a fair amount of traffic, for example 2,000 users, 200 concurrent connections, and millions of records.

3. The database server is deployed as a cluster (for example, Oracle with multiple instances serving one database)

A database cluster can carry a much larger load: the physical storage is a disk array, and multiple DB instances provide database connectivity to the application servers through a virtual IP. This kind of deployment satisfies most ordinary web applications, but it still cannot handle applications with very large user bases, high load, and very frequent database reads and writes.

4. The database adopts master-slave deployment

In user-facing systems such as blogs, microblogging, social networking sites, and CMS platforms, there can be millions of users and very large volumes of data, with many database reads and quite a few writes, and in most cases reads far outnumber writes. At this point, separating database reads from writes is a great improvement for the system, and that is what the master-slave deployment of the database gives us.

Master-slave replication:

Almost all major databases support replication, which is the basic means of simple database scaling. Take MySQL as an example: it supports master-slave replication, and the configuration is not complex. You only need to enable the binary log on the master and perform some simple configuration and authorization on the master and the slaves. MySQL's master-slave replication works through the master's binary log: the master automatically records every update to its databases in the binary log, and each slave replays the operations recorded there, which is how the data is replicated. Master-slave replication can also be used for automatic backups.
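
As a minimal sketch of that setup (the host names, the repl account, the password, and the binlog file name are all placeholder assumptions, and the mysql-connector-python driver and the classic CHANGE MASTER TO / START SLAVE statements are assumed), a slave can be pointed at a master whose binary log is already enabled roughly like this:

```python
# Sketch: attaching a MySQL slave to a master whose binary log is enabled.
# Assumes the master's my.cnf already contains `log-bin=mysql-bin` and a
# unique `server-id`, and that a replication account was granted on the
# master, e.g. (MySQL 5.x syntax):
#   GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'repl_pass';
import mysql.connector  # assumed driver: mysql-connector-python

slave = mysql.connector.connect(host="slave-db.example.com",
                                user="root", password="***")
cur = slave.cursor()

# Tell the slave where the master is and from which binlog position to start.
cur.execute("""
    CHANGE MASTER TO
        MASTER_HOST = 'master-db.example.com',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'repl_pass',
        MASTER_LOG_FILE = 'mysql-bin.000001',
        MASTER_LOG_POS = 4
""")
cur.execute("START SLAVE")  # begin replaying the master's binary log
slave.close()
```

Once the slave is started, it keeps pulling the master's binary log and replaying it, which is exactly the replay mechanism described above.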

Read/write Separation:

To keep the database data consistent, we require that all updates be made against the master database, while reads can be served by the slave databases. Most sites have far more database reads than writes, the query conditions are relatively complex, and most of the database's capacity is consumed by queries.
Master-slave replication completes asynchronously, so the data on the slaves lags behind the master by some amount, and this delay must be considered when designing read/write separation. Take a blog as an example: when a user logs in and publishes an article, he needs to see it immediately, but for other users a delay of 1, 5 or even 30 minutes causes no problem. In that case, the current user's reads go to the master database, while the much larger volume of reads from other users goes to the slaves.
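
A rough sketch of that routing rule follows; the host names, the "recently wrote" bookkeeping in the web tier, and the 60-second window are illustrative assumptions rather than a fixed design.

```python
import random
import time

PRIMARY = {"host": "master-db", "port": 3306}
REPLICAS = [{"host": "slave-db-1", "port": 3306},
            {"host": "slave-db-2", "port": 3306}]

# user_id -> timestamp of that user's last write, kept in the web tier
last_write_at = {}

STALENESS_WINDOW = 60  # seconds a user keeps reading the master after a write

def pick_server(user_id, is_write):
    """Send writes to the master; send reads to a slave, except that a user
    who has just written (e.g. just published a blog post) reads the master,
    so the replication delay is invisible to him."""
    if is_write:
        last_write_at[user_id] = time.time()
        return PRIMARY
    recently_wrote = time.time() - last_write_at.get(user_id, 0) < STALENESS_WINDOW
    return PRIMARY if recently_wrote else random.choice(REPLICAS)
```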

Database Reverse Proxy:

When a master-slave database is used for read/write separation, one master corresponds to multiple slave servers. For write operations there is only the single master, so no choice is needed, but read operations against the slaves need an appropriate algorithm to distribute the requests; in particular, when the slaves have different hardware configurations, even reads should be distributed by weight.
A database reverse proxy can handle these issues. Just like a Web reverse proxy, MySQL Proxy can inspect and modify a SQL statement before forwarding it to the backend MySQL servers.
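
Whether the distribution is done in a proxy layer or in the application itself, the weighted selection amounts to something like the following sketch; the slave names and weights are made up for illustration.

```python
import random

# Hypothetical slave pool: a weaker machine gets a lower weight, so it
# receives proportionally fewer read requests.
SLAVES = [("slave-db-1", 3),   # fast box: 3 of every 6 reads
          ("slave-db-2", 2),
          ("slave-db-3", 1)]   # old box: 1 of every 6 reads

def pick_slave():
    """Pick a slave for a read request, weighted by its configured capacity."""
    hosts = [host for host, _ in SLAVES]
    weights = [weight for _, weight in SLAVES]
    return random.choices(hosts, weights=weights, k=1)[0]
```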

5. Database Vertical Segmentation

In a master-slave deployment, once write operations account for more than 50% of the master's CPU consumption, adding more slaves brings little benefit: every slave must replay all of the same writes and therefore also spends more than 50% of its CPU on them, leaving only limited capacity for serving queries. At that point the database needs to be re-architected using vertical partitioning.
The simplest vertical partitioning method is to split off the parts of the original database that never need JOIN queries against the rest. For example, a site's blog and forum are relatively independent and their data is only weakly correlated, so the original database can be split into a blog library, a forum library, and a library made up of the remaining tables. Each of the three libraries then gets its own master-slave deployment, so the pressure on the whole database is shared.
In addition, query scalability is one of the most important reasons to partition a database. Splitting a large database into smaller ones improves query performance because each partition holds only a small subset of the data. Suppose you need to scan 100 million records: on a single-partition database, the database manager must scan all 100 million records by itself; if the system is split into 50 partitions and the 100 million records are spread evenly across them, the database manager of each partition scans only 2 million records.
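
To make the vertical split concrete, the application's data access layer might route each module to its own master-slave cluster along these lines; the module names and host names are illustrative assumptions.

```python
# After the vertical split, each functional module has its own master-slave
# cluster; the application picks a cluster by module, and cross-module JOINs
# are no longer possible at the SQL level.
VERTICAL_SHARDS = {
    "blog":  {"master": "blog-master-db",  "slaves": ["blog-slave-db-1"]},
    "forum": {"master": "forum-master-db", "slaves": ["forum-slave-db-1"]},
    "other": {"master": "main-master-db",  "slaves": ["main-slave-db-1"]},
}

def db_for(module, is_write=False):
    """Return the host to use for a query belonging to the given module."""
    cluster = VERTICAL_SHARDS[module]
    return cluster["master"] if is_write else cluster["slaves"][0]
```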

6. Database Horizontal Segmentation

After the database has been vertically partitioned, what do we do if the blog library still cannot bear the write load? Vertical partitioning offers no further help for this kind of growth; we need horizontal partitioning.
Horizontal partitioning means splitting the records of a single database table, according to a specific algorithm, into separate tables that can be deployed on different database servers. Many large sites basically run a master-slave replication + vertical partitioning + horizontal partitioning architecture. Horizontal partitioning does not depend on any particular technology; it is purely a logical-level design and requires experience and a careful breakdown of the business.
How do we partition? For a large site there is no way around it, and there is little choice about what to partition: it has to be the hot data whose frequent access would otherwise bring the site down.
When partitioning the data, we must have a partition index field, such as user_id, which must be present in all the relevant records: it is the primary key of the core table in the partitioned database and a foreign key in the other tables. When it is used as the primary key, it cannot be an auto-increment key; it must be a business key.
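
The partitioning strategies described next differ only in how the partition is chosen from the partition index field; the surrounding routing code can stay the same, as in this sketch (the shard host names are assumptions):

```python
# One entry per horizontal partition of the blog library (hypothetical hosts).
BLOG_SHARDS = ["blog-shard-0", "blog-shard-1", "blog-shard-2"]

def shard_for(user_id, choose_shard):
    """`choose_shard(user_id)` returns the index of the partition that owns
    this user's rows; the strategies below (remainder, range, mapping) are
    three different implementations of it."""
    return BLOG_SHARDS[choose_shard(user_id)]
```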

Remainder Partitioning:

We can store each record in a partition database chosen by the value of user_id % 10. The algorithm is simple and efficient, but when the number of partition databases changes, the data of the whole system has to be redistributed.
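
A minimal sketch of remainder partitioning, which can be plugged into the routing sketch above as choose_shard:

```python
def choose_shard_by_remainder(user_id, shard_count=10):
    # user_id % 10 spreads users evenly across 10 partition databases, but
    # changing shard_count forces a redistribution of the existing data.
    return user_id % shard_count
```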

Range Partitioning:

We can partition by ranges of user_id, for example user_id 1-100000 in one partition database and 100001-200000 in another. This algorithm makes it easy to expand when the number of partition databases changes, but it easily leads to uneven pressure between partitions: the partition databases holding old users are under heavy pressure, while those holding new users are under very little.
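
A sketch of range partitioning using the boundaries from the example above (the upper bounds are illustrative):

```python
import bisect

# Upper bound of user_id for each partition, matching the example above.
RANGE_UPPER_BOUNDS = [100000, 200000, 300000]

def choose_shard_by_range(user_id):
    # New ranges can be appended without touching existing data, but the
    # partitions holding the oldest (most active) users stay the hottest.
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
```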

Mapping Relationship Partitioning:

A partition mapping is maintained for every possible value of the partition index field; because this mapping is very large, it has to be stored in a database of its own. For example, when the application needs to know which partition holds the blog content of the user whose user_id is 10, it must query the mapping database for the answer; of course, we can use a cache to improve performance.
Because this method records the partition of every record explicitly, each partition scales very well and can be controlled flexibly: migrating data from one partition to another is simple, and the pressure across partitions can be kept balanced through flexible dynamic adjustment.
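
A sketch of the mapping-table lookup with a simple in-process cache; the user_shard_map table and the mapping_db.query_one helper are hypothetical names used only for illustration.

```python
# In-process cache in front of the mapping table; the authoritative
# user_id -> shard mapping lives in its own (hypothetical) database.
_shard_cache = {}

def choose_shard_by_mapping(user_id, mapping_db):
    if user_id in _shard_cache:
        return _shard_cache[user_id]
    # mapping_db is assumed to expose a simple query helper that returns the
    # shard index recorded for this user when the account was created.
    shard = mapping_db.query_one(
        "SELECT shard_id FROM user_shard_map WHERE user_id = %s", (user_id,))
    _shard_cache[user_id] = shard
    return shard
```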
