As a web site grows from small to large, the load on its database grows with it, and the database architecture must be able to expand dynamically. The expansion process generally follows the steps below, and each step builds on the previous deployment to deliver a significant improvement in performance.
1. Web applications and databases are deployed on the same server
Most small sites start this way: the number of users, the volume of data, and the level of concurrent access are all relatively small, so a single server can handle them. However, upgrading hardware to get past a performance bottleneck is expensive, and as traffic grows the application and the database compete for the same limited system resources, so performance problems appear quickly.
2. Web applications and databases are deployed on separate servers
With the web application and the database deployed separately, the application server and the database server each handle their own workload, and either one can be upgraded independently as system traffic grows. This is the typical setup for small web sites. With application-level performance tuning and a database object caching strategy, such a deployment can handle a substantial amount of traffic, for example around 2,000 users, 200 concurrent requests, and several million rows of data.
3. The database is deployed as a cluster (for example, multiple Oracle database instances)
A database cluster can carry a much larger load. The physical storage is typically a disk array, and the multiple database instances expose a virtual IP through which they provide connection services to the application servers. This kind of deployment satisfies most ordinary web applications, but it still cannot cope with applications that have very large user bases, very high load, and very frequent database reads and writes.
4. The database uses a master-slave deployment
In public-facing systems such as blogs, microblogging, social networking, and CMS platforms, there are millions of users and enormous amounts of data; database queries are plentiful, writes are also frequent, and in most cases reads greatly outnumber writes. If read and write operations can be separated at the database level, the system gains a great deal. This is where the master-slave deployment of the database comes in.
Master-slave replication:
Almost every major database supports replication, which is the basic means of simple database scaling. Taking MySQL as an example: it supports master-slave replication, and configuration is not complicated. You only need to enable the binary log on the master server and perform some simple configuration and authorization on the master and the slaves. MySQL master-slave replication works through the master's binary log files: the master automatically records every update operation in the binary log, and each slave replays that log to reproduce the same operations, which is how the data gets copied. Master-slave replication can also be used for automatic backups.
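As a rough illustration of that configuration step, the sketch below uses Python with the PyMySQL driver (my assumption, as are all host names, account names, and binary log coordinates) to create a replication account on the master and point a slave at the master's binary log. The master's my.cnf still needs log-bin enabled and a unique server-id, as described above.

```python
import pymysql

# --- On the master: create and authorize a replication account (placeholders) ---
master = pymysql.connect(host="master.example.com", user="root", password="root_pw")
with master.cursor() as cur:
    cur.execute("CREATE USER 'repl'@'%' IDENTIFIED BY 'repl_pw'")
    cur.execute("GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%'")
master.commit()

# --- On the slave: point it at the master's binary log and start replicating ---
# The log file name and position would normally be read from SHOW MASTER STATUS.
slave = pymysql.connect(host="slave1.example.com", user="root", password="root_pw")
with slave.cursor() as cur:
    cur.execute(
        "CHANGE MASTER TO MASTER_HOST='master.example.com', "
        "MASTER_USER='repl', MASTER_PASSWORD='repl_pw', "
        "MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4"
    )
    cur.execute("START SLAVE")
```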
Read-write separation:
To keep the data consistent, all updates must be made against the master database, while read operations can be directed to the slave databases. On most sites reads are far more frequent than writes, query conditions are relatively complex, and most of the database's performance is consumed by query operations.
Master-slave replication copies data asynchronously, so the data on the slaves lags behind the master by some amount, and a read-write separation design must take this delay into account. For example, when a user logs in and publishes an article, he needs to see his article immediately, whereas for other users a delay of some period (1 minute / 5 minutes / 30 minutes) causes no problem. Reads concerning the current user should therefore go to the master database, while reads for other users, which make up the bulk of the traffic, can go to the slaves.
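A minimal sketch of that routing rule, with made-up host names and treating "reads of the current user's own data go to the master" as the consistency rule just described:

```python
import random

# Placeholder connection settings; in practice these would be real DSNs.
MASTER = {"host": "master.example.com"}
SLAVES = [{"host": "slave1.example.com"}, {"host": "slave2.example.com"}]

def choose_server(is_write, reading_own_data=False):
    """Route a statement to the master or to one of the slaves.

    All writes go to the master. Reads normally go to a slave, but reads of
    the current user's own, just-written data also go to the master so the
    user is never affected by replication lag.
    """
    if is_write or reading_own_data:
        return MASTER
    return random.choice(SLAVES)

# The author reads back their new article from the master,
# while other visitors read a possibly slightly stale copy from a slave.
print(choose_server(is_write=False, reading_own_data=True))  # -> master
print(choose_server(is_write=False))                         # -> some slave
```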
Database reverse proxy:
When a master-slave database is used with read-write separation, a new problem appears: one master corresponds to multiple slaves. Writes always go to the single master, but read requests have to be distributed across the slaves by a suitable algorithm, and when the slaves have different hardware configurations the reads may even need to be distributed by weight.
A database reverse proxy solves this problem. Just like a web reverse proxy server, MySQL Proxy can inspect and even modify a SQL statement before forwarding it to a MySQL server on the back end.
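A rough illustration of weight-based read distribution (this is just the idea, not MySQL Proxy itself; the host names and weights are invented):

```python
import random

# Hypothetical slaves with different hardware, hence different read weights.
SLAVES = [
    ("slave1.example.com", 5),  # powerful box, takes the most reads
    ("slave2.example.com", 3),
    ("slave3.example.com", 1),  # older machine, takes the fewest reads
]

def pick_slave():
    """Pick a slave for a read, proportionally to its configured weight."""
    hosts, weights = zip(*SLAVES)
    return random.choices(hosts, weights=weights, k=1)[0]

# Ten sample picks: slave1 should appear roughly five times as often as slave3.
print([pick_slave() for _ in range(10)])
```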
5. Database vertical partitioning
With a master-slave deployment, once write operations account for more than 50% of the master's CPU, adding more slaves brings little benefit, because replaying those same writes also consumes more than 50% of the CPU on every slave, leaving each slave very little capacity for queries. At that point the database needs to be redesigned, and we turn to vertical partitioning.
The simplest form of vertical partitioning is to split relatively independent business areas out of the original database (the split-off parts must not need join queries across the boundary). For example, a site's blog and forum are fairly independent and not strongly related to the rest of the data, so the original database can be split into a blog library, a forum library, and a library holding the remaining tables. Each of the three libraries then gets its own master-slave deployment, and the load on the overall database is spread out.
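A toy sketch of routing by business area after such a split (the library names and connection strings are assumptions for illustration):

```python
# Each business area now lives in its own master-slave group.
LIBRARIES = {
    "blog":  "mysql://blog-master.example.com/blog_db",
    "forum": "mysql://forum-master.example.com/forum_db",
    "core":  "mysql://core-master.example.com/core_db",  # remaining tables
}

def library_for(table_name):
    """Map a table to the library that now owns it."""
    if table_name.startswith("blog_"):
        return LIBRARIES["blog"]
    if table_name.startswith("forum_"):
        return LIBRARIES["forum"]
    return LIBRARIES["core"]

print(library_for("blog_posts"))    # -> blog library
print(library_for("forum_topics"))  # -> forum library
print(library_for("users"))         # -> remaining / core library
```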
Query scalability is another major reason for partitioning a database. Splitting one large database into several smaller ones improves query performance because each partition holds only a fraction of the data. Suppose 100 million records have to be scanned: in a single, unpartitioned database, one database manager has to scan all 100 million records by itself; if the system is split into 50 partitions and the 100 million records are distributed evenly over them, the database manager of each partition only has to scan 2 million records.
6. Database horizontal partitioning
After the database has been vertically partitioned, what do we do if the blog library still cannot keep up with its write load? Vertical partitioning cannot be extended any further in that direction, so we need horizontal partitioning.
Horizontal partitioning means using a specific algorithm to split the records of one table across separate tables, which can then be deployed on different database servers. Most large sites end up with an architecture of master-slave replication plus vertical partitioning plus horizontal partitioning. Horizontal partitioning does not depend on any particular technology; it is purely a matter of logical planning, and doing it well takes experience and a careful analysis of the business.
How should we partition? For a large web site there is no real choice: the hotspot data that is accessed so frequently that it brings the site close to collapse must be partitioned.
When partitioning the data we need a partition index field, for example user_id. It must appear in every record involved: it is the primary key of the core table in the partitioned database and a foreign key in the other tables. When a primary key is used this way it cannot be an auto-increment key; it must be a business primary key.
Remainder partitioning:
We can store records in different partitioned databases according to the value of user_id % 10. The algorithm is simple and efficient, but when the number of partitions changes, all the data in the system has to be redistributed.
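A minimal sketch of this scheme, assuming ten hypothetical partitions named shard_0 through shard_9:

```python
NUM_PARTITIONS = 10  # matches the user_id % 10 example above

def remainder_partition(user_id):
    """Return the partition that owns this user's rows."""
    return f"shard_{user_id % NUM_PARTITIONS}"

print(remainder_partition(10))    # -> shard_0
print(remainder_partition(1234))  # -> shard_4
# Drawback: changing NUM_PARTITIONS changes almost every mapping,
# which forces the existing data to be redistributed.
```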
Range partitioning:
We can also partition by ranges of user_id, for example user_id 1-100000 in one partitioned database and 100001-200000 in another. This algorithm makes the system very easy to extend when the number of partitions changes, but it easily produces uneven pressure across partitions: the partitions holding old users are under heavy load while the partitions holding new users are nearly idle.
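A minimal sketch of range-based routing, using the 100,000-user ranges from the example (the partition names are invented):

```python
RANGE_SIZE = 100_000  # each partition holds 100,000 user_ids, as in the example

def range_partition(user_id):
    """Return the partition whose user_id range contains this user."""
    index = (user_id - 1) // RANGE_SIZE  # user_ids start at 1
    return f"shard_{index}"

print(range_partition(1))       # -> shard_0 (user_ids 1-100000)
print(range_partition(100001))  # -> shard_1 (user_ids 100001-200000)
# Adding a new range for new users never touches existing partitions,
# but old, busy user ranges can stay hot while new ranges sit mostly idle.
```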
Mapping-relationship partitioning:
This approach creates a mapping entry for every possible value of the partition index field; the mapping is so large that it has to be stored in a database of its own. For example, when the application needs to know which partition holds the blog posts of the user whose user_id is 10, it has to query the mapping database for the answer; of course a cache can be used to improve performance.
Because this approach records the partition of every single value explicitly, each partition is highly scalable and can be controlled flexibly, and moving data from one partition to another is simple; the partitions can even be rebalanced dynamically to keep the load evenly distributed.
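A toy sketch of a mapping lookup with a cache in front of it (the mapping store, cache, and partition names are all invented for illustration):

```python
# In reality the mapping would live in its own database table
# (e.g. user_id -> partition) with a cache such as memcached in front.
_mapping_db = {10: "shard_a", 11: "shard_b"}  # stand-in for the mapping database
_cache = {}                                   # stand-in for the cache layer

def mapping_partition(user_id):
    """Look up which partition holds this user's data, caching the answer."""
    if user_id in _cache:
        return _cache[user_id]
    partition = _mapping_db[user_id]          # one extra query per cache miss
    _cache[user_id] = partition
    return partition

print(mapping_partition(10))  # -> shard_a (from the mapping database)
print(mapping_partition(10))  # -> shard_a (from the cache)

# Moving a user is just an update to the mapping (plus a cache invalidation),
# which is what makes this scheme so flexible.
def move_user(user_id, new_partition):
    _mapping_db[user_id] = new_partition
    _cache.pop(user_id, None)
```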
Reposted from: http://www.cnblogs.com/lihaozy/archive/2013/08/02/3231776.html