As Internet applications have become widespread, storing and accessing massive amounts of data has become a bottleneck in system design. For a large Internet application, millions or even hundreds of millions of page views (PVs) per day place a considerable load on the database, which causes serious problems for system stability and scalability.
I. Load balancing technology
A load-balanced cluster consists of a group of independent computer systems that are connected through a conventional or dedicated network and linked by routers. The nodes cooperate, share the load, and balance the pressure among themselves. To clients, the entire cluster appears as a single server with very high performance.
1. To load-balance a database, you first need a control end that manages the connections to the databases. It cuts the direct connection between the application and the database: every program accesses a middle layer, and the middle layer accesses the database. This makes it possible to control exactly which database is accessed, and to apply a balancing policy that chooses the database for each connection according to the current load of each database.
2. Synchronizing data across multiple databases. For load balancing, the most important requirement is that the data on all servers is synchronized in real time. A cluster needs this because, without real-time synchronization, the data read from one server could differ from the data read from another, which is not acceptable. The databases must therefore be kept synchronized, so that a query can be served by any of several copies and the load can be balanced. A commonly used approach is the Moebius for SQL Server cluster.
Moebius for SQL Server keeps a core program resident in the database on each machine. This core program, the Moebius for SQL Server middleware, monitors data changes in its database and synchronizes the changed data to the other databases. The client receives its response only after synchronization has completed. Synchronization to the other databases runs concurrently, so synchronizing to multiple databases takes about as long as synchronizing to one. It also runs inside a transaction, which keeps the multiple copies of the data consistent at all times.
Because the Moebius middleware is hosted inside the database, it sees not only the data changes but also the SQL statements that caused them. It chooses a synchronization policy based on the type of SQL statement in order to minimize the cost of synchronization.
If there are few rows and the data is small, the data is synchronized directly. If there are few rows but they contain large data types, such as text or binary data, the data is compressed before it is synchronized, which reduces network bandwidth usage and transfer time. If there are many rows, the middleware takes the SQL statement that caused the change, parses it, analyzes its execution plan and execution cost, and decides whether to synchronize the data or the SQL statement to the other databases. This last case is useful when you alter a table structure or modify data in batches.
3. Advantages and disadvantages
Advantages:
(1) Strong scalability: when the system needs more database processing capacity, it can be expanded simply by adding database servers.
(2) Maintainability: when a node fails, the system detects the fault automatically and moves the failed node's workload elsewhere, so the database keeps running.
(3) Security: because data is synchronized to multiple servers, the data set is redundant, and the multiple copies protect the data. In addition, the database sits on the intranet, which protects it better.
(4) Ease of use: the cluster is completely transparent to the application and exposes a single IP address.
Disadvantages:
(1) The load cannot be allocated according to the processing capacity of each Web server.
(2) If the load balancer (the control end) fails, the entire database system is paralyzed.
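The middle layer described in item 1 of this section can be illustrated with a minimal sketch. It assumes a "least connections" balancing policy and tracks load as the number of in-flight requests per node. The class name `DatabaseRouter` and the node names are hypothetical, and no real database connection is opened.

```python
from contextlib import contextmanager


class DatabaseRouter:
    """Hypothetical middle layer that picks a database node for each request."""

    def __init__(self, nodes):
        # In-flight request count per node, used as a simple load signal.
        self.active = {name: 0 for name in nodes}

    def pick(self):
        # Least-connections policy (an assumption): choose the node with the
        # fewest in-flight requests; ties go to the node listed first.
        return min(self.active, key=self.active.get)

    @contextmanager
    def connection(self):
        node = self.pick()
        self.active[node] += 1
        try:
            # A real middle layer would hand out a database connection here.
            yield node
        finally:
            self.active[node] -= 1


router = DatabaseRouter(["db1", "db2", "db3"])
with router.connection() as first:
    with router.connection() as second:
        # Two concurrent requests are sent to different nodes.
        print(first, second)
```

Because every program goes through `connection()`, the policy inside `pick()` can be replaced, for example with a weighted one, without touching application code.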
II. Database read/write splitting
1. Implementation principle: read/write splitting separates the read and write operations on a database onto different database servers, which effectively reduces the pressure on each database and also reduces its I/O load. The primary database handles write operations, and the slave databases handle read operations. In many systems, most operations are reads. When the primary database performs a write, the data must be synchronized to the slave databases to keep the data consistent.
(eBay's read/write splitting; eBay's read/write ratio is 260:1)
(Microsoft database distribution)
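The principle above can be sketched as a small router that sends reads to the slave databases and everything else to the primary. This is only an illustration: the class name `ReadWriteSplitter`, the server names, the round-robin choice of read server, and the rule "a statement is a read if it starts with SELECT" are all assumptions, and the rule is a simplification of what a real system would need.

```python
import itertools


class ReadWriteSplitter:
    """Hypothetical router: writes go to the primary, reads to the slaves."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Rotate through the read servers in round-robin order (an assumption).
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        words = sql.split()
        # Simplified rule: only statements that start with SELECT are reads.
        if words and words[0].upper() == "SELECT":
            return next(self._replicas)
        return self.primary


splitter = ReadWriteSplitter("primary", ["read1", "read2"])
for statement in ("SELECT * FROM orders",
                  "UPDATE orders SET paid = 1",
                  "select * FROM users"):
    print(splitter.route(statement), "<-", statement)
```

Keeping this decision in one place, rather than spread through application code, limits the confusion that comes from connecting to several databases at once.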
2. Implementation method: in MS SQL Server, read/write splitting can be implemented with database replication, which is set up by defining publications. Replication is a technology that copies a set of data from one data source to multiple data sources, and it is an effective way to publish a set of data to multiple storage sites. With replication, you can publish one copy of the data to multiple servers. Replication keeps data distributed across different locations updated automatically, which ensures data consistency. SQL Server offers three types of replication: snapshot replication, transactional replication, and merge replication. SQL Server handles replication mainly through publications and subscriptions. The server that holds the source data is the Publisher, which is responsible for publishing data. The Publisher copies all changes to the published data to the Distributor. The Distributor contains a distribution database that receives and stores all changes to the data and then distributes them to the Subscribers.
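The flow from Publisher to Distributor to Subscriber can be modeled with a toy in-memory sketch. It is not SQL Server's replication API: the classes and their methods are hypothetical and only mirror the roles described above, with a plain list standing in for the distribution database.

```python
class Subscriber:
    """Holds a read-only copy of the published data."""

    def __init__(self):
        self.data = {}

    def apply(self, change):
        key, value = change
        self.data[key] = value


class Distributor:
    """Stores changes from the publisher and forwards them to subscribers."""

    def __init__(self, subscribers):
        self.subscribers = subscribers
        self.pending = []  # stands in for the distribution database

    def receive(self, change):
        self.pending.append(change)

    def distribute(self):
        for change in self.pending:
            for subscriber in self.subscribers:
                subscriber.apply(change)
        self.pending.clear()


class Publisher:
    """Holds the source data and reports every change to the distributor."""

    def __init__(self, distributor):
        self.data = {}
        self.distributor = distributor

    def write(self, key, value):
        self.data[key] = value
        self.distributor.receive((key, value))


subscriber = Subscriber()
distributor = Distributor([subscriber])
publisher = Publisher(distributor)

publisher.write("order:1", "paid")
print(subscriber.data)  # still empty: the change is waiting at the distributor
distributor.distribute()
print(subscriber.data == publisher.data)
```

The gap between `write()` and `distribute()` is the window in which a read server can return stale data.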
3. Advantages and disadvantages
(1) Poor real-time behavior: data is not synchronized to the read-only servers in real time. After data is written to the primary server, it can be queried on the read-only servers only after the next synchronization.
(2) Poor synchronization efficiency with large data volumes: when a single table holds too much data, inserts and updates run into index, disk I/O, and similar problems, and performance suffers.
(3) Multiple database connections: the application must connect to at least two databases at the same time. The choice between the read and write servers is made in program code, which can easily cause confusion.
(4) High performance, high reliability, and scalability: because read-only servers perform no write operations, disk I/O and similar performance problems are greatly reduced and efficiency improves. The read-only servers can be load-balanced, and publishing the primary database to multiple read-only servers makes read operations scalable.
III. Database/table splitting (distributed)
Data that would be stored in a single database is distributed across multiple databases according to a specific condition, and routing rules direct each request to the right database. Requests then go to N servers instead of one, which reduces the load on any single machine. Tip: SQL Server 2005 and later versions support table partitioning.
Vertical split: the data is split by functional module, for example into an order database, a commodity database, and a user database. The table structures in the resulting databases differ from one another.
Horizontal split: the rows of one table are divided into parts that are stored in different databases. The table structures in these databases are identical.
(Vertical Split)
(Horizontal split)
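The two kinds of split can be sketched as routing rules. The example below is a minimal illustration under stated assumptions: the database names, the choice of the user ID as the splitting condition, and the modulo rule are all hypothetical, and a real system would pick its own condition.

```python
# Vertical split: each functional module has its own database (names assumed).
MODULE_TO_DATABASE = {
    "order": "order_db",
    "commodity": "commodity_db",
    "user": "user_db",
}


def vertical_route(module):
    """Return the database that holds the tables of the given module."""
    return MODULE_TO_DATABASE[module]


def horizontal_route(user_id, shard_count=4):
    """Return the database that holds the rows for this user.

    Assumed rule: the user ID modulo the number of databases. Every database
    has the same table structure and stores only its own share of the rows.
    """
    return f"user_db_{user_id % shard_count}"


print(vertical_route("order"))  # order_db
print(horizontal_route(10))     # user_db_2
print(horizontal_route(11))     # user_db_3
```

With a modulo rule, changing `shard_count` changes where most existing rows belong, so the number of databases is best chosen with future growth in mind.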