Data distribution is commonly used in high-performance computing (HPC). There are two main data distribution topologies: replication and partitioning.
In a data replication environment, a data item usually has several copies, and data consistency must be maintained to a degree that makes the system appear to end users as though there were only one global copy of the data. The biggest challenge in using data replication is striking the right balance between data consistency and performance based on business needs.
To achieve data consistency, some form of concurrency control is usually required. This article explains the concurrency control used for replication in Oracle 10g Advanced Replication, Oracle 10g Real Application Clusters (RAC), the Oracle TimesTen in-memory database (IMDB), and the GigaSpaces in-memory data grid (IMDG) 7.1.
Throughout the discussion we use a distributed airline ticket booking system, referred to as DATS. To ensure high availability and load balancing, DATS has two databases, one in New York and one in Los Angeles. Depending on the replication scheme, data may be updated in only one place and then copied to the other, or updated in both places and copied to each other.
In addition, we assume that the following actions occur in chronological order:
- Step 1: The two local database copies are synchronized, and only one ticket remains. That last ticket can be booked in either New York or Los Angeles;
- Step 2: A New York customer buys the ticket. This updates the local database in New York, and the change is propagated to the Los Angeles database in whatever way the replication scheme dictates;
- Step 3: Depending on the replication scheme, the Los Angeles database may still show the ticket as available, or it may show it as already booked by the New York customer. If the Los Angeles database still shows the ticket on sale and sells it to a Los Angeles customer, the flight is oversold.
Because only asynchronous replication is realistic for DATS over a WAN, the synchronous replication schemes require the following modification (hereafter the "DATS variant"): we assume a second database in New York, located in the same data center as the first; this second database takes the place of the Los Angeles database.
We also assume that DATS uses optimistic concurrency control. Here is how optimistic concurrency control plays out in DATS:
To achieve good performance, most multi-tier applications use optimistic concurrency control, which by itself can lead to lost updates. For example, with plain optimistic concurrency control in the two DATS databases, the application tier in step 3 may have read the Los Angeles database before step 2 occurred, yet it sells the ticket to a Los Angeles customer after step 2.
To solve this problem, the application must use optimistic concurrency control with a version check. The version check can simply be a version number that is incremented whenever the corresponding data changes.
Assume the version at step 1 is 0, and step 2 updates it to 1. The application tier in step 3 also read version 0, but when it tries to sell the same ticket, the update fails because it finds that the version in the database has changed from the cached 0 to 1.
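To make the version check concrete, here is a minimal JDBC sketch. The tickets table, its columns, and the sellTicket method are hypothetical illustrations of the pattern, not the actual DATS schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticTicketDao {

    /**
     * Tries to sell a ticket using version-checked optimistic concurrency control.
     * The UPDATE succeeds only if the row still carries the version we read earlier;
     * if another session (or site) has changed the row in the meantime, zero rows
     * are updated and the caller must re-read and retry, or report "sold out".
     */
    public boolean sellTicket(Connection conn, long ticketId, int expectedVersion)
            throws SQLException {
        String sql = "UPDATE tickets "
                   + "SET status = 'SOLD', version = version + 1 "
                   + "WHERE ticket_id = ? AND status = 'AVAILABLE' AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, ticketId);
            ps.setInt(2, expectedVersion);
            return ps.executeUpdate() == 1;   // false: the version moved on, sale rejected
        }
    }
}
```

In the scenario above, the step 3 application would call sellTicket with expectedVersion 0, and the statement would update zero rows because the row is already at version 1.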
1. Synchronous Replication Using Distributed Locks and Local Transactions
Oracle RAC, known as Oracle Parallel Server (OPS) in 8i and earlier versions, allows multiple instance sites to access the same physical database. To allow users to read and write any data at any time and on any instance site, Oracle RAC uses "cache fusion" to ensure data consistency.
Cache Fusion essentially performs synchronous replication with a distributed lock manager (DLM). One function of the DLM is to coordinate distributed locks (DL), so that a given resource (such as a table row) can be modified by only one instance site at a time while the other sites wait.
The DLM also acts as a global resource directory. For example, when instance site 1 updates a row, it does not need to push the new version of the data to the other instance sites; it only needs to invalidate their copies. When instance site 2 later requests the same row, the DLM directs it to fetch the latest version from instance site 1.
In addition, because there is a DLM and still only one physical database, instance site 1 needs only local transactions, not distributed transactions.
The advantage of this approach is a high degree of data consistency together with good load balancing of both reads and writes (in general, synchronous replication does not balance writes, because the same writes are replicated to every site; but with RAC's lightweight invalidation mechanism, writes are also relatively well balanced).
The disadvantage is that write performance does not scale (even though the invalidation mechanism is lightweight, too many invalidations will still saturate the shared interconnect, so RAC writes still cannot scale), and that, because of synchronous replication and distributed locks, this approach demands a highly available, high-speed interconnect (a distributed lock implementation typically involves many daemon processes and data structures at each site; over slow LANs and WANs the coordination of distributed locks performs poorly or does not work at all. In Oracle Cache Fusion, distributed locks are implemented in the cluster by the Global Cache Service (GCS), the Global Enqueue Service (GES), and the Global Resource Directory (GRD)).
Note that this combination is unique to RAC. If you have more than one transactional data source, using only local transactions can lead to data inconsistency; in that case synchronous replication needs both distributed locks and distributed transactions. Even though distributed transactions guarantee atomicity, they are very expensive, and I have never seen a product that uses that combination.
Because synchronous replication is unrealistic, or even impossible, over a WAN, this solution applies only to the "DATS variant". Step 3 waits for step 2 to release the distributed lock; once step 3 acquires it, it finds that the ticket was already sold in step 2, so overselling is prevented.
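In the DATS variant, both bookings hit the same physical RAC database, so from the application's point of view an ordinary SELECT ... FOR UPDATE is enough; Cache Fusion's distributed locking is what makes the row lock effective across instance sites. Below is a minimal JDBC sketch against the same hypothetical tickets table.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PessimisticBooking {

    /**
     * Books a ticket under a pessimistic row lock. The second booking attempt
     * blocks on the FOR UPDATE until the first transaction commits, then sees
     * status = 'SOLD' and gives up, so the flight is never oversold.
     */
    public boolean book(Connection conn, long ticketId) throws SQLException {
        conn.setAutoCommit(false);
        try {
            try (PreparedStatement lock = conn.prepareStatement(
                    "SELECT status FROM tickets WHERE ticket_id = ? FOR UPDATE")) {
                lock.setLong(1, ticketId);
                try (ResultSet rs = lock.executeQuery()) {
                    if (!rs.next() || !"AVAILABLE".equals(rs.getString("status"))) {
                        conn.rollback();
                        return false;               // already sold by the other customer
                    }
                }
            }
            try (PreparedStatement sell = conn.prepareStatement(
                    "UPDATE tickets SET status = 'SOLD' WHERE ticket_id = ?")) {
                sell.setLong(1, ticketId);
                sell.executeUpdate();
            }
            conn.commit();
            return true;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```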
2. Synchronous Replication Using Local Locks and Distributed Transactions
Oracle multi-master replication, also called peer-to-peer or n-way replication, offers two data consistency protocols: one is synchronous replication; the other is described in section 4.
Synchronous replication applies a DML change or a replicated procedure at all sites in the replication environment within a single distributed transaction (each site has its own physical database, unlike Oracle RAC, which has only one). If the DML statement or procedure fails at any site, the entire transaction is rolled back.
The distributed transaction guarantees real-time data consistency at every site. However, this scheme uses no distributed locks; it relies only on the local locks taken by each participant's local transaction.
Specifically, when an application updates a replicated table synchronously, Oracle first locks the local row and then uses an AFTER ROW trigger to lock the corresponding remote rows. The locks are released only after all sites have committed the transaction. As you can imagine, deadlocks can occur if multiple sites try to modify the same resource at the same time. As before, a given resource can be modified by only one site at a time while the other sites wait.
The advantages of this solution are that it avoids distributed locks, provides a high degree of data consistency, and is simple and easy to manage.
The disadvantages are that the temporary local and remote locks can cause deadlocks, that write performance is poor, and that a highly available, high-speed network is required, because synchronous replication with distributed transactions relies on two-phase commit (2PC).
Deadlocks can be a serious problem under high concurrency. When a deadlock occurs, Oracle rolls back one of the transactions and lets the other proceed; the rolled-back transaction returns an error code to the front-end application.
For the reason mentioned in section 1, this solution applies only to the "DATS variant". Step 3 waits for step 2 to release the local and remote locks; once step 3 acquires them, it finds that the ticket was already sold in step 2.
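To show what the distributed transaction in this scheme amounts to, here is a schematic two-phase commit over two XA branches (one per site), written against the standard javax.transaction.xa interfaces. This is only a sketch of the protocol: in Oracle multi-master replication the coordination happens inside the database, not in application code, and the branch setup (the DML between xa start/end, plus Xid construction) is omitted.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

public class TwoPhaseCommitSketch {

    /**
     * Schematic 2PC: prepare both branches, commit only if both can commit.
     * A real coordinator would also inspect the prepare votes (XA_OK vs.
     * XA_RDONLY), log its decision, and handle recovery after crashes.
     */
    public void commitAtBothSites(XAResource siteA, XAResource siteB,
                                  Xid xidA, Xid xidB) throws XAException {
        // Phase 1: ask each participant whether it can commit.
        try {
            siteA.prepare(xidA);
            siteB.prepare(xidB);
        } catch (XAException voteNo) {
            // Any "no" vote or failure rolls back every branch.
            siteA.rollback(xidA);
            siteB.rollback(xidB);
            throw voteNo;
        }
        // Phase 2: both voted yes, so commit everywhere
        // (the second argument false means "not the one-phase optimization").
        siteA.commit(xidA, false);
        siteB.commit(xidB, false);
    }
}
```

The expense the article refers to is visible here: every update pays at least two network round trips per participant and blocks while any participant is slow or unreachable.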
3. Synchronous Replication Using Local Locks and Local Transactions
TimesTen's unidirectional active standby pair configuration uses the so-called "return twosafe replication", which provides full synchronization between the master site (the active site) and the subscriber site (the standby site).
TimesTen uses neither distributed transactions nor distributed locks; only local transactions and local locks are involved. Specifically, the subscriber site's local transaction is committed before the master site's transaction commits; if the subscriber site cannot commit, the master site does not commit either.
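The commit ordering can be sketched as follows. The StoreTxn interface is invented purely for illustration; it is not the TimesTen API, which performs this ordering internally.

```java
public class ReturnTwosafeSketch {

    /** Minimal stand-in for a local transaction on one data store. */
    interface StoreTxn {
        void commit() throws Exception;
        void rollback();
    }

    /**
     * Schematic "return twosafe" ordering: the standby (subscriber) commits
     * first, and the active (master) commits only if the standby succeeded.
     */
    public void commitTwosafe(StoreTxn standby, StoreTxn active) throws Exception {
        try {
            standby.commit();      // the standby must be durable first
        } catch (Exception e) {
            active.rollback();     // standby failed, so the active side gives up
            throw e;
        }
        active.commit();           // if this last step fails, the two stores diverge
    }
}
```

The comment on the last commit is exactly the consistency gap discussed below.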
Only the active site can be updated at any given time, which greatly simplifies data updates (otherwise local locks and local transactions alone would not be enough) and ensures that operations can fail over quickly to the standby site if the active site goes down.
The advantages and disadvantages of this solution are similar to those of the solution in section 2.
However, it has better performance because it avoids the two-phase commit required by distributed transactions. Because only the active site can be updated, this solution also eliminates the deadlock issue.
Although the standby site may look like a waste of capacity, you can co-locate it with another active site, as shown in Figure 1 (provided that the active and standby sites placed together hold different data).
The data consistency of this solution is not the strongest: if the master site fails to commit after the subscriber site has committed successfully, the two sites become inconsistent (the root cause is that this solution does not use distributed transactions; note, however, that even with two-phase commit, a failure in the second commit-or-rollback phase also causes temporary inconsistency).
TimesTen's approach here is consistent with its other performance-oriented design choices, such as asynchronous logging and write-behind data caching. GigaSpaces IMDB uses a very similar topology called master-backup replication; the only difference is that GigaSpaces IMDB uses distributed transactions rather than just local transactions, so it offers higher data consistency than TimesTen.
Another benefit of GigaSpaces IMDB is that its failover is transparent to end users, whereas TimesTen users still have to rely on third-party or custom cluster management software.
For the reason mentioned in section 1, this solution applies only to the "DATS variant". Of the two sites in New York, one is the active site and the other is the standby; both customers' updates go to the active site, whose local lock prevents overselling.
Compared with the preceding two synchronous schemes, we strongly recommend this scheme, combined with the data partitioning shown in Figure 1, for the following reasons:
- It greatly simplifies data updates and still provides high availability;
- Although the first two synchronous schemes allow data to be updated anywhere, updates to the same resource require locks or distributed transactions across the network; scalable updates are usually achieved through data partitioning instead;
- Although the first two synchronous schemes allow distributed and scalable reads, you can still fine-tune the partitions to support even more concurrent reads.
Figure 1: Master-backup partitioning in GigaSpaces IMDB
4. Asynchronous Replication with Updates Anywhere
The other data consistency protocol for Oracle multi-master replication is asynchronous replication, which allows users to update data at any site. This solution is also used by Oracle updatable materialized view replication and by TimesTen bidirectional master-subscriber replication to handle general distributed workloads.
With this solution, data changes at one site are committed locally and placed in a queue for propagation to the other sites. Queued changes are propagated in batches within separate transactions, so neither distributed locks nor distributed transactions are needed; only the necessary local locks are taken in the corresponding local transactions.
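A rough sketch of this deferred, queue-based propagation is shown below. The Change record, the batch size, and the RemoteSite interface are all invented for illustration; Oracle and TimesTen implement the equivalent machinery internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DeferredPropagation {

    /** A locally committed change waiting to be shipped to the peer site. */
    record Change(String table, long rowId, String newValue, long commitTimeMillis) {}

    private final BlockingQueue<Change> outbound = new LinkedBlockingQueue<>();

    /** Called after a local transaction commits: remember the change and return at once. */
    public void enqueue(Change change) {
        outbound.add(change);
    }

    /**
     * Runs periodically: drains the queue and applies the batch at the remote
     * site inside its own, independent local transaction. No distributed lock
     * or distributed transaction is involved, which is why conflicting updates
     * made at two sites can both "succeed" and must be reconciled later.
     */
    public void pushBatch(RemoteSite remote) {
        List<Change> batch = new ArrayList<>();
        outbound.drainTo(batch, 100);            // ship at most 100 changes per push
        if (!batch.isEmpty()) {
            remote.applyInLocalTransaction(batch);
        }
    }

    /** Stand-in for the peer site; invented for illustration only. */
    interface RemoteSite {
        void applyInLocalTransaction(List<Change> batch);
    }
}
```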
This solution offers good read and write performance, is easy to implement, works over slow LANs and WANs, and supports updates even when the network is disconnected. In particular, a WAN deployment enables true disaster recovery across geographically dispersed data centers.
The disadvantages are that data consistency is limited, depending on how frequently data is refreshed, and that data change conflicts can occur.
Because neither distributed locks nor distributed transactions are involved, a replication conflict occurs if two transactions initiated at different sites update the same row at about the same time. (When the queued changes arrive at the other site, that site holds two versions of the same change, and the application must decide which one to keep.)
Conflict resolution must be provided to handle such inconsistencies. Oracle and TimesTen both ship with a "latest timestamp" scheme, in which the modification with the most recent timestamp wins. Oracle also lets you implement custom resolution schemes based on your business needs.
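As an illustration, a "latest timestamp wins" resolver has roughly the following shape; the ConflictingChange record is invented, and real products apply this logic inside the replication engine rather than in application code.

```java
import java.time.Instant;

public class LatestTimestampResolver {

    /** Two versions of the same row that arrived from different sites. */
    record ConflictingChange(String originSite, String newValue, Instant modifiedAt) {}

    /**
     * "Latest timestamp" resolution: the change with the more recent modification
     * timestamp wins; the losing change is discarded (and would typically be
     * logged so the business can compensate, e.g. refund an oversold ticket).
     */
    public ConflictingChange resolve(ConflictingChange local, ConflictingChange incoming) {
        return incoming.modifiedAt().isAfter(local.modifiedAt()) ? incoming : local;
    }
}
```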
If DATS does not allow overselling, this scheme is not applicable, because the New York and Los Angeles sites can commit their changes independently in two different transactions, allowing two customers to purchase the same ticket.
If occasional overselling is acceptable, the New York and Los Angeles sites can sell tickets at different times thanks to the three-hour time difference between them. When a replication conflict does occur, the relevant information should be recorded in the database so that the front-end application can take appropriate action (real reservation systems do not actually work this way).
5. Asynchronous Replication with Updates Only at the Master Site
Oracle read-only materialized view replication, TimesTen unidirectional master-subscriber replication, and GigaSpaces IMDB master-local replication all use this solution.
In a sense, this is how optimistic locking is commonly used across multiple database sessions: you first query in one session, and the data returned is really just a copy of the master data held in the database; later, when you want to save your changes, you write them back to the backend database.
Neither distributed locks nor distributed transactions are needed, because data is changed only at the master site. The advantages and disadvantages are similar to those described in section 4, except that because updates are allowed only at the master site, the notorious replication conflicts are eliminated. In an asynchronous replication environment, this design is sound enough for most cases.
If we apply this scheme to the original DATS design and let either the New York site or a third site act as the master site, then if Los Angeles acquires the master site's local lock first, New York has to wait. The master site's local lock prevents overselling.
As discussed at the end of section 3, we recommend combining this solution with data partitioning. DATS can be enhanced through partitioning, for example by making New York responsible for East Coast flights and Los Angeles responsible for West Coast flights, as sketched below.
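For illustration only, such a routing rule might look like the sketch below; the airport codes, site names, and partitioning rule are hypothetical.

```java
public class FlightRouter {

    /** The two master partitions, one per coast; the names are illustrative only. */
    enum Site { NEW_YORK, LOS_ANGELES }

    /**
     * Routes a booking to the single master site that owns the flight, so all
     * updates to one flight are serialized by that site's local locks while
     * different flights scale out across the partitions.
     */
    public Site masterFor(String departureAirport) {
        // Hypothetical rule: East Coast airports belong to New York,
        // everything else to Los Angeles.
        switch (departureAirport) {
            case "JFK":
            case "EWR":
            case "BOS":
            case "ATL":
                return Site.NEW_YORK;
            default:
                return Site.LOS_ANGELES;
        }
    }
}
```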
Using the GigaSpaces IMDB master-local topology shown in Figure 2 makes things easier, because this topology automatically propagates local cache updates to the master site, and the master site then propagates the same updates to the other local caches. GigaSpaces IMDB also supports version-based optimistic locking.
With Oracle read-only materialized view replication or TimesTen unidirectional master-subscriber replication, you have to handle these issues yourself.
Figure 2: GigaSpaces master-local topology; the configuration shown in Figure 1 can serve as the master site
6. Conclusion
Data replication can be synchronous or asynchronous. Synchronous replication provides high data consistency but requires a highly available, high-speed network. It is usually used to protect mission-critical data, such as data in the financial industry.
Asynchronous replication offers better write scalability at the cost of some data consistency. It is usually used to balance write load and to provide disaster recovery.
Each replication category offers several schemes with different concurrency control. Although the right choice depends on specific business needs, we recommend the solutions discussed in sections 3 and 5.
Finally, readers should note two points. First, some interesting replication schemes are not covered here, for example TimesTen's "return receipt replication", MySQL 5.5's "semi-synchronous replication", and the combination of GigaSpaces' write-behind feature with asynchronous replication.
Second, note the current NoSQL trend. Most NoSQL products emphasize scalability and assume that failures are inevitable, so they rely on data replication for load balancing and for highly available reads and writes. Only three typical NoSQL implementations are mentioned here.
CouchDB is built on the Erlang/OTP platform. With bidirectional asynchronous incremental replication, CouchDB allows distributed or even disconnected document updates.
Cassandra supports cross-data-center replication and provides tunable degrees of data consistency.
Finally, GigaSpaces runs as an IMDB and relies on replication for high availability and write-behind persistence, reducing the importance of the traditional relational database. In addition to its original key-value (map) API, the latest version, 8.0, also supports a new document interface.
Source: http://www.infoq.com/cn/minibooks/architect-july-10-2011