Brief introduction
In a modern enterprise environment, it is no surprise that multiple databases and multiple brands of databases are used to store corporate data. Eventually, the data will be compared and merged out of different databases.
If you have a heterogeneous database environment, and you plan to collect data from different databases into a separate application, you should be able to perform the task using traditional techniques. When using Java, you will handle all database operations through JDBC. Listing 1 shows a snippet of how to connect DB2 udb and IDs in a Java application.
Listing 1. Use JDBC to establish a connection to a different database
1 try { // load JDBC drivers
2 Class.forName (JDBC_DRIVER_DB2);
3 Class.forName (JDBC_DRIVER_IDS);
4 }
5 catch (Exception e) {
6 // error handling
7 }
8
9 try { // establish connection and proceed with operation
10 con_db2 = DriverManager.getConnection (DBURL_DB2);
11 con_ids = Drivermanager.getConnection (DBURL_IDS);
12
13 Statement stmt_db2 = con_db2.createStatement ();
14 Statement stmt_ids = con_ids.createStatement ();
15
16 ResultSet rs_db2 = stmt_db2.executeQuery (SQL);
17 ResultSet rs_ids = stmt_ids.executeQuery (SQL);
18
19 // do something very important with the result sets...
20 }
21 catch (SQLException e) {
22 // error handling
23 }
Introduction to two-phase commit protocol
The demo in Listing 1 allows you to modify the data in different databases. Instead of executing the query, it can use the JDBC Method executeupdate () to perform data modification.
But what do you do if you need to encapsulate inserts into a new row of DB2 and IDs tables in a single transaction?
This means that if one of the INSERT statements fails, the database should be (here: two databases!). Is restored to the state where the client did not perform any action. This behavior can be accomplished by using the two-phase commit (TWO-PHASE-COMMIT) protocol. This standardized protocol describes how to implement the technology of Distributed Transaction (XA) or distributed units of work (distributed unit of WORK,DUOW) to achieve a consistent state across the database system (based on ACID).
In a general transaction (one-phase commit), the termination of a transaction performed by a Commit or rollback is a decisive operation, in contrast to a two-phase commit (Two-phase-commit) transaction that is divided into two steps (stages).
First, the initiation of a two-phase commit (Two-phase-commit) transaction is similar to a regular single phase commit (One-phase-commit) transaction. The application/client then performs its modification work on all databases involved in the two-phase commit (TWO-PHASE-COMMIT) operation. Now, before the transaction is finally committed, the client notifies the participating database to prepare for submission (phase 1th). If the client receives a "Okay" from the database, it sends a command to submit the transaction to the database (phase 2nd). The last distributed transaction (distributed Transaction) ends.
Phase 1th of the two-phase submission (TWO-PHASE-COMMIT) is important. By first asking the database if it is ready to commit, once a participating database reports an error, there is an opportunity to abort the entire transaction immediately. Thus, the 2nd phase will be completed by rollback rather than by COMMIT.
Figure 1 provides a graphical impression of how the two-phase commit (TWO-PHASE-COMMIT) protocol works. As demonstrated, a distributed transaction (distributed Transaction) uses a descriptor represented by a tuple (for example, [X,B1]). It means that a distributed transaction (distributed Transaction) contains two elements. First, there is a unique global transaction ID (the global transaction ID)-A simple identifier that represents a distributed transaction (distributed transaction)-represented by X, and the second is a branch ID (branch ID), which describes a portion of the entire transaction. Typically, a branch refers to a database connection. If you have a distributed transaction (distributed Transaction) that will handle two participating databases, you can represent a database with descriptors such as [100,1], and another database with descriptors such as [100,2]. So in this case, there is a global transaction numbered 100 that contains two branches with IDs of 1 and 2 respectively.
"But," you might ask, "What happens if there are errors in phase 2nd of the two-phase commit (two-phase-commit) protocol?" ”
"Indeed, you will get into trouble!" ”
In fact, we'll discuss the topic later.
Figure 1. Timeline and application flow in two-phase commit