SymmetricDS Documentation Translation: Chapter 1. Introduction
This User Guide introduces the basic and advanced concepts of SymmetricDS configuration. After reading it, you should have a better understanding of the capabilities and concepts of SymmetricDS.
1.1. System Requirements
SymmetricDS is written in Java and requires a Java Runtime Environment (JRE) or Java Development Kit (JDK), version 6.0 or later.
Any database with trigger technology and a JDBC driver has the potential to run SymmetricDS. The database is abstracted through a Database Dialect in order to support the specific features of each database. The following Database Dialects are included in this release (version 3.6.14):
1. MySQL 5.0.2 and later versions
2. MariaDB and later versions
3. Oracle 10g and later versions
4. PostgreSQL 8.2.5 and later versions
5. SQL Server 2005 and later versions
6. SQL Server Azure
7. HSQLDB 2.x
8. H2 1.x
9. Apache Derby 10.3.2.1 and later versions
10. IBM DB2 9.5 and later
11. Firebird 2.0 and later versions
12. Interbase 2009 and later versions
13. Greenplum 8.2.15 and later versions
14. SQLite 3 and later versions
15. Sybase Adaptive Server Enterprise 12.5 and later versions
16. Sybase SQL Anywhere 9 and later versions
See Appendix C, Database Notes, for compatibility notes and other details specific to your database.
1.2. Concepts
1.2.1. Nodes
SymmetricDS is a Java-based application that hosts a synchronization engine. The engine acts as an agent for data synchronization between a single database instance and the other synchronization engines in a network.
A SymmetricDS engine is also called a node. SymmetricDS is designed to scale out to many thousands of nodes. The database connection is configured by providing a database connection string, database user name, and database password in a properties file. SymmetricDS can synchronize any table that is accessible through that connection, provided the database user has been assigned the appropriate database permissions.
A SymmetricDS node is assigned an external ID and a node group ID. The external ID is a user-defined identifier that SymmetricDS uses to determine which specific node data is sent to. The node group ID identifies the group, or tier, that a node belongs to; it defines where the node fits in the overall network. For example, one node group might be named "corporate" and represent an enterprise or corporate database. Another node group might be named "local_office" and represent the databases of the different offices in a region. The external ID of a "local_office" node could be an office number or some other identifying alphanumeric string. Within a network, a node is uniquely identified by its node ID, which is generated automatically from the external ID. If local office number 1 had two databases and two SymmetricDS nodes, both nodes might have an external ID of "1" and node IDs of "1-1" and "1-2".
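The relationship between external IDs and node IDs in the example above can be sketched as follows. This is only an illustration of the "1-1" / "1-2" example: the function name `assign_node_id` and the suffix scheme are assumptions, not the actual SymmetricDS algorithm.

```python
def assign_node_id(external_id, existing_node_ids):
    """Return a node ID derived from external_id that is not yet in use.

    Hypothetical scheme: append the first free numeric suffix.
    """
    suffix = 1
    while f"{external_id}-{suffix}" in existing_node_ids:
        suffix += 1
    return f"{external_id}-{suffix}"


# Two nodes at local office number 1 share the external ID "1".
registered = set()
for _ in range(2):
    registered.add(assign_node_id("1", registered))

print(sorted(registered))  # ['1-1', '1-2']
```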
SymmetricDS can be deployed in a number of ways. The most common option is to run it as a stand-alone process, installed as a service on the server. Deployed this way, SymmetricDS can act as a client, as a multi-tenant server, or as both, depending on where the SymmetricDS database fits in the overall database network. It can run on the same server as its database, but this is not required. SymmetricDS can also be deployed as a web application in an application server such as Apache Tomcat, JBoss Application Server, or IBM WebSphere.
SymmetricDS is designed to be a simple, easy-to-use tool for technologists. It can be thought of as a web application in which another SymmetricDS engine, rather than a browser, plays the role of the client. It has all the characteristics of a web application, so it can be debugged with the same techniques used to debug web applications.
1.2.2. Change Data Capture
Changes are captured by database triggers, which SymmetricDS installs automatically based on your configuration. The triggers record data changes in the DATA table, a SymmetricDS system table. The triggers are designed to be as non-invasive and lightweight as possible. Once the SymmetricDS triggers are installed, changes are captured for every DML statement executed by external applications. Note that you do not need to add libraries or make any changes to your applications, and SymmetricDS does not have to be online for data to be captured.
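The idea of trigger-based capture can be tried out with an in-memory SQLite database. The sketch below is a simplified illustration, not the triggers SymmetricDS actually generates: the `captured_data` table, its columns, and the trigger names are assumptions standing in for the DATA system table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, price REAL);

-- Hypothetical, simplified stand-in for the DATA system table.
CREATE TABLE captured_data (
    data_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT,
    event_type TEXT,   -- 'I', 'U' or 'D'
    pk_data    TEXT,
    row_data   TEXT
);

CREATE TRIGGER item_ins AFTER INSERT ON item BEGIN
    INSERT INTO captured_data (table_name, event_type, pk_data, row_data)
    VALUES ('item', 'I', NEW.id, NEW.id || ',' || NEW.name || ',' || NEW.price);
END;

CREATE TRIGGER item_upd AFTER UPDATE ON item BEGIN
    INSERT INTO captured_data (table_name, event_type, pk_data, row_data)
    VALUES ('item', 'U', NEW.id, NEW.id || ',' || NEW.name || ',' || NEW.price);
END;

CREATE TRIGGER item_del AFTER DELETE ON item BEGIN
    INSERT INTO captured_data (table_name, event_type, pk_data, row_data)
    VALUES ('item', 'D', OLD.id, NULL);
END;
""")

# The "application" runs ordinary DML and knows nothing about the capture.
conn.execute("INSERT INTO item VALUES (1, 'widget', 9.99)")
conn.execute("UPDATE item SET price = 8.49 WHERE id = 1")
conn.execute("DELETE FROM item WHERE id = 1")

events = [row[0] for row in conn.execute(
    "SELECT event_type FROM captured_data ORDER BY data_id")]
print(events)  # ['I', 'U', 'D']
```

The application code contains no capture logic at all, which is the point made above: the triggers do the recording, whether or not a synchronization engine is running.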
The database tables that SymmetricDS is configured to replicate must have the same structure in each database instance. The configuration for the entire network of nodes is usually managed at a central node, known as the registration server node. The registration server is almost always the root node in a tree topology. When a "leaf" node is configured, one of its startup parameters is the URL of the registration server node. If the leaf node has not yet been registered, it contacts the registration server and requests to join the network. Once the request is accepted, the leaf node downloads the configuration it needs. After a node is registered, SymmetricDS can also provide an initial load of data before synchronization starts.
SymmetricDS installs or updates its database triggers at startup. While it is running, the Sync Triggers Job periodically installs new triggers and updates existing ones (by default, at midnight every day). To decide whether a trigger needs to be re-created, the job checks for changes in the database structure or in the trigger configuration. The Sync Triggers Job can be turned off, and a DBA can instead generate and run the DDL scripts for the database triggers.
After changed data is inserted by a database trigger into the DATA system table, the Router Job assigns it to batches destined for specific SymmetricDS nodes. Routing data means choosing the nodes in the SymmetricDS network to which the data should be sent. By default, data is routed to other nodes based on the node group ID. Optionally, characteristics of the data or of the target nodes can also be used for routing. A batch is a group of data changes that is transported and loaded at the target node together and committed as a single database transaction. Batches are recorded in the OUTGOING_BATCH system table. Batches are node specific: each batch belongs to exactly one target node. DATA and OUTGOING_BATCH are linked through DATA_EVENT. The delivery status of each batch is tracked in OUTGOING_BATCH; after the data has been delivered to the remote node, the batch status is changed to "OK".
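A minimal sketch of default routing by node group, under simplifying assumptions: `Node`, `Batch`, `route`, the "NEW" status, and the `max_batch_size` parameter are hypothetical names for this illustration, and the real Router Job does considerably more. The returned pairs play the role of DATA_EVENT, linking each captured data ID to a node-specific batch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    group_id: str


@dataclass
class Batch:
    batch_id: int
    node_id: str                      # a batch belongs to exactly one node
    data_ids: list = field(default_factory=list)
    status: str = "NEW"               # hypothetical initial status; "OK" once delivered


def route(data_ids, nodes, target_group, max_batch_size=2):
    """Assign every captured data ID to batches for each node in target_group."""
    batches, data_events = [], []
    next_batch_id = 1
    for node in nodes:
        if node.group_id != target_group:
            continue
        for start in range(0, len(data_ids), max_batch_size):
            chunk = data_ids[start:start + max_batch_size]
            batch = Batch(next_batch_id, node.node_id, chunk)
            next_batch_id += 1
            batches.append(batch)
            # One (data_id, batch_id) link per captured row in the batch.
            data_events.extend((data_id, batch.batch_id) for data_id in chunk)
    return batches, data_events


nodes = [Node("1-1", "local_office"), Node("2-1", "local_office"),
         Node("corp", "corporate")]
batches, data_events = route([101, 102, 103], nodes, "local_office")
for batch in batches:
    print(batch.batch_id, batch.node_id, batch.data_ids, batch.status)
```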
1.2.3. Change Data Delivery
Data is delivered to remote nodes over HTTP or HTTPS. It can be delivered in one of two ways, depending on the type of transport link configured between node groups. A node group can be configured to push changes to other nodes in a group, or to pull changes from other nodes in a group. Pushing is initiated by the Push Job at the source node. If there are batches waiting to be transported, the pushing node reserves a connection to each target node using an HTTP HEAD request. If the reservation request is accepted, the source node fully extracts the data for the batch. The data is extracted to an in-memory buffer in CSV format until the buffer size reaches a configured threshold. The data is then sent to the target node with an HTTP PUT. The next batch is then extracted and sent. This is repeated until the maximum number of batches has been sent for each channel, or until there are no more batches to send. Because all batches are sent over a single HTTP PUT request, the target node responds with a list of batch statuses.
Pull requests are initiated at the target node by the Pull Job. A pull request is submitted with an HTTP GET. The same extraction process that runs for a push also runs for a pull.
After data is extracted and transported, it is loaded at the target node. As with extraction, the Data Loader buffers the incoming CSV data in memory until a threshold is reached. If the threshold is reached, the data is flushed to a file and the rest of the batch is received into that file. Once all of the data in a batch is available locally, a database connection is taken from the connection pool and the events that occurred at the source database are replayed against the target database.
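The buffering behavior described above (keep CSV data in memory, switch to a file once a threshold is exceeded) can be sketched as follows. `StagingBuffer` and its methods are hypothetical names for this illustration; the sketch makes no claim about how SymmetricDS implements its staging internally.

```python
import io
import tempfile


class StagingBuffer:
    """Hold CSV lines in memory until threshold_bytes is exceeded, then use a file."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self.size = 0
        self._memory = io.StringIO()
        self._file = None

    @property
    def on_disk(self):
        return self._file is not None

    def write_line(self, line):
        text = line + "\n"
        self.size += len(text.encode("utf-8"))
        if self._file is None and self.size > self.threshold_bytes:
            # Threshold exceeded: move what was buffered so far into a temp file.
            self._file = tempfile.TemporaryFile(mode="w+", encoding="utf-8", newline="")
            self._file.write(self._memory.getvalue())
            self._memory = None
        target = self._file if self._file is not None else self._memory
        target.write(text)

    def read_all(self):
        if self._file is not None:
            self._file.seek(0)
            return self._file.read()
        return self._memory.getvalue()


buffer = StagingBuffer(threshold_bytes=32)
buffer.write_line("insert,item,1,widget")
print(buffer.on_disk)   # False: 21 bytes so far, still in memory
buffer.write_line("insert,item,2,gadget")
print(buffer.on_disk)   # True: 42 bytes exceeds the 32-byte threshold
```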
1.2.4. Data Channels
Data is always delivered to a remote node in the order in which it was recorded for a specific channel. A channel is a user-defined grouping of tables that depend on each other. Data captured from tables in the same channel is always synchronized together. Each trigger must be assigned a channel ID as part of its trigger definition. The channel ID is recorded in the SYM_DATA and SYM_OUTGOING_BATCH system tables. If a batch fails to load, no more data is sent on that channel until the failure has been addressed. Data on other channels is not affected and continues to synchronize.
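The channel isolation rule can be illustrated with a small selection function. The function name, the dictionary layout, and the "NEW" and "ERROR" status strings are assumptions made for this sketch only.

```python
def sendable_batches(batches):
    """Return IDs of new batches that may be sent, in batch order.

    A failed batch blocks every later batch on its own channel,
    while other channels keep flowing.
    """
    blocked_channels = set()
    sendable = []
    for batch in sorted(batches, key=lambda b: b["batch_id"]):
        channel = batch["channel_id"]
        if batch["status"] == "ERROR":
            blocked_channels.add(channel)
        elif batch["status"] == "NEW" and channel not in blocked_channels:
            sendable.append(batch["batch_id"])
    return sendable


outgoing = [
    {"batch_id": 1, "channel_id": "item", "status": "OK"},
    {"batch_id": 2, "channel_id": "item", "status": "ERROR"},
    {"batch_id": 3, "channel_id": "item", "status": "NEW"},
    {"batch_id": 4, "channel_id": "inventory", "status": "NEW"},
    {"batch_id": 5, "channel_id": "inventory", "status": "NEW"},
]
print(sendable_batches(outgoing))  # [4, 5]
```

Batch 3 stays queued behind the failed batch 2 on the "item" channel, while the "inventory" channel is unaffected.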
If a remote node is offline, the data remains recorded at the source database until the node comes back online. Optionally, a timeout can be set; when it expires, the offline node is removed from the network. Captured change data is purged from the SymmetricDS system tables once it has been sent and the configured retention period has expired. Any unsent change data bound for a node that has been disabled is also purged.
When a data integrity error occurs, the default behavior is to attempt to repair the data. If an insert is attempted but the row already exists in the table, SymmetricDS falls back to updating the existing row. Likewise, if an update that succeeded at the source node finds no row to update at the target node, SymmetricDS falls back to inserting the row. If a delete is attempted at the target node but the row is not found, the event is simply logged. This behavior can be changed through configuration.
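The default fallback rules can be sketched against an in-memory SQLite table. The `apply_change` function, the `item` table, and the returned labels are hypothetical; the sketch shows only the fallback logic, not SymmetricDS's actual loader or its conflict-resolution settings.

```python
import sqlite3


def apply_change(conn, event_type, row_id, name=None):
    """Apply one captured change, falling back when the target row state differs."""
    if event_type == "I":
        try:
            conn.execute("INSERT INTO item (id, name) VALUES (?, ?)", (row_id, name))
            return "insert"
        except sqlite3.IntegrityError:
            # Row already exists: fall back to an update.
            conn.execute("UPDATE item SET name = ? WHERE id = ?", (name, row_id))
            return "insert->update"
    if event_type == "U":
        cursor = conn.execute("UPDATE item SET name = ? WHERE id = ?", (name, row_id))
        if cursor.rowcount == 0:
            # Nothing to update: fall back to an insert.
            conn.execute("INSERT INTO item (id, name) VALUES (?, ?)", (row_id, name))
            return "update->insert"
        return "update"
    if event_type == "D":
        cursor = conn.execute("DELETE FROM item WHERE id = ?", (row_id,))
        # A missing row is only noted, not treated as an error.
        return "delete" if cursor.rowcount else "delete ignored"
    raise ValueError(f"unknown event type: {event_type}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")

results = [
    apply_change(conn, "I", 1, "widget"),
    apply_change(conn, "I", 1, "widget v2"),
    apply_change(conn, "U", 2, "gadget"),
    apply_change(conn, "D", 3),
]
print(results)
# ['insert', 'insert->update', 'update->insert', 'delete ignored']
```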
SymmetricDS is designed with standard web technologies so that it can scale to many clients across different types of databases. It can synchronize data to and from as many client nodes as the deployed database and network infrastructure can support. When a two-tier database and network infrastructure is not enough, a SymmetricDS network can be designed with N tiers for greater scalability. At this point we have covered how SymmetricDS synchronizes data between multiple databases using standard technologies.
1.3. Features
SymmetricDS has many features you may need or want for data synchronization. Most of them were added in response to feedback from SymmetricDS users in production environments.
1.3.1. Two-Way Table Synchronization
In practice, much data needs to be synchronized in only one direction. For example, a retail store sends its sales transactions to a central office, and the central office sends stock items and pricing to the store. Other data may need to be synchronized in both directions. For example, the store sends the central office an inventory document, the central office updates the document, and then sends it back to the store. SymmetricDS supports two-way table synchronization and avoids an update loop by recording only data changes that originate outside of synchronization.
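The loop-avoidance rule can be modeled in a few lines. The `SyncNode` class and its `from_sync` flag are assumptions made for this sketch; they stand in for whatever mechanism lets capture tell synchronization writes apart from application writes.

```python
class SyncNode:
    """Toy node: captures local writes, but not writes applied by synchronization."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.captured = []

    def write(self, key, value, from_sync=False):
        self.rows[key] = value
        if not from_sync:
            # Only changes made outside of synchronization are recorded.
            self.captured.append((key, value))

    def push_to(self, other):
        """Send all captured changes to other and return how many were sent."""
        sent = len(self.captured)
        for key, value in self.captured:
            other.write(key, value, from_sync=True)
        self.captured.clear()
        return sent


store, central = SyncNode("store"), SyncNode("central")

store.write("doc-7", "draft")
print(store.push_to(central))    # 1: the store's change reaches the central office
print(central.push_to(store))    # 0: the synced change is not echoed back

central.write("doc-7", "approved")
print(central.push_to(store))    # 1: a genuine central change flows to the store
print(store.rows["doc-7"])       # approved
```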
1.3.2. Data Channels
SymmetricDS supports the concept of data channels. Data synchronization is defined at the table level (for a whole table or a subset of its data), and each managed table is assigned to a channel that helps control the flow of data. A channel is a category of data that can be synchronized independently of the data in other channels. For example, in a retail environment, a promotion may update a large number of items while users are waiting for updates to inventory documents. If everything were processed in order, the item updates would delay the inventory updates even though the data is unrelated. By assigning the item tables to an item channel and the inventory tables to an inventory channel, the changes to the two sets of tables are processed independently, so inventory updates get through despite the large volume of item data.
Channels are discussed in more detail in Section 3.3, "Channel".
1.3.3. Change Notification
After a data change is recorded in the database, the SymmetricDS nodes interested in the change are notified. Change notification is configured to either push or pull data. When several nodes target their changes to a central node, it is efficient to push the changes instead of waiting for the central node to pull from each source node. If the network is configured with a firewall that protects a node, a pull configuration may allow the node to receive data changes that a push would have had blocked. The frequency of change notification is configurable and defaults to once per minute.
1.3.4. HTTP(S) Transport
By default, SymmetricDS uses web-based HTTP or HTTPS requests in a REST style. This approach is lightweight and easy to manage. A series of filters is provided to enforce authentication and to limit the number of simultaneous synchronization streams. The ITransportManager interface allows other transports to be implemented.
1.3.5. Data Filtering and Rerouting
Using SymmetricDS, data can be filtered as it is recorded, extracted, and loaded.
1. Data routing is configured by inserting a router of a given type into the SymmetricDS ROUTER system table. Routers determine which target nodes captured changes should be delivered to. Custom routers can be provided by implementing the IDataRouter interface.
2. As data changes are loaded into the target database, SymmetricDS can perform complex data transformations beyond plain synchronization. Transformations can be used to merge source data, make multiple copies of source data across multiple target tables, set default values in the target tables, and so on. The transformation types can be extended to create custom transformations.
3. As data changes are loaded into the target database, the data can be filtered, either by a simple BeanShell load filter or by a class that implements IDatabaseWriterFilter. A filter can change the data in a column, route data somewhere else, trigger an initial load, and more. One possible use is to route credit card data to a secure database and blank it out as it is loaded into a centralized sales database. A filter can also prevent data from reaching the target database altogether, effectively replacing the default data loading process.
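The load-filter idea in item 3 can be sketched with plain functions. This is a conceptual illustration only: it does not use the BeanShell filter mechanism or the IDatabaseWriterFilter interface, and every name in it (`blank_card_filter`, `drop_audit_filter`, `load`, the table and column names) is hypothetical.

```python
def blank_card_filter(table, row):
    """Blank out the card number before a sale row is loaded."""
    if table == "sale" and "card_number" in row:
        return dict(row, card_number=None)
    return row


def drop_audit_filter(table, row):
    """Returning None keeps the row from being loaded at all."""
    return None if table == "audit_log" else row


def load(changes, filters):
    """Run each change through the filters and collect what would be loaded."""
    loaded = []
    for table, row in changes:
        for data_filter in filters:
            row = data_filter(table, row)
            if row is None:
                break
        if row is not None:
            loaded.append((table, row))
    return loaded


incoming = [
    ("sale", {"id": 1, "card_number": "4111", "total": 20}),
    ("audit_log", {"id": 9}),
    ("item", {"id": 3}),
]
loaded = load(incoming, [blank_card_filter, drop_audit_filter])
print(loaded)
```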
1.3.6. Transaction Awareness
Many databases provide a unique transaction identifier that is associated with the rows committed together as a transaction. SymmetricDS stores the transaction identifier along with the changed data, so it can replay the transaction exactly as it originally occurred. This means the target database maintains the same transactional integrity as the source database. The databases that support transaction identifiers are documented in the appendix.
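One way to picture the use of transaction identifiers is grouping captured changes so that rows from the same source transaction are applied and committed together. The sketch below assumes that changes arrive as `(transaction_id, statement)` pairs in capture order; the function name and the data layout are hypothetical.

```python
from itertools import groupby


def group_by_transaction(changes):
    """Group consecutive changes that share a transaction ID."""
    return [
        (tx_id, [statement for _, statement in group])
        for tx_id, group in groupby(changes, key=lambda change: change[0])
    ]


captured = [
    ("tx1", "insert order 10"),
    ("tx1", "insert order_line 10-1"),
    ("tx2", "update item 3"),
]
for tx_id, statements in group_by_transaction(captured):
    # Each group would be applied and committed as one unit at the target.
    print(tx_id, statements)
```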
1.3.7. Remote Management
Administrative functions are exposed through JMX and can be accessed from the Java JConsole tool or through an application server. Functions include opening registration, reloading data, purging old data, and viewing batches. A number of configuration and runtime properties can also be viewed.
SymmetricDS also lets you send SQL events through the same synchronization mechanism that is used to send data. The data payload can be any SQL statement. The event is processed and acknowledged just like any other event type.
1.3.8. File Synchronization
Many SymmetricDS users have found that, in addition to synchronizing database tables to remote locations, they also have sets of files that need to be synchronized. SymmetricDS has supported file synchronization since version 3.5.
See Section 3.5, "File Trigger/File Synchronization", for more information.
1.4. Why Database Triggers?
In relational databases, there are several ways to capture changed data for replication, synchronization, and integration.
1. Lazy data capture queries the source system for changed data using an SQL condition, such as a timestamp column.
2. Trigger-based data capture installs database triggers to capture changed data.
3. Log-based data capture reads data changes from the database recovery log.
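For contrast with trigger-based capture, lazy data capture (item 1) can be sketched as a polling query on a timestamp column. The `item` table, its `last_updated` column (an integer here for simplicity), and the `poll_changes` function are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, last_updated INTEGER)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, "widget", 100), (2, "gadget", 205), (3, "gizmo", 310)],
)


def poll_changes(conn, since):
    """Return rows changed after since, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, last_updated FROM item "
        "WHERE last_updated > ? ORDER BY last_updated",
        (since,),
    ).fetchall()
    new_since = rows[-1][2] if rows else since
    return rows, new_since


changed, mark = poll_changes(conn, since=200)
print([row[0] for row in changed], mark)   # [2, 3] 310

changed_again, mark = poll_changes(conn, since=mark)
print(changed_again, mark)                 # [] 310
```

A query like this cannot see deleted rows, and it depends on every writer maintaining the timestamp column. Trigger-based capture avoids both limitations.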
All three of these approaches have advantages and disadvantages, and all three are on the SymmetricDS roadmap. Currently, SymmetricDS supports trigger-based data capture and an unconventional form of lazy data capture. These two techniques were implemented first for several reasons. Most importantly, the majority of the use cases SymmetricDS targets can be solved with trigger-based and conditional replication, which allows more database platforms to be supported using industry-standard technologies. This allowed valuable developer time to be invested in designing a product that is easy to install, configure, and manage, rather than in reverse engineering database log files.
Trigger-based data capture does introduce a measurable amount of overhead on database operations. The overhead varies with the processing power and configuration of the database platform and with the way applications use the database. With continual advances in hardware and database technology, trigger-based data capture has become feasible for applications that require high data throughput or scalability.
Trigger-based data capture is easier to implement and support than log-based solutions. It uses well-known database concepts and is easy for software developers, database developers, and database administrators to understand. It can usually be installed, configured, and managed by an application development team or a database administrator, and it does not need to be deployed on the database server itself.