IBM InfoSphere CDC is a powerful real-time data replication product. It is widely used to integrate heterogeneous platforms such as traditional ODS, data warehouses, data marts, and BI systems, and it also fully supports the cloud. Across cloud scenarios, CDC provides low-impact, near-real-time replication of large data volumes while ensuring the integrity and security of data in transit.
Bluemix, IBM's flagship public cloud platform, is a platform-as-a-service (PaaS) offering based on the Cloud Foundry open source project. It enables organizations and developers to quickly and easily create, deploy, and manage applications in the cloud. Bluemix offers a wide range of applications and services to customers worldwide, including BigInsights, IBM's Hadoop product in the cloud.
With IBM InfoSphere CDC, you can easily synchronize data from a local (on-premises) database to BigInsights in the Bluemix cloud in real time, which addresses several big data analytics challenges:
- Processing of massive amounts of data
- Diversity of data sources
- Agility of data analysis
- Persistence of data analysis
Next, we'll show how to use CDC to build real-time synchronization from a local database (DB2 in this example) to BigInsights in the cloud.
On-premises system configuration
1. Configure the DB2 database and confirm that it is functioning correctly.
2. Install InfoSphere CDC for DB2 (the source-side CDC engine, which captures incremental data changes by reading the DB2 logs in real time).
3. Install InfoSphere CDC for DataStage (the target-side CDC engine, which applies the real-time incremental data from the source to the target Hadoop platform/HDFS file system).
4. Configure the network connection (firewall) from the CDC server's internal network to Bluemix.
5. Install the CDC configuration, management, and monitoring platform (Management Console and Access Server).
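The firewall setup in step 4 can be sanity-checked from the CDC server before going further. The following is a minimal sketch and not part of the CDC product: it only tests whether a TCP connection to the BigInsights endpoint can be opened. The function name `can_reach` and the 5-second timeout are illustrative assumptions; port 8443 is the one this walkthrough uses for the BigInsights service.

```python
import socket


def can_reach(host: str, port: int = 8443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    This checks network reachability only. It does not validate TLS
    certificates, credentials, or anything specific to CDC.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `can_reach("<your BigInsights hostname>")` from the CDC server should return `True` once the firewall rule is in place; a `False` result suggests the connection is still blocked or the hostname does not resolve.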
Create a BigInsights for Apache Hadoop service
1. Sign in to the Bluemix platform (a Bluemix ID is required).
https://console.ng.bluemix.net/
2. Click "Catalog" at the top of the page, tick "Data and Analytics" in the "Services" section on the left side of the page, and then select "BigInsights for Apache Hadoop".
3. On the BigInsights for Apache Hadoop page, specify the relevant properties and create the service.
Check the BigInsights for Apache Hadoop service
1. From the Bluemix user dashboard, click the newly created "BigInsights for Apache Hadoop" service.
2. Check the validity period of the current service, which is usually free of charge for one month.
3. Check the credentials and configuration information for the current service, such as user name and password.
Start the BigInsights for Apache Hadoop service
1. Click "Launch" on the BigInsights for Apache Hadoop page to start the service.
2. Note the hostname, port (8443), and URL prefix (/gateway/default/) of the BigInsights for Apache Hadoop service; the CDC configuration needs them.
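To make clear how the hostname, port, and URL prefix fit together, here is a small sketch that assembles a WebHDFS REST URL from them. It assumes the gateway exposes WebHDFS under the standard `/webhdfs/v1` path after the prefix; the helper name `webhdfs_url`, the example hostname, and the example directory are hypothetical and should be replaced with the values shown for your own service.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=8443, prefix="/gateway/default/", **params):
    """Build a WebHDFS REST URL from the connection details of the service.

    host   -- hostname shown on the service page
    path   -- HDFS path, for example "/user/cdc/landing" (hypothetical)
    op     -- WebHDFS operation name, for example "LISTSTATUS"
    params -- any additional query parameters for the operation
    """
    # Normalize slashes so "/gateway/default/" and "gateway/default" both work.
    prefix = "/" + prefix.strip("/")
    hdfs_path = "/" + path.strip("/")
    query = urlencode({"op": op, **params})
    # quote() keeps "/" intact and percent-encodes characters such as spaces.
    return f"https://{host}:{port}{prefix}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Example with placeholder values:
print(webhdfs_url("bi.example.com", "/user/cdc/landing", "LISTSTATUS"))
```

The same three values (hostname, port, URL prefix) are what the CDC WebHDFS connection asks for later, so building the URL by hand is a quick way to spot a wrong prefix or port.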
Create a CDC subscription and configure table mappings
1. In the CDC configuration, management, and monitoring platform (Management Console), create a subscription and run the table mapping wizard.
2. Select "Apache Hadoop - WebHDFS" as the delivery method on the target side.
3. Select the DB2 source tables that you want to replicate, and specify the WebHDFS directory path on the target-side BigInsights.
Configure Hadoop properties for CDC subscriptions
1. Right-click the subscription and select Hadoop Properties.
2. You can modify the batch size value (the threshold that triggers generation of a landing file) and enter the BigInsights connection information under the WebHDFS connection.
3. Start the subscription to begin real-time replication.
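The role of the batch size in step 2 can be illustrated with a toy model. This is not CDC's implementation, only a sketch of the general idea that changes are buffered and a file is produced once a row threshold is reached. The class name `BatchBuffer` and the `flush` callback are hypothetical.

```python
from typing import Callable, List


class BatchBuffer:
    """Toy model: collect change rows and emit one 'file' per full batch."""

    def __init__(self, batch_size: int, flush: Callable[[List[str]], None]):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.flush = flush  # called with the rows of each completed batch
        self.rows: List[str] = []

    def add(self, row: str) -> None:
        """Buffer one change row; emit a batch when the threshold is reached."""
        self.rows.append(row)
        if len(self.rows) >= self.batch_size:
            self.close()

    def close(self) -> None:
        """Emit whatever is buffered, even if the batch is not full."""
        if self.rows:
            self.flush(self.rows)
            self.rows = []  # start a fresh list; the emitted one is untouched


files: List[List[str]] = []
buf = BatchBuffer(batch_size=3, flush=files.append)
for i in range(7):
    buf.add(f"row{i}")
buf.close()
print([len(f) for f in files])
```

In this model a smaller batch size produces files sooner (lower latency) but more of them, while a larger one produces fewer, bigger files. That trade-off is the reason to tune the value.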
Verify the real-time synchronization results
1. Run several transactions on the local DB2 database that change data in the source tables that CDC is monitoring.
2. On the target side, click BigSheets on the BigInsights for Apache Hadoop home page to confirm that the data from the source DB2 has been synchronized in real time: fully automated, low latency, and accurate.
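Besides BigSheets, the target directory can also be inspected through WebHDFS itself. The sketch below assumes you have already fetched the JSON body of a WebHDFS `LISTSTATUS` request for the target directory (how it is fetched and authenticated is left out). It lists the files newest first, so freshly written files are easy to spot. The function name `summarize_liststatus` is hypothetical; the field names follow the standard WebHDFS `FileStatuses` response.

```python
import json
from datetime import datetime, timezone


def summarize_liststatus(body: str):
    """Return (name, size_in_bytes, modified_utc) for each file, newest first.

    `body` is the JSON text returned by a WebHDFS LISTSTATUS call.
    Directories are skipped; modificationTime is in milliseconds since epoch.
    """
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    files = [s for s in statuses if s["type"] == "FILE"]
    files.sort(key=lambda s: s["modificationTime"], reverse=True)
    return [
        (
            s["pathSuffix"],
            s["length"],
            datetime.fromtimestamp(s["modificationTime"] / 1000, tz=timezone.utc),
        )
        for s in files
    ]
```

Running this before and after the DB2 transactions in step 1 and comparing the two results gives a simple check that new files are arriving in the target directory.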
Readers interested in BigInsights can follow the link below to view the details and download it:
Http://bigdata.evget.com/product/385.html
InfoSphere CDC: real-time synchronization of local data to BigInsights in the cloud