In depth: real-time synchronization of MySQL and Elasticsearch with logstash-input-jdbc

Tags: mysql, insert, logstash




The advent of Elasticsearch has made storing and retrieving data faster and more convenient. But in many cases the situation is this: the data currently lives in MySQL, Oracle, or another traditional relational database. How do we synchronize the results of insert, update, and delete operations on that data to Elasticsearch (abbreviated ES) in real time, while changing the original table structure as little as possible?
This article explores that requirement hands-on.


1. First, some cold water on real-time synchronization of delete operations


As of this writing, everything I have found on Google, Stack Overflow, elastic.co, and GitHub about plugins and real-time synchronization says the same thing: there is currently no good solution for synchronizing deletes.
The compromise solutions are discussed here:
https://discuss.elastic.co/t/delete-elasticsearch-document-with-logstash-jdbc-input/47490/9
http://stackoverflow.com/questions/34477095/elasticsearch-replication-of-other-system-data/34477639#34477639


Option one:


Add a status field to the original MySQL table, with a default value of ok. Instead of actually deleting a row, perform an update that changes status to deleted.
This way the change can be synchronized to ES, and the status value in ES indicates whether the row still logically exists: deleted means removed, ok means normal.
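In SQL terms (using the cc table and status column from the example later in this post), the "delete" looks like:

```sql
-- logical delete: the row is kept, only its flag changes
UPDATE cc SET status = 'deleted' WHERE id = 10;

-- normal reads (and the synced ES queries) filter on the flag
SELECT * FROM cc WHERE status = 'ok';
```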


Option two:


Use the Go plugin go-mysql-elasticsearch for synchronization. In practice, however, I found the plugin unstable and fairly buggy; I filed an issue with the author.
See: https://github.com/siddontang/go-mysql-elasticsearch/issues/46


The final solution from those discussions for delete operations (as of June 24, 2016):


http://stackoverflow.com/questions/35813923/sync-postgresql-data-with-elasticsearch/35823497#35823497



First, soft-delete the data instead of physically deleting it: add a flag column that records whether the record has been deleted, so the record still exists in Elasticsearch, and a simple term query can retrieve the records marked as deleted.
Second, when you do need to clean up (physically delete) the data, simply remove the flagged records from both the database and ES. For example, in MySQL: delete from cc where cc.status = 'deleted'; then perform the corresponding delete in ES.
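The two steps can be sketched end to end. Below is a minimal runnable illustration in Python, with an in-memory SQLite table standing in for the MySQL cc table (the ES side of the cleanup is noted in a comment only):

```python
import sqlite3

# In-memory SQLite stands in for MySQL, just to illustrate the pattern.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cc (id INTEGER PRIMARY KEY, name TEXT, status TEXT DEFAULT 'ok')")
conn.executemany("INSERT INTO cc (id, name) VALUES (?, ?)",
                 [(1, "laoyang360"), (2, "test002"), (3, "dllaoyang")])

# "Delete" is really an UPDATE: the row survives, only status changes,
# so the next incremental sync carries the new status over to ES.
def soft_delete(conn, row_id):
    conn.execute("UPDATE cc SET status = 'deleted' WHERE id = ?", (row_id,))

# Periodic physical cleanup: purge rows already flagged as deleted.
# (The matching ES documents must be deleted in the same pass.)
def purge_deleted(conn):
    cur = conn.execute("DELETE FROM cc WHERE status = 'deleted'")
    return cur.rowcount

soft_delete(conn, 2)
live = [r[0] for r in conn.execute("SELECT id FROM cc WHERE status = 'ok'")]
print(live)                  # -> [1, 3]: the soft-deleted row no longer shows up
print(purge_deleted(conn))   # -> 1: one flagged row physically removed
```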


2. How do I implement synchronization of insert and update operations with a plugin?


My previous post (http://blog.csdn.net/laoyang360/article/details/51694519) touched on this briefly.
Besides the three plugins mentioned there, I recommend trying a very useful Logstash plugin named logstash-input-jdbc.


3. How do I install the logstash-input-jdbc plugin?


Reference: http://blog.csdn.net/yeyuma/article/details/50240595#quote
That blog post covers the installation in detail, so I won't repeat it. In essence it comes down to:



cd /opt/logstash/



sudo bin/plugin install logstash-input-jdbc



This normally succeeds. If it does not, please leave a comment.


4. How do I achieve real-time synchronization?

4.1 Prerequisite: the database and table already exist in MySQL

The database name is: test
The table under test is named: cc
The data in the table is:


mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from cc;
+----+--------------------+---------+---------------------+
| id | name               | status  | modified_at         |
+----+--------------------+---------+---------------------+
|  1 | laoyang360         | ok      | 0000-00-00 00:00:00 |
|  2 | test002            | ok      | 2016-06-23 06:16:42 |
|  3 | dllaoyang          | ok      | 0000-00-00 00:00:00 |
|  4 | huawei             | ok      | 0000-00-00 00:00:00 |
|  5 | jdbc_test_update08 | ok      | 0000-00-00 00:00:00 |
|  7 | test7              | ok      | 0000-00-00 00:00:00 |
|  8 | test008            | ok      | 0000-00-00 00:00:00 |
|  9 | test9              | ok      | 0000-00-00 00:00:00 |
| 10 | test10             | deleted | 0000-00-00 00:00:00 |
+----+--------------------+---------+---------------------+
9 rows in set (0.01 sec)
4.2 Two files are required: 1) jdbc.conf; 2) jdbc.sql.
[root@5b9dbaaa148a logstash_jdbc_test]# cat jdbc.conf
input {
  stdin {
  }
  jdbc {
    # MySQL JDBC connection string; "test" below is the test database in MySQL
    jdbc_connection_string => "jdbc:mysql://192.168.1.1:3306/test"
    # the user we wish to execute our statement as
    jdbc_user => "root"
    jdbc_password => "******"
    # the path to our downloaded JDBC driver
    jdbc_driver_library => "/elasticsearch-jdbc-2.3.2.0/lib/mysql-connector-java-5.1.38.jar"
    # the name of the driver class for MySQL
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_paging_enabled => "true"
    jdbc_page_size => "50000"
    # the absolute path of the SQL file to execute
    statement_filepath => "/usr/local/logstash/bin/logstash_jdbc_test/jdbc.sql"
    # cron-style schedule; the fields (left to right) are minute, hour,
    # day of month, month, day of week. All "*" means run every minute
    # (my test result; please leave a comment if yours differs)
    schedule => "* * * * *"
    # the ES document type
    type => "cc_type"
  }
}

filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
}

output {
  elasticsearch {
    # ES IP address and port
    hosts => "192.168.1.1:9200"
    # ES index name (your own choice)
    index => "cc_index"
    # use the table's auto-increment id as the document ID
    document_id => "%{id}"
  }
  stdout {
    # output in JSON format
    codec => json_lines
  }
}

# The SQL statement to execute: it chooses which information is synced to ES.
[root@5b9dbaaa148a logstash_jdbc_test]# cat jdbc.sql
select
  *
from cc
where cc.modified_at > :sql_last_value
[Note: this part matters!]


cc.modified_at is a change-time field I defined myself; its default value is now(), the current time.
As for :sql_last_value: if the input sets use_column_value => true, it holds the last value of the tracking column we configured.
By default use_column_value is false, and :sql_last_value is the time of the previous run.
Rows not modified since then are not re-selected, which is how incremental updates are achieved.
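For reference, tracking a column value instead of the run time looks roughly like this in the jdbc input (a sketch; the option names come from the logstash-input-jdbc documentation, so check them against your plugin version):

```
jdbc {
  # track the last seen value of a column instead of the last run time
  use_column_value => true
  tracking_column => "id"
  statement => "SELECT * FROM cc WHERE cc.id > :sql_last_value"
}
```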



Some readers have asked: how do I do a full synchronization? Answer: just remove the WHERE clause.
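So the two jdbc.sql variants side by side:

```sql
-- incremental sync: only rows changed since the last run
select * from cc where cc.modified_at > :sql_last_value

-- full sync: every run selects all rows
select * from cc
```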


Step 1:


Create a new folder logstash_jdbc_test under Logstash's bin path, containing the two files 1) jdbc.conf and 2) jdbc.sql. Copy the templates above into them.


Step 2:


Modify the conf according to your own MySQL address, ES address, index name, and type name, and modify the SQL to choose the content to synchronize.


Step 3:


Run Logstash as follows:
[root@5b9dbaaa148a plugins]# ./logstash -f ./logstash_jdbc_test/jdbc.conf


Step 4:


Verify that the synchronization succeeded, for example by querying the target index in ES. (The original post included a screenshot here.)
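For example, you can query ES directly (substitute your own host; cc_index and cc_type match the conf above, and a running cluster is assumed):

```
# count the documents in the target index
curl -XGET 'http://192.168.1.1:9200/cc_index/_count?pretty'

# fetch the document synced from the row with id=1
curl -XGET 'http://192.168.1.1:9200/cc_index/cc_type/1?pretty'
```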


5. Precautions


If you want to test go-mysql-elasticsearch, you may hit the following three bugs; solutions below:


"Bug1"


How to Setting the Binary Log Format
Http://dev.mysql.com/doc/refman/5.7/en/binary-log-setting.html


"Bug2"


What is inner HTTP status address
Https://github.com/siddontang/go-mysql-elasticsearch/issues/11


"Bug3"


[2016/06/23 10:19:38] canal.go:146 [Error] Canal start sync binlog err:error 1236 (HY000): misconfigured master-server ID is not set
Http://dba.stackexchange.com/questions/76089/error-1236-from-master-after-restored-replication


6. Summary


Hands-on testing shows that logstash-input-jdbc handles both incremental and full synchronization of MySQL insert and update operations to ES well.
But there is still no good solution for real-time synchronization of deletes. If you have one, and it tests out OK, please leave a comment and let me know!



June 23, 2016, 22:42, at home by the bed



Ming Yi World
Please indicate the source when reprinting. Original address: http://blog.csdn.net/laoyang360/article/details/51747266




