Logstash MySQL quasi real-time sync to Elasticsearch

MySQL is a mature and stable data persistence solution and is widely used in many fields, but it falls a bit short when it comes to data analysis. Elasticsearch, a leader in the data analysis field, makes up for exactly this deficiency. All we need to do is synchronize the data in MySQL to Elasticsearch, and Logstash supports this out of the box; all you have to write is a configuration file.


Get Logstash

wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.3.zip
unzip logstash-6.2.3.zip && cd logstash-6.2.3

Installing the JDBC and Elasticsearch plugins

bin/logstash-plugin install logstash-input-jdbc && bin/logstash-plugin install logstash-output-elasticsearch

Get the MySQL JDBC driver

wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.46.zip && unzip mysql-connector-java-5.1.46.zip
Write the configuration file

Logstash-input-jdbc

The logstash-input-jdbc plugin is used to read data from MySQL. The way it works is fairly simple: it executes a SQL statement on a schedule and writes the result set to the event stream. Increments are not captured from the binlog; instead an increment field is used as a query condition, and the position reached by the current query is recorded each time. Because the field only ever grows, querying for records larger than the recorded position is enough to pick up everything that changed in the meantime. There are generally two kinds of increment fields: an AUTO_INCREMENT primary key id and an ON UPDATE CURRENT_TIMESTAMP update_time field. The id field only works for tables that are never updated, while update_time is more general, so it is recommended to include an update_time field in your MySQL table design.
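
As a minimal sketch of what such a column looks like (the table name and the data column are hypothetical placeholders, not from the original article; only update_time matters here):

-- Hypothetical example table; the point is the update_time definition.
CREATE TABLE `table` (
  `id`          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `data`        VARCHAR(255),
  `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                ON UPDATE CURRENT_TIMESTAMP,  -- bumped automatically on every UPDATE
  PRIMARY KEY (`id`),
  KEY `idx_update_time` (`update_time`)       -- index keeps the incremental query cheap
) ENGINE=InnoDB;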

input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://<mysql_host>:3306/<database>"
    jdbc_user => "<user>"
    jdbc_password => "<password>"
    schedule => "* * * * *"
    statement => "SELECT * FROM table WHERE update_time > :sql_last_value"
    use_column_value => true
    tracking_column_type => "timestamp"
    tracking_column => "update_time"
    last_run_metadata_path => "<path-to-syncpoint-file>"
  }
}
    • jdbc_driver_library: path to the MySQL JDBC driver downloaded in the previous step
    • jdbc_driver_class: name of the driver class; for MySQL fill in com.mysql.jdbc.Driver
    • jdbc_connection_string: MySQL connection address
    • jdbc_user: MySQL user
    • jdbc_password: MySQL password
    • schedule: when to execute the SQL, in crontab-like syntax
    • statement: the SQL to execute; tokens starting with ":" are variables that can be set through parameters; here sql_last_value is a built-in variable holding the value of update_time from the last execution
    • use_column_value: use a column value as the increment marker
    • tracking_column_type: type of the increment column, either numeric for a numeric type or timestamp for a timestamp type
    • tracking_column: name of the increment column; here it is update_time, whose type is timestamp
    • last_run_metadata_path: path of the sync-point file, which records the last synchronization point; it is read again on restart and can be modified manually (a sketch of its content follows this list)
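
For reference, the sync-point file just contains the last tracked value serialized as YAML, roughly like the snippet below (the exact representation depends on the plugin version and time zone); editing it and restarting Logstash makes the next run continue from that point:

--- 2018-04-14 17:55:00.000000000 +08:00
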
Logstash-output-elasticsearch
output {
  elasticsearch {
    hosts => ["172.31.22.165", "172.31.17.241", "172.31.30.84", "172.31.18.178"]
    user => "<user>"
    password => "<password>"
    index => "table"
    document_id => "%{id}"
  }
}
    • hosts: ES cluster addresses
    • user: ES user name
    • password: ES password
    • index: name of the index to import into in ES; here I simply use the name of the MySQL table
    • document_id: the document ID in ES; it needs to be set to the primary key, otherwise an updated record would appear as two documents in ES; %{id} references the value of the id field in the MySQL table
Run

Save the above configuration in a file sync_table.cfg and execute the following command:

cd logstash-6.2.3 && bin/logstash -f config/sync_table.cfg

If it runs successfully, the executed SQL statements will be printed on standard output:

[2018-04-14T18:12:00,278][INFO ][logstash.inputs.jdbc     ] (0.001011s) SELECT version()
[2018-04-14T18:12:00,284][INFO ][logstash.inputs.jdbc     ] (0.000723s) SELECT * FROM table WHERE update_time > '2018-04-14 17:55:00'
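
To double-check that the documents actually landed in Elasticsearch, you can query the index directly; this is just a sketch, using one of the hosts from the output configuration and placeholder credentials:

curl -u <user>:<password> 'http://172.31.22.165:9200/table/_search?pretty&size=1'
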
Other issues

Multi-table synchronization

A single Logstash instance can synchronize multiple tables using the pipelines mechanism; just write one configuration file per table. Suppose we have two tables, table1 and table2, with the corresponding configuration files sync_table1.cfg and sync_table2.cfg.

Configure both pipelines in config/pipelines.yml:

- pipeline.id: table1
  path.config: "config/sync_table1.cfg"
- pipeline.id: table2
  path.config: "config/sync_table2.cfg"

Then start Logstash directly with bin/logstash, and both pipelines will run.

The @timestamp field

By default, @timestamp is a field added by logstash-input-jdbc and is set to the current time. This field is useful when analyzing data, but sometimes we want to fill it from a field in the data itself. In that case we can use the date filter, a plugin dedicated to setting the @timestamp field.

For example, I want @timestamp to be taken from the timeslice field; timeslice is a string formatted as %Y%m%d%H%M.

filter {
  date {
    match => ["timeslice", "yyyyMMddHHmm"]
    timezone => "Asia/Shanghai"
  }
}

Add this section of configuration to sync_table.cfg, and @timestamp will now be the same as timeslice; a sketch of how the whole file fits together follows.
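
For orientation, a minimal sketch of how the three blocks sit together in sync_table.cfg (the bodies are abbreviated to the sections shown earlier):

input {
  jdbc {
    # ... jdbc settings from the Logstash-input-jdbc section ...
  }
}

filter {
  date {
    # ... the date settings shown above ...
  }
}

output {
  elasticsearch {
    # ... settings from the Logstash-output-elasticsearch section ...
  }
}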

Reference links
    • logstash-input-jdbc plugin: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
    • logstash-output-elasticsearch plugin: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
    • Logstash multiple pipelines: https://www.elastic.co/blog/logstash-multiple-pipelines
    • logstash-filter-date plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html

Please indicate the source when reprinting.
This article link: http://www.hatlonely.com/2018/04/14/logstash-mysql-%E5%87%86%E5%AE%9E%E6%97%B6%E5%90%8C%E6%AD%A5%E5%88%B0-elasticsearch/
