Logstash MySQL near real-time sync to Elasticsearch

Tags: logstash

MySQL is a mature and stable data persistence solution that is widely used in many fields, but it falls short for data analysis. Elasticsearch, a leader in the data analysis space, compensates for exactly this weakness. All we need to do is synchronize the data from MySQL into Elasticsearch, and Logstash supports this out of the box: all it takes is writing a configuration file.

Get Logstash

wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.3.zip
unzip logstash-6.2.3.zip && cd logstash-6.2.3

Installing the JDBC and Elasticsearch plugins

bin/logstash-plugin install logstash-input-jdbc && bin/logstash-plugin install logstash-output-elasticsearch
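
As an optional check (not part of the original article), you can confirm that both plugins are available with the plugin list subcommand:

bin/logstash-plugin list | grep -E 'logstash-input-jdbc|logstash-output-elasticsearch'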

Get JDBC MySQL driver

wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.46.zip && unzip mysql-connector-java-5.1.46.zip

Write the configuration file

logstash-input-jdbc

The logstash-input-jdbc plugin reads MySQL data in a fairly simple way: it executes a SQL statement on a schedule and writes the result set to the event stream. Incremental changes are not captured from the binlog; instead, an incremental column is used as a query condition, and the position reached by the last query is recorded each time. Because the column only ever increases, querying for records greater than the recorded position returns everything added or changed since the last run. Two kinds of incremental columns are common: an AUTO_INCREMENT primary key id and an update_time column defined with ON UPDATE CURRENT_TIMESTAMP. The id column only works for tables that are never updated, while update_time is more general, so it is recommended to include an update_time column in your MySQL table design.
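
For illustration only, a minimal sketch of such a table design (the table name, business column, and index are hypothetical; the point is the update_time definition):

-- Hypothetical example table; only the update_time definition matters here.
CREATE TABLE `table` (
  `id`          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,   -- AUTO_INCREMENT primary key, also usable as the ES document_id
  `data`        VARCHAR(255),                              -- placeholder business column
  `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
                ON UPDATE CURRENT_TIMESTAMP,               -- refreshed automatically on every UPDATE
  PRIMARY KEY (`id`),
  KEY `idx_update_time` (`update_time`)                    -- keeps the incremental query cheap
) ENGINE=InnoDB;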

input {
  jdbc {
    jdbc_driver_library => "<path to mysql-connector-java-5.1.46 jar>"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://<host>:3306/<database>"
    jdbc_user => "<user>"
    jdbc_password => "<password>"
    schedule => "* * * * *"
    statement => "SELECT * FROM table WHERE update_time > :sql_last_value"
    use_column_value => true
    tracking_column_type => "timestamp"
    tracking_column => "update_time"
    last_run_metadata_path => "<path to sync point file>"
  }
}
    • jdbc_driver_library: path to the MySQL JDBC driver downloaded in the previous step
    • jdbc_driver_class: the driver class name; for MySQL use com.mysql.jdbc.Driver
    • jdbc_connection_string: the MySQL connection address
    • jdbc_user: MySQL user
    • jdbc_password: MySQL password
    • schedule: when to execute the SQL, in a crontab-like syntax
    • statement: the SQL to execute; names starting with ":" are variables that can be set via parameters. Here sql_last_value is a built-in variable holding the value of update_time from the last execution
    • use_column_value: use the value of the incremental column (rather than the time of the last run) as the sync position
    • tracking_column_type: type of the incremental column, numeric for numeric types, timestamp for timestamp types
    • tracking_column: name of the incremental column; here the update_time column is used, whose type is timestamp
    • last_run_metadata_path: the sync point file; it records the last synchronization position, is read back on restart, and can be edited manually (see the example below)
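
For reference (my assumption based on how the plugin serializes its state, not shown in the original article), the sync point file is plain text holding a single YAML-serialized value; with a timestamp tracking column its contents look roughly like:

--- 2018-04-14 17:55:00.000000000 +08:00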
logstash-output-elasticsearch
output {
  elasticsearch {
    hosts => ["172.31.22.165", "172.31.17.241", "172.31.30.84", "172.31.18.178"]
    user => "<user>"
    password => "<password>"
    index => "table"
    document_id => "%{id}"
  }
}
    • hosts: ES cluster address
    • user: ES username
    • password: ES password
    • index: the index name in ES to import into; here I use the MySQL table name directly
    • document_id: the document ID in ES; it needs to be set to the primary key, otherwise the same record would end up as two documents in ES. %{id} references the value of the id field of the MySQL table (a quick way to check the imported documents is shown below)
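
As an optional spot check (my own addition, not from the original article), once a sync cycle has run you can query the index through the Elasticsearch REST API, using the same placeholders as above:

curl -u '<user>:<password>' 'http://172.31.22.165:9200/table/_search?size=1&pretty'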
Run

Save the above configuration in a file named sync_table.cfg and execute the following command:

cd logstash-6.2.3 && bin/logstash -f config/sync_table.cfg

If it works, the executed SQL statements will be printed to standard output:

[2018-04-14T18:12:00,278][INFO ][logstash.inputs.jdbc     ] (0.001011s) SELECT version()
[2018-04-14T18:12:00,284][INFO ][logstash.inputs.jdbc     ] (0.000723s) SELECT * FROM table WHERE update_time > '2018-04-14 17:55:00'
Other issues

Multi-table synchronization

A single Logstash instance can synchronize multiple tables through the pipelines mechanism; just write one configuration file per table. Suppose we have two tables, table1 and table2, with the corresponding configuration files sync_table1.cfg and sync_table2.cfg.

Configure them in config/pipelines.yml:

- pipeline.id: table1
  path.config: "config/sync_table1.cfg"
- pipeline.id: table2
  path.config: "config/sync_table2.cfg"

Then start Logstash directly with bin/logstash (without the -f argument).

The @timestamp field

By default, @timestamp is a field added by logstash-input-jdbc and is set to the current time. It is useful when analyzing data, but sometimes we want to populate it from a field in the data itself; the date filter plugin is dedicated to setting @timestamp.

For example, I want @timestamp to be taken from a timeslice field, where timeslice is a string formatted as %Y%m%d%H%M.

filter {
  date {
    match => ["timeslice", "yyyyMMddHHmm"]
    timezone => "Asia/Shanghai"
  }
}

Add this section to sync_table.cfg, and @timestamp will now be the same as timeslice.

Reference links
    • logstash-input-jdbc plugin: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
    • logstash-output-elasticsearch plugin: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
    • Logstash multiple pipelines: https://www.elastic.co/blog/logstash-multiple-pipelines
    • logstash-filter-date plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html

Please indicate the source when reprinting.
Original article link: http://www.hatlonely.com/2018/04/14/logstash-mysql-%E5%87%86%E5%AE%9E%E6%97%B6%E5%90%8C%E6%AD%A5%E5%88%B0-elasticsearch/
