MySQL is a mature and stable data persistence solution, widely used across many fields, but it falls a bit short for data analysis. Elasticsearch, a leader in the data analysis space, compensates for exactly this deficiency. What we need to do is synchronize the data in MySQL to Elasticsearch, and Logstash supports just that; all it takes is writing a configuration file.
Get Logstash
wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.3.zip
unzip logstash-6.2.3.zip && cd logstash-6.2.3
Install the JDBC and Elasticsearch plugins
bin/logstash-plugin install logstash-input-jdbc && bin/logstash-plugin install logstash-output-elasticsearch
Get JDBC MySQL driver
wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.46.zip && unzip mysql-connector-java-5.1.46.zip
Write the configuration file

logstash-input-jdbc
The logstash-input-jdbc plugin reads MySQL data in a fairly simple way: it executes a SQL statement on a schedule and writes the results to the stream. Incremental data is not obtained through the binlog; instead, an incremental field is used as a query condition, and the position reached by the current query is recorded each time. Because the field only ever increases, querying for records larger than the current position retrieves all increments since the last run. There are generally two kinds of incremental fields: an AUTO_INCREMENT primary key id, and an ON UPDATE CURRENT_TIMESTAMP update_time field. An id field only works for tables that are never updated; update_time is more general, and it is recommended to add an update_time field when designing MySQL tables.
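As a concrete sketch of that recommendation (the table and column names here are illustrative, not from the original post), a table prepared for this kind of incremental sync could look like this:

-- hypothetical table layout for incremental sync
-- `id` alone only works if rows are never updated;
-- `update_time` is refreshed on every UPDATE, so it also captures changes
CREATE TABLE `table` (
    `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `data` VARCHAR(255) NOT NULL DEFAULT '',
    `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    KEY `idx_update_time` (`update_time`)
) ENGINE=InnoDB;

The index on update_time is my own addition; it keeps the periodic range query cheap as the table grows.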
input {
    jdbc {
        jdbc_driver_library => "mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_connection_string => "jdbc:mysql://<mysql_host>:3306/<database>"
        jdbc_user => "<user>"
        jdbc_password => "<password>"
        schedule => "* * * * *"
        statement => "SELECT * FROM table WHERE update_time > :sql_last_value"
        use_column_value => true
        tracking_column_type => "timestamp"
        tracking_column => "update_time"
        last_run_metadata_path => "<path/to/syncpoint_file>"
    }
}
- jdbc_driver_library: path to the JDBC MySQL driver downloaded in the previous step
- jdbc_driver_class: the driver class name; for MySQL, fill in com.mysql.jdbc.Driver
- jdbc_connection_string: the MySQL address
- jdbc_user: the MySQL user
- jdbc_password: the MySQL password
- schedule: when to execute the SQL, in a crontab-like syntax
- statement: the SQL to execute; a token starting with ":" is a variable that can be set via parameters. Here sql_last_value is a built-in variable holding the value of update_time from the last execution
- use_column_value: use the value of the incremental column
- tracking_column_type: the type of the incremental column, either numeric for a numeric type or timestamp for a timestamp type (a numeric variant is sketched just after this list)
- tracking_column: the name of the incremental column; the update_time column is used here, and its type is timestamp
- last_run_metadata_path: the sync-point file, which records the last synchronization point; it is read back on restart and can be modified by hand
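For comparison, here is a minimal sketch of the tracking settings for the numeric case, assuming an AUTO_INCREMENT id on a table whose rows are never updated (driver and connection settings are the same placeholders as above):

input {
    jdbc {
        # ... jdbc_driver_library, jdbc_driver_class, jdbc_connection_string,
        #     jdbc_user and jdbc_password as in the config above ...
        schedule => "* * * * *"
        # query by the auto-increment primary key instead of update_time
        statement => "SELECT * FROM table WHERE id > :sql_last_value"
        use_column_value => true
        tracking_column_type => "numeric"
        tracking_column => "id"
        last_run_metadata_path => "<path/to/syncpoint_file>"
    }
}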
logstash-output-elasticsearch
output {
    elasticsearch {
        hosts => ["172.31.22.165", "172.31.17.241", "172.31.30.84", "172.31.18.178"]
        user => "<user>"
        password => "<password>"
        index => "table"
        document_id => "%{id}"
    }
}
- hosts: the ES cluster addresses
- user: the ES user name
- password: the ES password
- index: the name of the index in ES to import into; here I set it directly to the MySQL table name
- document_id: the document ID imported into ES, which needs to be set to the primary key; otherwise updating the same record in MySQL would create two documents in ES. %{id} references the value of the id field from the MySQL table (a quick spot-check using these settings is shown after this list)
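Once a sync has run, an easy way to verify that a row arrived (using the first host from the example config; the row id of 1 is just an illustration) is to query the index directly:

# search the "table" index for the document synced from the MySQL row with id=1
curl -u '<user>:<password>' 'http://172.31.22.165:9200/table/_search?q=id:1&pretty'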
Run
Save the above configuration in a file sync_table.cfg, then execute the following command:
cd logstash-6.2.3 && bin/logstash -f config/sync_table.cfg
If it succeeds, the executed SQL statements will be printed to standard output:
[2018-04-14T18:12:00,278][INFO ][logstash.inputs.jdbc ] (0.001011s) SELECT version()
[2018-04-14T18:12:00,284][INFO ][logstash.inputs.jdbc ] (0.000723s) SELECT * FROM table WHERE update_time > '2018-04-14 17:55:00'
Other issues

Multi-table synchronization
One Logstash instance can synchronize multiple tables using the pipelines mechanism; just write one configuration file per table. Suppose we have two tables, table1 and table2, with the corresponding configuration files sync_table1.cfg and sync_table2.cfg. Configure them in config/pipelines.yml:
- pipeline.id: table1
  path.config: "config/sync_table1.cfg"
- pipeline.id: table2
  path.config: "config/sync_table2.cfg"
Then start it directly with bin/logstash.
The @timestamp field
By default, @timestamp is a field added by logstash-input-jdbc, and it defaults to the current time. This field is useful when analyzing data, but sometimes we want to fill it from a field in the data itself; in that case we can use the date filter, a plugin dedicated to setting @timestamp.
For example, suppose I want @timestamp to be taken from a field named timeslice, where timeslice is a string formatted as %Y%m%d%H%M:
filter {
    date {
        match => ["timeslice", "yyyyMMddHHmm"]
        timezone => "Asia/Shanghai"
    }
}
Add this section of configuration to sync_table.cfg, and @timestamp will now be the same as timeslice.
Reference links
- logstash-input-jdbc plugin: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
- logstash-output-elasticsearch plugin: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
- Logstash multiple pipelines: https://www.elastic.co/blog/logstash-multiple-pipelines
- logstash-filter-date plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html
Please indicate the source when reprinting.
This article: Logstash MySQL near real-time sync to Elasticsearch, http://www.hatlonely.com/2018/04/14/logstash-mysql-%E5%87%86%E5%AE%9E%E6%97%B6%E5%90%8C%E6%AD%A5%E5%88%B0-elasticsearch/