RIVER-JDBC Installation
./bin/plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.4.0.8/elasticsearch-river-jdbc-1.4.0.8-plugin.zip
Two modes of operation: river or feeder
The plugin can run as a river in "pull mode" or as a feeder in "push mode". In feeder mode, the plugin runs in a separate JVM and can connect to a remote Elasticsearch cluster.
The plugin can fetch data in parallel from different relational database sources. Multithreaded bulk mode ensures high throughput when indexing into Elasticsearch.
Installing and running the river
# Install Elasticsearch
curl -OL https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.zip
cd $ES_HOME
unzip path/to/elasticsearch-1.4.2.zip
# Install the JDBC plugin
./bin/plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.4.0.6/elasticsearch-river-jdbc-1.4.0.6-plugin.zip
# Download the MySQL JDBC driver
curl -o mysql-connector-java-5.1.33.zip -L 'http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.33.zip/from/http://cdn.mysql.com/'
cp mysql-connector-java-5.1.33-bin.jar $ES_HOME/plugins/jdbc/
chmod 644 $ES_HOME/plugins/jdbc/*
# Start Elasticsearch
./bin/elasticsearch
# Stop the river
curl -XDELETE 'localhost:9200/_river/my_jdbc_river/'
JDBC Plug-in Parameters
The generic format for the JDBC plugin is as follows:
curl -XPUT 'localhost:9200/_river/<rivername>/_meta' -d '{
    <river parameters>
    "type" : "jdbc",
    "jdbc" : {
        <river definition>
    }
}'
For example:
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select * from orders",
        "index" : "myindex",
        "type" : "mytype",
        ...
    }
}'
Multiple river sources can also be defined by passing an array in the "jdbc" field.
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    <river parameters>
    "type" : "jdbc",
    "jdbc" : [
        { <river definition 1> },
        { <river definition 2> }
    ]
}'
Multiple river concurrency
Fetching from multiple sources in parallel can be controlled with the concurrency parameter:
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    <river parameters>
    "concurrency" : 2,
    "type" : "jdbc",
    "jdbc" : [
        { <river definition 1> },
        { <river definition 2> }
    ]
}'
Parameters outside the jdbc block
strategy
- the JDBC plugin strategy. The current implementation provides "simple" and "column"
schedule
-a single or a list of cron expressions for scheduled execution
threadpoolsize
- the thread pool size for scheduled executions
interval
- the delay between two river runs
max_bulk_actions
- the maximum number of actions per bulk index request (default is 1000)
max_concurrent_bulk_requests
- the number of concurrent bulk requests (default is 2 * number of CPU cores)
max_bulk_volume
- the maximum volume of a bulk request (default is "10m")
max_request_wait
-Maximum wait time for a bulk request (default is 60s)
flush_interval
- the interval between flushes of indexed documents to a bulk action
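As a sketch combining several of these parameters (the cron expression and sizes are illustrative, not recommendations, and assume the scheduler accepts Quartz-style cron expressions with a seconds field), a river definition that runs every 15 minutes could look like:

```json
{
    "type" : "jdbc",
    "schedule" : "0 0/15 * * * ?",
    "threadpoolsize" : 1,
    "max_bulk_actions" : 500,
    "max_concurrent_bulk_requests" : 4,
    "flush_interval" : "5s",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select * from orders"
    }
}
```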
Parameters inside the jdbc block
url
-The JDBC driver URL
user
-The JDBC database user
password
-The JDBC database password
sql
- the SQL statement. Can be either a string or a list.
"sql" : [
    {
        "statement" : "select ... from ... where a = ?, b = ?, c = ?",
        "parameter" : [ "value for a", "value for b", "value for c" ]
    },
    {
        "statement" : "insert into ... where a = ?, b = ?, c = ?",
        "parameter" : [ "value for a", "value for b", "value for c" ],
        "write" : "true"
    },
    { "statement" : ... }
]
sql.statement
-The SQL statement
sql.write
- if true, the SQL statement is interpreted as an insert/update statement, and the statement requires write access. Default is false
sql.callable
- if true, the SQL statement is interpreted as a CallableStatement for invoking a stored procedure. Default is false
sql.parameter
- bind parameters to the SQL statement. The following predefined values can be used:
- $now - the current timestamp
- $job - a job counter
- $count - the number of rows merged in the last run
- $river.name - the river name
- $last.sql.start - a timestamp value for when the last SQL statement started
- $last.sql.end - a timestamp value for when the last SQL statement ended
- $last.sql.sequence.start - a timestamp value for when the last SQL sequence started
- $last.sql.sequence.end - a timestamp value for when the last SQL sequence ended
- $river.state.started - the timestamp when the river started
- $river.state.timestamp - the timestamp of the last river activity
- $river.state.counter - a counter from the river state, counting the number of runs
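For example, a sketch binding one of the predefined values as a statement parameter (the table and column names are illustrative):

```json
"sql" : [
    {
        "statement" : "select * from \"orders\" where \"created\" > ?",
        "parameter" : [ "$last.sql.start" ]
    }
]
```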
locale
- the default locale (used for parsing numeric values and floating-point separators; the recommended value is "en_US")
timezone
- the timezone for JDBC setTimestamp() calls when binding parameters with timestamp values
rounding
- the rounding mode for parsing numeric values. Possible values are "ceiling", "down", "floor", "halfdown", "halfeven", "halfup", "unnecessary", "up"
scale
-The precision of parsing numeric values
autocommit
- true if each statement should be executed automatically. Default is false
fetchsize
- the fetch size for large result sets; most drivers use this to control the number of rows buffered while iterating through the result set
max_rows
- limit the number of rows fetched by a statement; the remaining rows are ignored
max_retries
- the number of retries to (re)connect to a database
max_retries_wait
- the time to wait between retries. Default is "30s"
resultset_type
-The JDBC result set type, can be type_forward_only, type_scroll_sensitive, type_scroll_insensitive. Default is type_forward_only
resultset_concurrency
-The JDBC result set concurrency, can be concur_read_only, concur_updatable. Default is concur_updatable
ignore_null_values
- if true, NULL values are ignored when constructing JSON documents. Default is false
prepare_database_metadata
- if true, the driver metadata is prepared as parameters for access by the river. Default is false
prepare_resultset_metadata
- if true, the result set metadata is prepared as parameters for access by the river. Default is false
column_name_map
- a map of aliases to use as replacements for database column names. Useful for Oracle's column name length limit. Default is null
query_timeout
- the number of seconds an SQL statement is allowed to run before it is considered lost. Default is 1800
connection_properties
- a map of connection properties used when the driver creates a connection. Default is null
index
- the Elasticsearch index used for indexing
type
- the Elasticsearch type of the index used for indexing
index_settings
- optional settings for the Elasticsearch index
type_mapping
- optional mapping for the Elasticsearch index type
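A sketch of a jdbc block that overrides several of these parameters (all values are illustrative; "useCompression" is a MySQL Connector/J driver property and is shown only as an assumption about the driver in use):

```json
"jdbc" : {
    "url" : "jdbc:mysql://localhost:3306/test",
    "user" : "",
    "password" : "",
    "sql" : "select * from orders",
    "fetchsize" : 100,
    "max_retries" : 5,
    "max_retries_wait" : "10s",
    "query_timeout" : 600,
    "connection_properties" : { "useCompression" : "true" },
    "index" : "myindex",
    "type" : "mytype"
}
```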
Default set of parameters
{
    "strategy" : "simple",
    "schedule" : null,
    "interval" : 0L,
    "threadpoolsize" : 4,
    "max_bulk_actions" : 10000,
    "max_concurrent_bulk_requests" : 2 * available CPU cores,
    "max_bulk_volume" : "10m",
    "max_request_wait" : "60s",
    "flush_interval" : "5s",
    "jdbc" : {
        "url" : null,
        "user" : null,
        "password" : null,
        "sql" : null,
        "locale" : Locale.getDefault().toLanguageTag(),
        "timezone" : TimeZone.getDefault(),
        "rounding" : null,
        "scale" : 2,
        "autocommit" : false,
        "fetchsize" : 10, /* MySQL: Integer.MIN_VALUE */
        "max_rows" : 0,
        "max_retries" : 3,
        "max_retries_wait" : "30s",
        "resultset_type" : "type_forward_only",
        "resultset_concurrency" : "concur_updatable",
        "ignore_null_values" : false,
        "prepare_database_metadata" : false,
        "prepare_resultset_metadata" : false,
        "column_name_map" : null,
        "query_timeout" : 1800,
        "connection_properties" : null,
        "index" : "jdbc",
        "type" : "jdbc",
        "index_settings" : null,
        "type_mapping" : null
    }
}
Structured objects
One advantage of SQL queries is the join operation: data from many tables can be combined into a new tuple.
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select \"relations\" as \"_index\", orders.customer as \"_id\", orders.customer as \"contact.customer\", employees.name as \"contact.employee\" from orders left join employees on employees.department = orders.department"
    }
}'
The SQL result is
mysql> select "relations" as "_index", orders.customer as "_id", orders.customer as "contact.customer", employees.name as "contact.employee" from orders left join employees on employees.department = orders.department;
+-----------+-------+------------------+------------------+
| _index    | _id   | contact.customer | contact.employee |
+-----------+-------+------------------+------------------+
| relations | Big   | Big              | Smith            |
| relations | Large | Large            | Müller           |
| relations | Large | Large            | Meier            |
| relations | Large | Large            | Schulze          |
| relations | Huge  | Huge             | Müller           |
| relations | Huge  | Huge             | Meier            |
| relations | Huge  | Huge             | Schulze          |
| relations | Good  | Good             | Müller           |
| relations | Good  | Good             | Meier            |
| relations | Good  | Good             | Schulze          |
| relations | Bad   | Bad              | Jones            |
+-----------+-------+------------------+------------------+
11 rows in set (0.00 sec)
The resulting JSON objects are
index=relations id=Big {"contact":{"employee":"Smith","customer":"Big"}}
index=relations id=Large {"contact":{"employee":["Müller","Meier","Schulze"],"customer":"Large"}}
index=relations id=Huge {"contact":{"employee":["Müller","Meier","Schulze"],"customer":"Huge"}}
index=relations id=Good {"contact":{"employee":["Müller","Meier","Schulze"],"customer":"Good"}}
index=relations id=Bad {"contact":{"employee":"Jones","customer":"Bad"}}
How to dump a table
This dumps a table into Elasticsearch. If no _id column is given, IDs are generated automatically.
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select * from orders"
    }
}'
The result is:
id=<random> {"product":"Apples","created":null,"department":"American Fruits","quantity":1,"customer":"Big"}
id=<random> {"product":"Bananas","created":null,"department":"German Fruits","quantity":1,"customer":"Large"}
id=<random> {"product":"Oranges","created":null,"department":"German Fruits","quantity":2,"customer":"Huge"}
id=<random> {"product":"Apples","created":1338501600000,"department":"German Fruits","quantity":2,"customer":"Good"}
id=<random> {"product":"Oranges","created":1338501600000,"department":"English Fruits","quantity":3,"customer":"Bad"}
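To control document IDs instead of letting them be generated, an _id column can be selected explicitly, as in the structured-objects example earlier. A sketch assuming the orders table has a unique customer column:

```json
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select orders.customer as \"_id\", orders.* from orders"
    }
}
```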
How to get Incremental data
It is recommended to use timestamps for synchronization. The following example gets all the product rows that were added after the last river run.
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : [
            {
                "statement" : "select * from \"products\" where \"mytimestamp\" > ?",
                "parameter" : [ "$river.state.last_active_begin" ]
            }
        ],
        "index" : "my_jdbc_river_index",
        "type" : "my_jdbc_river_type"
    }
}