Elasticsearch Chinese API River

Source: Internet
Author: User

RIVER-JDBC Installation
./bin/plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.4.0.8/elasticsearch-river-jdbc-1.4.0.8-plugin.zip
There are two ways to run the plugin: as a river or as a feeder.

The plugin runs as a river in pull mode, or as a feeder in push mode. In feeder mode, the plugin runs in a separate JVM and can connect to a remote Elasticsearch cluster.

The plugin can fetch data in parallel from several relational database sources. Multithreaded bulk indexing ensures high throughput when indexing into Elasticsearch.

Install and Run the River

# Install Elasticsearch
curl -OL https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.zip
cd $ES_HOME
unzip path/to/elasticsearch-1.4.2.zip

# Install the JDBC plugin
./bin/plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.4.0.6/elasticsearch-river-jdbc-1.4.0.6-plugin.zip

# Download the MySQL driver
curl -o mysql-connector-java-5.1.33.zip -L 'http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.33.zip/from/http://cdn.mysql.com/'
unzip mysql-connector-java-5.1.33.zip
cp mysql-connector-java-5.1.33-bin.jar $ES_HOME/plugins/jdbc/
chmod 644 $ES_HOME/plugins/jdbc/*

# Start Elasticsearch
./bin/elasticsearch

# Stop the river
curl -XDELETE 'localhost:9200/_river/my_jdbc_river/'
JDBC Plugin Parameters

The generic format for the JDBC plugin is as follows:

curl -XPUT 'localhost:9200/_river/<rivername>/_meta' -d '{
    <river parameters>
    "type" : "jdbc",
    "jdbc" : {
        <river definition>
    }
}'

For example:

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select * from orders",
        "index" : "myindex",
        "type" : "mytype",
        ...
    }
}'

Multiple river sources can also be defined by passing an array in the jdbc field.

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    <river parameters>
    "type" : "jdbc",
    "jdbc" : [ {
        <river definition 1>
    }, {
        <river definition 2>
    } ]
}'

The concurrency of multiple river sources can be controlled with the concurrency parameter:

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    <river parameters>
    "concurrency" : 2,
    "type" : "jdbc",
    "jdbc" : [ {
        <river definition 1>
    }, {
        <river definition 2>
    } ]
}'
Parameters outside the jdbc block

strategy - the strategy of the JDBC plugin. The current implementations are "simple" and "column"

schedule - a single cron expression, or a list of cron expressions, for scheduled execution

threadpoolsize - the thread pool size for scheduled executions

interval - the delay between two river runs

max_bulk_actions - the number of actions in each bulk index request submitted (default is 10000)

max_concurrent_bulk_requests - the number of concurrent bulk requests (default is 2 * number of CPU cores)

max_bulk_volume - the maximum size of a bulk request (default is 10m)

max_request_wait - the maximum wait time for the response to a bulk request (default is 60s)

flush_interval - the interval between flushes of queued bulk actions
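As a sketch of how these limits interact: a bulk request is submitted once it reaches max_bulk_actions actions or max_bulk_volume bytes, whichever comes first (flush_interval additionally flushes on a timer). The following Python sketch illustrates the batching idea only; bulk_batches is a hypothetical helper name, not the plugin's actual code.

```python
import json

def bulk_batches(docs, max_bulk_actions=1000, max_bulk_volume=10 * 1024 * 1024):
    """Group documents into bulk requests, starting a new request when
    either the action-count limit (max_bulk_actions) or the byte-volume
    limit (max_bulk_volume) would be exceeded. A sketch of the batching
    idea only, not the plugin's code."""
    batch, volume = [], 0
    for doc in docs:
        line = json.dumps(doc)
        if batch and (len(batch) >= max_bulk_actions
                      or volume + len(line) > max_bulk_volume):
            yield batch
            batch, volume = [], 0
        batch.append(doc)
        volume += len(line)
    if batch:
        yield batch  # final partial batch

batches = list(bulk_batches(({"id": i} for i in range(2500)), max_bulk_actions=1000))
print([len(b) for b in batches])  # [1000, 1000, 500]
```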

Parameters inside the jdbc block

url - the JDBC driver URL

user - the JDBC database user

password - the JDBC database password

sql - the SQL statement. Can be either a string or a list.

"sql" : [    {        "statement" : "select ... from ... where a = ?, b = ?, c = ?",        "parameter" : [ "value for a", "value for b", "value for c" ]    },    {        "statement" : "insert into  ... where a = ?, b = ?, c = ?",        "parameter" : [ "value for a", "value for b", "value for c" ],        "write" : "true"    },    {        "statement" : ...    }]

sql.statement - the SQL statement

sql.write - if true, the SQL statement is interpreted as an insert/update statement that requires write access. Default is false

sql.callable - if true, the SQL statement is interpreted as a CallableStatement that invokes a stored procedure. Default is false

sql.parameter - bind parameters for the SQL statement. The following placeholder values can be used:

    • $now - the current timestamp
    • $job - a job counter
    • $count - the number of rows merged in the last run
    • $river.name - the river name
    • $last.sql.start - a timestamp value for the time the last SQL statement started
    • $last.sql.end - a timestamp value for the time the last SQL statement ended
    • $last.sql.sequence.start - a timestamp value for the time the last SQL sequence started
    • $last.sql.sequence.end - a timestamp value for the time the last SQL sequence ended
    • $river.state.started - the timestamp of the river start
    • $river.state.timestamp - the timestamp of the last river activity
    • $river.state.counter - a counter from the river state, counting the number of runs
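Before a statement runs, the river substitutes these placeholders in the parameter list with values from its state. The following Python sketch illustrates the substitution idea; bind_parameters is a hypothetical name and only a few placeholders are modeled, so this is not the plugin's implementation.

```python
from datetime import datetime, timezone

def bind_parameters(parameters, state):
    """Replace river placeholder strings with values from a state dict;
    plain values pass through unchanged. A sketch of the substitution
    idea, not the plugin's code."""
    def resolve(p):
        if p == "$now":
            return datetime.now(timezone.utc)
        if isinstance(p, str) and p.startswith("$"):
            # "$river.name" -> state["river.name"]; unknown placeholders pass through
            return state.get(p[1:], p)
        return p
    return [resolve(p) for p in parameters]

state = {"river.name": "my_jdbc_river", "count": 42}
print(bind_parameters(["$river.name", "$count", "literal"], state))
# ['my_jdbc_river', 42, 'literal']
```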

locale - the default locale (used for parsing numeric values and floating-point separators; "en_US" is recommended)

timezone - the time zone for JDBC setTimestamp() calls when binding parameters with timestamp values

rounding - rounding mode for parsing numeric values. Possible values: "ceiling", "down", "floor", "halfdown", "halfeven", "halfup", "unnecessary", "up"

scale - the precision for parsing numeric values

autocommit - true if each statement should be automatically committed. Default is false

fetchsize - the fetch size for large result sets; most drivers use this to control the number of buffered rows while iterating through the result set

max_rows - limit the number of rows fetched by a statement; the remaining rows are ignored

max_retries - the number of retries to (re)connect to the database

max_retries_wait - the time to wait between retries. Default is "30s"

resultset_type - the JDBC result set type; can be type_forward_only, type_scroll_sensitive, or type_scroll_insensitive. Default is type_forward_only

resultset_concurrency - the JDBC result set concurrency; can be concur_read_only or concur_updatable. Default is concur_updatable

ignore_null_values - whether NULL values should be ignored when constructing JSON documents. Default is false

prepare_database_metadata - whether the driver's database metadata should be prepared as parameters for access by the river. Default is false

prepare_resultset_metadata - whether the result set metadata should be prepared as parameters for access by the river. Default is false

column_name_map - a map of aliases to be used as replacements for database column names. Useful for Oracle's 30-character column name limit. Default is null

query_timeout - a value in seconds for how long an SQL statement is allowed to execute before it is considered lost. Default is 1800

connection_properties - a map of connection properties for driver connection creation. Default is null

index - the Elasticsearch index used for indexing

type - the Elasticsearch type of the index used for indexing

index_settings - optional settings for the Elasticsearch index

type_mapping - optional mapping for the Elasticsearch index type

Default set of parameters

{
    "strategy" : "simple",
    "schedule" : null,
    "interval" : 0L,
    "threadpoolsize" : 4,
    "max_bulk_actions" : 10000,
    "max_concurrent_bulk_requests" : 2 * available CPU cores,
    "max_bulk_volume" : "10m",
    "max_request_wait" : "60s",
    "flush_interval" : "5s",
    "jdbc" : {
        "url" : null,
        "user" : null,
        "password" : null,
        "sql" : null,
        "locale" : Locale.getDefault().toLanguageTag(),
        "timezone" : TimeZone.getDefault(),
        "rounding" : null,
        "scale" : 2,
        "autocommit" : false,
        "fetchsize" : 10, /* MySQL: Integer.MIN */
        "max_rows" : 0,
        "max_retries" : 3,
        "max_retries_wait" : "30s",
        "resultset_type" : "type_forward_only",
        "resultset_concurrency" : "concur_updatable",
        "ignore_null_values" : false,
        "prepare_database_metadata" : false,
        "prepare_resultset_metadata" : false,
        "column_name_map" : null,
        "query_timeout" : 1800,
        "connection_properties" : null,
        "index" : "jdbc",
        "type" : "jdbc",
        "index_settings" : null,
        "type_mapping" : null
    }
}
Structured objects

One advantage of SQL queries is the join operation: data from several tables can be combined into new tuples.

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select \"relations\" as \"_index\", orders.customer as \"_id\", orders.customer as \"contact.customer\", employees.name as \"contact.employee\" from orders left join employees on employees.department = orders.department"
    }
}'

The SQL query and its result:

mysql> select "relations" as "_index", orders.customer as "_id", orders.customer as "contact.customer", employees.name as "contact.employee" from orders left join employees on employees.department = orders.department;
+-----------+-------+------------------+------------------+
| _index    | _id   | contact.customer | contact.employee |
+-----------+-------+------------------+------------------+
| relations | Big   | Big              | Smith            |
| relations | Large | Large            | Müller           |
| relations | Large | Large            | Meier            |
| relations | Large | Large            | Schulze          |
| relations | Huge  | Huge             | Müller           |
| relations | Huge  | Huge             | Meier            |
| relations | Huge  | Huge             | Schulze          |
| relations | Good  | Good             | Müller           |
| relations | Good  | Good             | Meier            |
| relations | Good  | Good             | Schulze          |
| relations | Bad   | Bad              | Jones            |
+-----------+-------+------------------+------------------+
11 rows in set (0.00 sec)

The resulting JSON objects are:

index=relations id=Big {"contact":{"employee":"Smith","customer":"Big"}}
index=relations id=Large {"contact":{"employee":["Müller","Meier","Schulze"],"customer":"Large"}}
index=relations id=Huge {"contact":{"employee":["Müller","Meier","Schulze"],"customer":"Huge"}}
index=relations id=Good {"contact":{"employee":["Müller","Meier","Schulze"],"customer":"Good"}}
index=relations id=Bad {"contact":{"employee":"Jones","customer":"Bad"}}
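The merging above works because dotted column aliases become nested JSON keys, and rows sharing an _id are folded into one document, collecting repeated leaf values into arrays. A Python sketch of that folding (rows_to_docs is a hypothetical name, not the plugin's code):

```python
def rows_to_docs(rows):
    """Fold SQL result rows into nested JSON documents: dotted column
    aliases become nested keys, rows sharing an _id merge into one
    document, and repeated leaf values collect into arrays. A sketch of
    the river's structured-object behaviour, not its actual code."""
    docs = {}
    for row in rows:
        doc = docs.setdefault(row["_id"], {})
        for col, value in row.items():
            if col.startswith("_"):        # skip control columns (_id, _index)
                continue
            *path, leaf = col.split(".")
            node = doc
            for key in path:               # descend, creating nested objects
                node = node.setdefault(key, {})
            if leaf not in node:
                node[leaf] = value
            elif node[leaf] != value:      # same key, new value -> array
                if not isinstance(node[leaf], list):
                    node[leaf] = [node[leaf]]
                if value not in node[leaf]:
                    node[leaf].append(value)
    return docs

rows = [
    {"_index": "relations", "_id": "Big", "contact.customer": "Big", "contact.employee": "Smith"},
    {"_index": "relations", "_id": "Large", "contact.customer": "Large", "contact.employee": "Müller"},
    {"_index": "relations", "_id": "Large", "contact.customer": "Large", "contact.employee": "Meier"},
]
print(rows_to_docs(rows)["Large"])
# {'contact': {'customer': 'Large', 'employee': ['Müller', 'Meier']}}
```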
How to get a table

This dumps a table into Elasticsearch. If no _id column is given, IDs are generated automatically.

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : "select * from orders"
    }
}'

The result is:

id=<random> {"product":"Apples","created":null,"department":"American Fruits","quantity":1,"customer":"Big"}
id=<random> {"product":"Bananas","created":null,"department":"German Fruits","quantity":1,"customer":"Large"}
id=<random> {"product":"Oranges","created":null,"department":"German Fruits","quantity":2,"customer":"Huge"}
id=<random> {"product":"Apples","created":1338501600000,"department":"German Fruits","quantity":2,"customer":"Good"}
id=<random> {"product":"Oranges","created":1338501600000,"department":"English Fruits","quantity":3,"customer":"Bad"}
How to get incremental data

It is recommended to use timestamps for synchronization. The following example fetches all product rows added since the last river run.

{    "type" : "jdbc",    "jdbc" : {        "url" : "jdbc:mysql://localhost:3306/test",        "user" : "",        "password" : "",        "sql" : [            {                "statement" : "select * from \"products\" where \"mytimestamp\" > ?",                "parameter" : [ "$river.state.last_active_begin" ]            }        ],        "index" : "my_jdbc_river_index",        "type" : "my_jdbc_river_type"    }}
