Using Hive to read and write data in Elasticsearch

Last Update:2018-07-25 Source: Internet

Author: User


Original link: http://lxw1234.com/archives/2015/12/585.htm

Keywords: hive, elasticsearch, integration, consolidation

Elasticsearch can already be used together with big data frameworks such as YARN, Hadoop, Hive, Pig, Spark, and Flume. This is especially useful on data platforms, where index data can be added with distributed tasks. Much of the data on such platforms is stored in Hive, so being able to work with Elasticsearch data directly from Hive is a great convenience for developers. This article records how Hive integrates with Elasticsearch, covering the configuration and usage needed to query and add data. It is based on Hive 0.13.1, hadoop-cdh5.0, and Elasticsearch 2.1.0.

Reading and analyzing data in Elasticsearch through Hive

Data already in Elasticsearch

_index: lxw1234
_type: tags
_id: user ID (cookieid)
Fields: area, media_view_tags, interest

Creating the Hive table

Because I am using Elasticsearch 2.1.0, elasticsearch-hadoop-2.2.0 is required to support it. If your ES version is lower than 2.1.0, you can use elasticsearch-hadoop-2.1.2.

Download address: https://www.elastic.co/downloads/hadoop

ADD JAR file:///home/liuxiaowen/elasticsearch-hadoop-2.2.0-beta1/dist/elasticsearch-hadoop-hive-2.2.0-beta1.jar;

CREATE EXTERNAL TABLE lxw1234_es_tags (
  cookieid string,
  area string,
  media_view_tags string,
  interest string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes' = '172.16.212.17:9200,172.16.212.102:9200',
  'es.index.auto.create' = 'false',
  'es.resource' = 'lxw1234/tags',
  'es.read.metadata' = 'true',
  'es.mapping.names' = 'cookieid:_metadata._id, area:area, media_view_tags:media_view_tags, interest:interest'
);

Note: Because the _id of lxw1234/tags in ES is the cookieid, you must use the following settings to map _id to a Hive table column:
'es.read.metadata' = 'true',
'es.mapping.names' = 'cookieid:_metadata._id,...'

Querying data in Hive

The data can be queried normally.
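
As a quick check of the _id mapping, a query along the following lines could be run against the external table. This is only a sketch: the LIMIT value is arbitrary, and it assumes the ADD JAR statement has already been executed in the same Hive session.

```sql
-- Assumes the elasticsearch-hadoop-hive jar was added in this session (ADD JAR ...).
-- cookieid is populated from the document _id through es.mapping.names.
SELECT cookieid, area, media_view_tags, interest
FROM lxw1234_es_tags
LIMIT 10;
```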

Running SELECT COUNT(1) FROM lxw1234_es_tags; in Hive is also executed as a MapReduce job, with one map task per shard.

You can also restrict the table to filtered data by specifying a search condition in the Hive external table definition. For example, the following CREATE TABLE statement retrieves only the record with _id 98e5d2de059f1d563d8565 from ES:

CREATE EXTERNAL TABLE lxw1234_es_tags_2 (
  cookieid string,
  area string,
  media_view_tags string,
  interest string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes' = '172.16.212.17:9200,172.16.212.102:9200',
  'es.index.auto.create' = 'false',
  'es.resource' = 'lxw1234/tags',
  'es.read.metadata' = 'true',
  'es.mapping.names' = 'cookieid:_metadata._id, area:area, media_view_tags:media_view_tags, interest:interest',
  'es.query' = '?q=_id:98e5d2de059f1d563d8565'
);

hive> select * from lxw1234_es_tags_2;
OK
98e5d2de059f1d563d8565 Sichuan | Chengdu Shopping | | shopping | |
Time taken: 0.096 seconds, Fetched: 1 row(s)
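
According to the elasticsearch-hadoop documentation, es.query also accepts a query DSL body (a JSON string that starts with { and ends with }) in addition to the ?q= URI form. The following is a sketch of the same lookup written as an ids query. The table name lxw1234_es_tags_3 is hypothetical, the connection settings are copied from the tables above, and the statement has not been run against the cluster described here.

```sql
-- Hypothetical table name; all other settings mirror lxw1234_es_tags_2.
CREATE EXTERNAL TABLE lxw1234_es_tags_3 (
  cookieid string,
  area string,
  media_view_tags string,
  interest string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes' = '172.16.212.17:9200,172.16.212.102:9200',
  'es.index.auto.create' = 'false',
  'es.resource' = 'lxw1234/tags',
  'es.read.metadata' = 'true',
  'es.mapping.names' = 'cookieid:_metadata._id, area:area, media_view_tags:media_view_tags, interest:interest',
  -- Query DSL form: look the document up by its _id with an ids query.
  'es.query' = '{"query":{"ids":{"values":["98e5d2de059f1d563d8565"]}}}'
);
```

The DSL form is mainly worth considering when the condition is too complex to express comfortably as a URI query, for example when several clauses have to be combined.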

If the amount of data is small, you can run the query in Hive's local mode instead of submitting it to the Hadoop cluster.

Set the following in Hive:

set hive.exec.mode.local.auto.inputbytes.max=134217728;
set hive.exec.mode.local.auto.tasks.max=10;
set hive.exec.mode.local.auto=true;
set fs.defaultFS=file:///;

hive> select area, count(1) as cnt from lxw1234_es_tags group by area order by cnt desc limit 20;
Automatically selecting local only mode for query
Total jobs = 2
Launching Job 1 out of 2
.....
Execution log at: /tmp/liuxiaowen/liuxiaowen_20151211133030_97b50138-d55d-4a39-bc8e-cbdf09e33ee6.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers

