Synchronizing HBase Data to Elasticsearch: Approaches

Last Update:2015-08-25 Source: Internet

Author: User

Tags solr


The Elasticsearch river mechanism

Elasticsearch itself provides the river mechanism for synchronizing data.

The officially recommended rivers are listed here:

http://www.elasticsearch.org/guide/en/elasticsearch/rivers/current/

However, there is no official HBase river.

An ES river is in fact very simple: it is a jar packaged by the user. ES is responsible for finding a node and starting the river on it. If that node fails, ES automatically finds another node and starts the river there.

    public interface RiverComponent {
        RiverName riverName();
    }

    public interface River extends RiverComponent {
        /**
         * Called whenever the river is registered on a node, which can happen when:
         * 1) the river _meta document gets indexed
         * 2) an already registered river gets started on a node
         */
        void start();

        /**
         * Called when the river is closed on a node, which can happen when:
         * 1) the river is deleted by deleting its type through the delete mapping API
         * 2) the node where the river is allocated is shut down or the river gets rerouted to another node
         */
        void close();
    }

Elasticsearch-HBase-River

There are two related projects on GitHub:

https://github.com/mallocator/Elasticsearch-HBase-River

This project is actually very simple: a timer in the river starts an HBase scanner, scans the data, and inserts it into ES. It is almost the same as writing the scan code by hand.
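The timer-plus-scanner approach can be sketched roughly as follows. This is a minimal illustration, not code from that project: `Row`, `RowSource`, `IndexSink`, and `PollingSync` are hypothetical stand-ins for an HBase scanner and an Elasticsearch client, and the sketch assumes every row carries a timestamp that only increases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for one HBase row version.
final class Row {
    final String key;
    final long timestamp;
    final String value;

    Row(String key, long timestamp, String value) {
        this.key = key;
        this.timestamp = timestamp;
        this.value = value;
    }
}

// Hypothetical stand-in for an HBase scan restricted to a time range.
interface RowSource {
    List<Row> scanSince(long sinceTimestamp); // rows with timestamp > sinceTimestamp
}

// Hypothetical stand-in for an Elasticsearch index call.
interface IndexSink {
    void index(Row row);
}

final class PollingSync {
    private final RowSource source;
    private final IndexSink sink;
    private long lastSeen = -1L; // highest timestamp already copied

    PollingSync(RowSource source, IndexSink sink) {
        this.source = source;
        this.sink = sink;
    }

    // One polling pass: copy every row newer than the last one seen.
    synchronized int runOnce() {
        int copied = 0;
        for (Row row : source.scanSince(lastSeen)) {
            sink.index(row);
            lastSeen = Math.max(lastSeen, row.timestamp);
            copied++;
        }
        return copied;
    }

    // Runs runOnce() repeatedly with a fixed delay, as the river's timer would.
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(this::runOnce, 0, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }

    // In-memory source used only for demonstration.
    static RowSource inMemory(List<Row> table) {
        return since -> {
            List<Row> out = new ArrayList<>();
            for (Row r : table) {
                if (r.timestamp > since) {
                    out.add(r);
                }
            }
            return out;
        };
    }

    public static void main(String[] args) throws InterruptedException {
        List<Row> table = new ArrayList<>();
        table.add(new Row("row1", 1L, "hello"));
        PollingSync sync = new PollingSync(inMemory(table),
                row -> System.out.println("indexed " + row.key));
        ScheduledExecutorService timer = sync.start(100);
        Thread.sleep(300);
        timer.shutdown();
    }
}
```

A scan of this kind only sees rows that still exist, so deletes in HBase would not reach ES without extra bookkeeping, and a row written with a timestamp equal to `lastSeen` after a pass would be skipped.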

https://github.com/posix4e/Elasticsearch-HBase-River

This project uses the HBase replication mechanism: it simulates an HBase replication node and then synchronizes the data into ES.

However, this project is based on HBase 0.94 and has limited functionality.

The APIs of HBase 0.94 and HBase 0.98 differ greatly, so the project is largely unusable with 0.98. The author also says that it cannot be used in a production environment.

The HBase replication mechanism

The official documentation and a Cloudera blog post describe it:
http://hbase.apache.org/book.html#cluster_replication
http://blog.cloudera.com/blog/2012/07/hbase-replication-overview-2/

The HBase replication mechanism is in fact very similar to MySQL's replication mechanism. Every region server in HBase has a WAL (write-ahead log); a put or delete is written to the WAL first.

A background thread then ships the WAL entries to a randomly chosen region server in the slave cluster. The position that replication has reached is recorded in ZooKeeper, so shipping can resume from where it left off.
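The flow just described (append to the WAL, ship pending entries to a randomly chosen sink, record the position) can be modeled in a few lines. This is a toy in-memory sketch under simplifying assumptions, not HBase code: `WalShipper` and all of its methods are hypothetical, and a plain integer field stands in for the offset that the real system keeps in ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy model of WAL shipping; none of these names are HBase APIs.
final class WalShipper {
    private final List<String> wal = new ArrayList<>();          // append-only log of edits
    private final List<List<String>> sinks = new ArrayList<>();  // one list per slave region server
    private final Random random;
    private int shippedPosition = 0; // stands in for the offset recorded in ZooKeeper

    WalShipper(int sinkCount, long seed) {
        this.random = new Random(seed);
        for (int i = 0; i < sinkCount; i++) {
            sinks.add(new ArrayList<>());
        }
    }

    // Every put/delete is appended to the WAL first.
    void append(String edit) {
        wal.add(edit);
    }

    // Ships all edits after the recorded position to one randomly chosen sink,
    // then advances the position. Returns the number of edits shipped.
    int shipPending() {
        if (shippedPosition == wal.size()) {
            return 0;
        }
        List<String> batch = new ArrayList<>(wal.subList(shippedPosition, wal.size()));
        sinks.get(random.nextInt(sinks.size())).addAll(batch);
        shippedPosition = wal.size();
        return batch.size();
    }

    int position() {
        return shippedPosition;
    }

    // Total number of edits received across all sinks.
    int totalReceived() {
        int total = 0;
        for (List<String> sink : sinks) {
            total += sink.size();
        }
        return total;
    }

    public static void main(String[] args) {
        WalShipper shipper = new WalShipper(3, 42L);
        shipper.append("put row1");
        shipper.append("delete row2");
        System.out.println("shipped " + shipper.shipPending() + " edits");
        System.out.println("position " + shipper.position());
    }
}
```

Because the position only advances after a batch is handed over, a shipper that restarts can pick up from the recorded offset instead of resending the whole log.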

Synchronizing HBase data to Solr: Lily HBase Indexer

Cloudera Search, which ships with Cloudera's distribution, is in fact built on Lily HBase Indexer:

https://github.com/NGDATA/hbase-indexer

This project takes advantage of HBase replication. Changes to HBase data (put, delete) are turned into a stream of events, which can then be synchronized to Solr.

The project factors this out into a subproject, the HBase Side-Effect Processor (SEP):

https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/README.md

It allows users to write their own listeners to handle the events.

A solution for synchronizing HBase data to Elasticsearch

Considering the above, I decided to write a simple program based on the HBase Side-Effect Processor to synchronize data into ES.

The code is in fact very simple; adapting the LoggingConsumer from the demo is enough.

https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-demo/src/main/java/com/ngdata/sep/demo/LoggingConsumer.java

    private static class EventLogger implements EventListener {
        @Override
        public void processEvents(List<SepEvent> sepEvents) {
            for (SepEvent sepEvent : sepEvents) {
                System.out.println("Received event:");
                System.out.println("  table = " + Bytes.toString(sepEvent.getTable()));
                System.out.println("  row = " + Bytes.toString(sepEvent.getRow()));
                System.out.println("  payload = " + Bytes.toString(sepEvent.getPayload()));
                System.out.println("  key values = ");
                for (KeyValue kv : sepEvent.getKeyValues()) {
                    System.out.println("    " + kv.toString());
                }
            }
        }
    }
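To index events instead of logging them, a listener like this one would have to turn each change into an Elasticsearch request. One possible way, sketched below, is to build the newline-delimited body that the Elasticsearch bulk API accepts. `BulkBodyBuilder` is a hypothetical helper, not part of SEP or of any Elasticsearch client; it takes a row key and column values directly, so the mapping from SEP events to puts and deletes is left out, as is sending the body over HTTP. Older Elasticsearch versions also expect a `_type` in each action line, which this sketch omits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds a bulk API request body (newline-delimited JSON).
final class BulkBodyBuilder {
    private final StringBuilder body = new StringBuilder();
    private final String index;

    BulkBodyBuilder(String index) {
        this.index = index;
    }

    // Adds an "index" action: one metadata line followed by the document source line.
    BulkBodyBuilder put(String rowKey, Map<String, String> columns) {
        body.append("{\"index\":{\"_index\":").append(quote(index))
            .append(",\"_id\":").append(quote(rowKey)).append("}}\n");
        StringBuilder doc = new StringBuilder("{");
        for (Map.Entry<String, String> e : columns.entrySet()) {
            if (doc.length() > 1) {
                doc.append(',');
            }
            doc.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        body.append(doc).append("}\n");
        return this;
    }

    // Adds a "delete" action, which has a metadata line only.
    BulkBodyBuilder delete(String rowKey) {
        body.append("{\"delete\":{\"_index\":").append(quote(index))
            .append(",\"_id\":").append(quote(rowKey)).append("}}\n");
        return this;
    }

    String build() {
        return body.toString();
    }

    // Minimal JSON string escaping: quotes, backslashes, and control characters.
    static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("cf:name", "alice");
        System.out.print(new BulkBodyBuilder("users").put("row1", columns).delete("row2").build());
    }
}
```

Using the HBase row key as the document `_id` makes replays idempotent: re-indexing the same row overwrites the existing document rather than creating a duplicate.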

Other notes: Elasticsearch vs. SolrCloud

Judging from the posts I found online, most of the discussion dates from 2012; there seems to be less of it since then.

https://github.com/superkelvint/solr-vs-elasticsearch
http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage

http://www.quora.com/Why-Cloudera-search-is-built-on-Solr-and-not-Elasticsearch (why Cloudera Search chose Solr instead of Elasticsearch)

Personally, I lean toward Elasticsearch, because in terms of popularity ES is overtaking SolrCloud.

Logstash + Elasticsearch + Kibana form a complete toolchain for log collection and analysis, and many companies use it.
