Solutions for Synchronizing HBase Data to Elasticsearch

Source: Internet
Author: User
Tags solr


The River Mechanism of Elasticsearch

Elasticsearch itself provides the river mechanism for synchronizing data.

The officially recommended rivers are listed here:

http://www.elasticsearch.org/guide/en/elasticsearch/rivers/current/

However, there is no official HBase river.

In fact, an ES river is very simple: it is a jar packaged by the user. ES is responsible for finding a node and starting the river on it. If that node fails, ES automatically finds another node and starts the river there.

    public interface RiverComponent {
        RiverName riverName();
    }

    public interface River extends RiverComponent {
        /**
         * Called whenever the river is registered on a node, which can happen when:
         * 1) the river _meta document gets indexed
         * 2) an already registered river gets started on a node
         */
        void start();

        /**
         * Called when the river is closed on a node, which can happen when:
         * 1) the river is deleted by deleting its type through the delete mapping API
         * 2) the node where the river is allocated is shut down or the river gets rerouted to another node
         */
        void close();
    }
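The allocation behavior described above (ES picks a node, starts the river there, and starts it on another node if the first one fails) can be illustrated with a small simulation. This is only a sketch under stated assumptions: `MiniRiver`, `RecordingRiver`, and `RiverAllocator` are hypothetical names invented here, and none of this is Elasticsearch's actual allocation code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, trimmed-down stand-in for the River interface.
interface MiniRiver {
    void start();
    void close();
}

// Records lifecycle calls so the behavior can be inspected.
class RecordingRiver implements MiniRiver {
    final String node;
    boolean running;

    RecordingRiver(String node) { this.node = node; }

    @Override public void start() { running = true; }
    @Override public void close() { running = false; }
}

// Hypothetical allocator: keeps one river instance running on a live node.
class RiverAllocator {
    private final List<String> liveNodes;
    private RecordingRiver active;

    RiverAllocator(List<String> nodes) {
        this.liveNodes = new ArrayList<String>(nodes);
    }

    // Pick the first live node and start a fresh river instance there.
    RecordingRiver allocate() {
        if (liveNodes.isEmpty()) {
            throw new IllegalStateException("no live node left to host the river");
        }
        active = new RecordingRiver(liveNodes.get(0));
        active.start();
        return active;
    }

    // Called when a node drops out; the river is restarted elsewhere if it lived there.
    RecordingRiver nodeFailed(String node) {
        liveNodes.remove(node);
        if (active != null && active.node.equals(node)) {
            active.running = false; // the old instance died together with its node
            return allocate();
        }
        return active;
    }
}

class RiverFailoverDemo {
    public static void main(String[] args) {
        RiverAllocator allocator = new RiverAllocator(Arrays.asList("node-1", "node-2"));
        RecordingRiver first = allocator.allocate();
        RecordingRiver second = allocator.nodeFailed("node-1");
        System.out.println(first.node + " running=" + first.running);   // node-1 running=false
        System.out.println(second.node + " running=" + second.running); // node-2 running=true
    }
}
```

The point of the sketch is that a river author only implements `start()` and `close()`; deciding where and when they are called is left to the cluster.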

Elasticsearch-HBase-River

There are two related projects on GitHub:

https://github.com/mallocator/Elasticsearch-HBase-River

This project is actually very simple: a timer inside the river periodically starts an HBase scanner, scans the data, and inserts it into ES. It is almost the same as writing the scanning code by hand.

https://github.com/posix4e/Elasticsearch-HBase-River

This project uses the replication mechanism of HBase: it simulates a node that receives HBase replication and then synchronizes the data into ES.

However, this project is based on HBase 0.94 and has limited functionality.

The APIs of HBase 0.94 and HBase 0.98 differ greatly, so the project is largely unusable with 0.98. The author also says that it should not be used in a production environment.
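The timer-plus-scanner approach of the first project can be sketched without any HBase or ES dependency. The following is a minimal sketch, not that project's actual code: `Row`, `PollingRiver`, and the timestamp-based incremental scan are assumptions made for illustration, and plain in-memory maps stand in for the HBase table and the ES index.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an HBase row: a value plus its write timestamp.
class Row {
    final String value;
    final long timestamp;

    Row(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

class PollingRiver {
    private final Map<String, Row> sourceTable; // stands in for the HBase table
    private final Map<String, String> index = new ConcurrentHashMap<String, String>(); // stands in for ES
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private long lastSeenTimestamp = -1L;

    PollingRiver(Map<String, Row> sourceTable) {
        this.sourceTable = sourceTable;
    }

    // One scan pass: copy every row written after the previous pass.
    synchronized int scanOnce() {
        int copied = 0;
        long newest = lastSeenTimestamp;
        for (Map.Entry<String, Row> entry : sourceTable.entrySet()) {
            Row row = entry.getValue();
            if (row.timestamp > lastSeenTimestamp) {
                index.put(entry.getKey(), row.value);
                newest = Math.max(newest, row.timestamp);
                copied++;
            }
        }
        lastSeenTimestamp = newest;
        return copied;
    }

    // Mirrors River.start(): schedule the scan to run repeatedly.
    void start(long periodMillis) {
        timer.scheduleWithFixedDelay(new Runnable() {
            @Override public void run() { scanOnce(); }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    // Mirrors River.close(): stop the timer.
    void close() {
        timer.shutdownNow();
    }

    Map<String, String> indexed() {
        return index;
    }
}
```

A scan of this kind only sees rows that still exist, so deletes in the source are never propagated to the index. That is one reason the replication-based approach is attractive.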

The replication mechanism of HBase

The official documentation and a Cloudera blog post cover it:
http://hbase.apache.org/book.html#cluster_replication
http://blog.cloudera.com/blog/2012/07/hbase-replication-overview-2/

The replication mechanism of HBase is in fact very similar to the replication mechanism of MySQL. Every region server in HBase has a WAL (write-ahead log), and every Put/Delete is written to the WAL first.

A background thread then sends the WAL entries to a randomly chosen region server of the slave cluster. The position up to which the log has been synced is recorded in ZooKeeper.
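The flow just described (append to the WAL, ship pending entries to a random slave region server, remember the position) can be modeled in a few lines. This is a toy model under stated assumptions, not HBase code: `WalEdit`, `SinkServer`, and `ReplicationSource` are hypothetical names, and a plain `int` field stands in for the position that HBase keeps in ZooKeeper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model of one WAL entry: an operation on a row.
class WalEdit {
    final String op;  // "PUT" or "DELETE"
    final String row;

    WalEdit(String op, String row) {
        this.op = op;
        this.row = row;
    }
}

// Stands in for a region server of the slave cluster.
class SinkServer {
    final List<WalEdit> applied = new ArrayList<WalEdit>();

    void replicate(List<WalEdit> batch) {
        applied.addAll(batch);
    }
}

// Stands in for the replication thread of one master region server.
class ReplicationSource {
    private final List<WalEdit> wal = new ArrayList<WalEdit>();
    private final List<SinkServer> sinks;
    private final Random random;
    private int position = 0; // assumption: models the offset stored in ZooKeeper

    ReplicationSource(List<SinkServer> sinks, Random random) {
        this.sinks = sinks;
        this.random = random;
    }

    // Every mutation is appended to the WAL first.
    void write(String op, String row) {
        wal.add(new WalEdit(op, row));
    }

    // Ship everything after the saved position to one randomly chosen sink.
    int shipPending() {
        if (position == wal.size()) {
            return 0;
        }
        List<WalEdit> batch = new ArrayList<WalEdit>(wal.subList(position, wal.size()));
        SinkServer target = sinks.get(random.nextInt(sinks.size()));
        target.replicate(batch);
        position = wal.size(); // advance only after the sink accepted the batch
        return batch.size();
    }

    int position() {
        return position;
    }
}
```

Because the position is advanced only after a batch has been handed over, a restarted shipper would resume from the last recorded offset rather than from the start of the log.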


Synchronizing HBase data to Solr: Lily HBase Indexer

Cloudera Search, which ships with Cloudera's distribution, actually uses this Lily HBase Indexer:

https://github.com/NGDATA/hbase-indexer

This project takes advantage of the replication feature of HBase. Data changes in HBase (Put, Delete) are turned into a series of events, which can then be synced to Solr.

This project abstracts out a subproject: HBase Side-Effect Processor (SEP).

https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/README.md

It allows users to write their own listener to handle the events.


The solution for synchronizing HBase data to Elasticsearch

Given the above, I decided to write a simple program based on the HBase Side-Effect Processor to synchronize data into ES.

In fact the code is very simple; adapting the LoggingConsumer from the demo is enough.

https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-demo/src/main/java/com/ngdata/sep/demo/LoggingConsumer.java

    private static class EventLogger implements EventListener {
        @Override
        public void processEvents(List<SepEvent> sepEvents) {
            for (SepEvent sepEvent : sepEvents) {
                System.out.println("Received event:");
                System.out.println("  table = " + Bytes.toString(sepEvent.getTable()));
                System.out.println("  row = " + Bytes.toString(sepEvent.getRow()));
                System.out.println("  payload = " + Bytes.toString(sepEvent.getPayload()));
                System.out.println("  key values = ");
                for (KeyValue kv : sepEvent.getKeyValues()) {
                    System.out.println("    " + kv.toString());
                }
            }
        }
    }
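To turn such a listener into an ES synchronizer, the events have to be translated into index and delete requests instead of being printed. One possible way, sketched here under stated assumptions, is to build the newline-delimited body of an Elasticsearch bulk request. `RowEvent` is a hypothetical simplification of a SEP event (a row key, its columns as strings, and a delete flag), `BulkBodyBuilder` is an invented helper, and the sketch stops short of sending the body over HTTP. Elasticsearch versions contemporary with this article also expect a `_type` for each document, which is omitted here.

```java
import java.util.List;
import java.util.Map;

// Hypothetical simplified event: row key, column values, and whether the row was deleted.
class RowEvent {
    final String row;
    final Map<String, String> columns;
    final boolean delete;

    RowEvent(String row, Map<String, String> columns, boolean delete) {
        this.row = row;
        this.columns = columns;
        this.delete = delete;
    }
}

class BulkBodyBuilder {
    // Wrap a string in quotes, escaping characters that would break a JSON string literal.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serialize the columns as a flat JSON object of string fields.
    static String toJson(Map<String, String> columns) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> column : columns.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(quote(column.getKey())).append(':').append(quote(column.getValue()));
            first = false;
        }
        return sb.append('}').toString();
    }

    // One action line per event; index actions are followed by a document line.
    static String build(String index, List<RowEvent> events) {
        StringBuilder body = new StringBuilder();
        for (RowEvent event : events) {
            String meta = "{\"_index\":" + quote(index) + ",\"_id\":" + quote(event.row) + "}";
            if (event.delete) {
                body.append("{\"delete\":").append(meta).append("}\n");
            } else {
                body.append("{\"index\":").append(meta).append("}\n");
                body.append(toJson(event.columns)).append('\n');
            }
        }
        return body.toString();
    }
}
```

Using the HBase row key as the document `_id` makes the operation idempotent: if replication delivers the same event twice, the document is simply overwritten or deleted again.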


Miscellaneous: Elasticsearch vs. SolrCloud

Judging from the posts found online, most of the discussion dates from 2012, and there seems to be less of it after that.

https://github.com/superkelvint/solr-vs-elasticsearch
http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage

http://www.quora.com/Why-Cloudera-search-is-built-on-Solr-and-not-Elasticsearch (Why Cloudera Search chose Solr instead of Elasticsearch)


Personally, I lean toward Elasticsearch, because in terms of popularity ES is overtaking SolrCloud:

[image not preserved]

Logstash + Elasticsearch + Kibana form a complete toolchain for log collection and analysis, and many companies use it.



Copyright notice: This is an original article by the blogger and may not be reproduced without permission.
