The River mechanism of Elasticsearch
Elasticsearch itself provides the river mechanism for synchronizing data.
The officially recommended rivers are listed here:
http://www.elasticsearch.org/guide/en/elasticsearch/rivers/current/
However, there is no official HBase river.
In fact, an ES river is very simple: it is a jar packaged by the user. ES is responsible for finding a node and starting the river on it. If that node fails, ES automatically finds another node and starts the river there.
public interface RiverComponent {
    RiverName riverName();
}

public interface River extends RiverComponent {
    /**
     * Called whenever the river is registered on a node, which can happen when:
     * 1) the river _meta document gets indexed
     * 2) an already registered river gets started on a node
     */
    void start();

    /**
     * Called when the river is closed on a node, which can happen when:
     * 1) the river is deleted by deleting its type through the delete mapping API
     * 2) the node where the river is allocated is shut down or the river gets rerouted to another node
     */
    void close();
}
Elasticsearch-HBase-River
There are two related projects on GitHub:
https://github.com/mallocator/Elasticsearch-HBase-River
This project is actually very simple: the river uses a timer to start an HBase scanner, scans the data, and inserts it into ES. It is much the same as writing the scanning code by hand.
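The timer-plus-scanner approach can be sketched roughly as follows. This is only a minimal illustration, not the code of the project above: FakeTable, ScannedRow, and PollingSync are hypothetical in-memory stand-ins for the HBase table, the scanner results, and the ES index, and the only real API used is the JDK's ScheduledExecutorService.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for one row returned by an HBase scanner.
final class ScannedRow {
    final String rowKey;
    final String value;
    final long timestamp;

    ScannedRow(String rowKey, String value, long timestamp) {
        this.rowKey = rowKey;
        this.value = value;
        this.timestamp = timestamp;
    }
}

// Hypothetical in-memory stand-in for an HBase table.
final class FakeTable {
    private final Map<String, ScannedRow> rows = new TreeMap<>();

    synchronized void put(String rowKey, String value, long timestamp) {
        rows.put(rowKey, new ScannedRow(rowKey, value, timestamp));
    }

    // Mimics a time-range scan: rows written in [minInclusive, maxExclusive).
    synchronized List<ScannedRow> scan(long minInclusive, long maxExclusive) {
        List<ScannedRow> result = new ArrayList<>();
        for (ScannedRow row : rows.values()) {
            if (row.timestamp >= minInclusive && row.timestamp < maxExclusive) {
                result.add(row);
            }
        }
        return result;
    }
}

// Copies rows written since the previous pass into a stand-in index.
final class PollingSync {
    private final FakeTable table;
    private final Map<String, String> index = new TreeMap<>(); // stand-in for an ES index
    private long lastSyncedTimestamp = 0L;

    PollingSync(FakeTable table) {
        this.table = table;
    }

    // One polling pass; returns the number of rows copied.
    synchronized int runOnce(long now) {
        List<ScannedRow> changed = table.scan(lastSyncedTimestamp, now);
        for (ScannedRow row : changed) {
            index.put(row.rowKey, row.value);
        }
        lastSyncedTimestamp = now;
        return changed.size();
    }

    synchronized String indexed(String rowKey) {
        return index.get(rowKey);
    }

    // Runs a pass every periodMillis; the caller must shut the timer down.
    ScheduledExecutorService startTimer(long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(
                () -> runOnce(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

In a river, startTimer would be called from start() and the returned executor shut down in close(). Note that a polling scan of this kind only sees rows that still exist, so deletes are not carried over to the index.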
https://github.com/posix4e/Elasticsearch-HBase-River
This project uses the HBase replication mechanism: it simulates an HBase replication node and then synchronizes the data into ES.
However, this project is based on HBase 0.94 and has limited functionality.
The APIs of HBase 0.94 and HBase 0.98 differ greatly, so it is basically unusable with 0.98. The author also says that it cannot be used in a production environment.
The replication mechanism of HBase
The official documentation and a Cloudera blog post:
http://hbase.apache.org/book.html#cluster_replication
http://blog.cloudera.com/blog/2012/07/hbase-replication-overview-2/
The replication mechanism of HBase is in fact very similar to the replication mechanism of MySQL. Every region server in HBase has a WAL (write-ahead log); each Put/Delete is written to the WAL first.
A background thread then sends the WAL entries to a randomly chosen region server of the slave cluster. The position that has been synced so far is recorded in ZooKeeper.
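The flow described above can be modelled with a small toy example. It is a simplified sketch under stated assumptions, not HBase code: WalEntry, SlaveRegionServer, and ReplicationSource are hypothetical classes, the WAL is a plain in-memory list, and the synced position is a field rather than a ZooKeeper node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy model only: none of these classes are HBase APIs.
final class WalEntry {
    final String op;     // "PUT" or "DELETE"
    final String rowKey;

    WalEntry(String op, String rowKey) {
        this.op = op;
        this.rowKey = rowKey;
    }
}

// Stand-in for a region server of the slave cluster.
final class SlaveRegionServer {
    final List<WalEntry> applied = new ArrayList<>();

    void replicate(List<WalEntry> batch) {
        applied.addAll(batch);
    }
}

// Stand-in for the replication logic of one region server in the master cluster.
final class ReplicationSource {
    private final List<WalEntry> wal = new ArrayList<>();
    private final List<SlaveRegionServer> slaves;
    private final Random random;
    private int replicatedPosition = 0; // in this sketch a field; the text says ZooKeeper

    ReplicationSource(List<SlaveRegionServer> slaves, Random random) {
        this.slaves = slaves;
        this.random = random;
    }

    // Every Put/Delete is appended to the WAL first.
    void append(String op, String rowKey) {
        wal.add(new WalEntry(op, rowKey));
    }

    // One round of the background thread: ship all unsent entries to one random slave.
    int shipPending() {
        if (replicatedPosition == wal.size() || slaves.isEmpty()) {
            return 0;
        }
        List<WalEntry> batch = new ArrayList<>(wal.subList(replicatedPosition, wal.size()));
        SlaveRegionServer target = slaves.get(random.nextInt(slaves.size()));
        target.replicate(batch);
        // Advance the position only after the slave has accepted the batch.
        replicatedPosition = wal.size();
        return batch.size();
    }

    int replicatedPosition() {
        return replicatedPosition;
    }
}
```

Because the position only moves forward after a batch has been accepted, a restarted sender in this model would resend from the last recorded position rather than skip entries.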
The solution for synchronizing HBase data to Solr: Lily HBase Indexer
The Cloudera Search built into Cloudera actually uses this Lily HBase Indexer:
https://github.com/NGDATA/hbase-indexer
This project takes advantage of HBase replication. HBase data changes (Put, Delete) are abstracted into a series of events, which can then be synchronized to Solr.
This project abstracts out a subproject: HBase Side-Effect Processor (SEP).
https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/README.md
It allows users to write their own listeners to handle the events.
The solution for synchronizing HBase data to Elasticsearch
Considering the above, I decided to write a simple program based on the HBase Side-Effect Processor to synchronize data into ES.
In fact the code is very simple; the LoggingConsumer in the demo is a good starting point.
https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-demo/src/main/java/com/ngdata/sep/demo/LoggingConsumer.java
private static class EventLogger implements EventListener {
    // Called with a batch of change events received through HBase replication.
    @Override
    public void processEvents(List<SepEvent> sepEvents) {
        for (SepEvent sepEvent : sepEvents) {
            System.out.println("Received event:");
            System.out.println("  table = " + Bytes.toString(sepEvent.getTable()));
            System.out.println("  row = " + Bytes.toString(sepEvent.getRow()));
            System.out.println("  payload = " + Bytes.toString(sepEvent.getPayload()));
            System.out.println("  key values = ");
            for (KeyValue kv : sepEvent.getKeyValues()) {
                System.out.println("    " + kv.toString());
            }
        }
    }
}
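To go from logging events to synchronizing them, a listener could turn each batch into a request body for the Elasticsearch bulk API. The sketch below shows only that conversion and is not the author's program: RowChange and BulkBodyBuilder are hypothetical names, a row change is reduced to a row key plus string columns, and the index and type names are assumptions chosen by the caller. The `_type` field matches ES versions of that period; recent versions no longer use mapping types.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified view of one row change taken from an event.
final class RowChange {
    final String row;                  // HBase row key, used as the document id
    final boolean deleted;             // true if the row was deleted
    final Map<String, String> columns; // column name -> value, ignored for deletes

    RowChange(String row, boolean deleted, Map<String, String> columns) {
        this.row = row;
        this.deleted = deleted;
        this.columns = columns;
    }
}

final class BulkBodyBuilder {
    // Wraps s in double quotes and escapes what JSON strings require.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Serializes the columns of one row as a flat JSON object.
    static String toJson(Map<String, String> columns) {
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : columns.entrySet()) {
            if (!first) {
                json.append(',');
            }
            json.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return json.append('}').toString();
    }

    // Builds the newline-delimited body: an action line per change,
    // followed by a document line for index actions only.
    static String build(String index, String type, List<RowChange> changes) {
        StringBuilder body = new StringBuilder();
        for (RowChange change : changes) {
            String action = change.deleted ? "delete" : "index";
            body.append("{\"").append(action).append("\":{\"_index\":").append(quote(index))
                .append(",\"_type\":").append(quote(type))
                .append(",\"_id\":").append(quote(change.row)).append("}}\n");
            if (!change.deleted) {
                body.append(toJson(change.columns)).append('\n');
            }
        }
        return body.toString();
    }
}
```

The resulting string would be sent to the `_bulk` endpoint by whatever ES client the program uses. Using the row key as `_id` means that replaying the same event overwrites the same document instead of creating a duplicate.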
Other notes: comparing Elasticsearch and SolrCloud
Judging from the posts found online, most of the discussion dates from 2012, and there seems to be less of it afterwards.
https://github.com/superkelvint/solr-vs-elasticsearch
http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage
http://www.quora.com/Why-Cloudera-search-is-built-on-Solr-and-not-Elasticsearch (why Cloudera Search chose Solr instead of Elasticsearch)
Personally, I lean toward Elasticsearch, because in terms of popularity ES is overtaking SolrCloud.
Logstash + Elasticsearch + Kibana form a complete tool chain for log collection and analysis, and many companies use it.
Copyright notice: This is an original article by the blogger and may not be reproduced without permission.