HBase can forward requests to Solr through a coprocessor, so that Solr stays synchronized with the incoming data: additions, deletions, and index updates. Storing data and indexing it on different machines is practically a must in big-data architectures, yet many people are unfamiliar with this approach and quite new to the idea; it is definitely a good direction, they just do not know how to start studying it.
A friend left me a message on the blog saying that CDH can also do this kind of thing. I have not tried it yet; he also asked me for the related code, so I have organized it a little, and it forms the main content of this article. As for CDH, I will try it as soon as possible; if you know about it, please leave me a message.
What follows is mainly the code, with explanations, that I wrote while testing the performance of the HBase-and-Solr combination: an HBase coprocessor forwards data to Solr as it is inserted into HBase.
I. Writing the HBase coprocessor
As soon as postPut fires on a piece of data, the corresponding update is pushed to the Solr core immediately. I use ConcurrentUpdateSolrServer, which is one of the guarantees of Solr's write performance; when using it, do not forget to configure autoCommit on the Solr side.
/*
 * Copyright: Wang Anqi
 * Description: Monitors HBase; when a Put triggers postPut, the data is sent
 *              on to Solr. This class is attached to HBase as a coprocessor
 *              (trigger).
 * Modified: 2014-05-27
 * Content: new
 */
package solrHbase.test;

import java.io.UnsupportedEncodingException;
import ***;

public class SolrIndexCoprocessorObserver extends BaseRegionObserver {

    private static final Logger log = LoggerFactory
            .getLogger(SolrIndexCoprocessorObserver.class);
    private static final String solrUrl = "http://192.1.11.108:80/solr/core1";
    private static final SolrServer solrServer = new ConcurrentUpdateSolrServer(
            solrUrl, 10000, 20);

    /**
     * Builds the Solr index.
     *
     * @throws UnsupportedEncodingException
     */
    @Override
    public void postPut(final ObserverContext<RegionCoprocessorEnvironment> e,
            final Put put, final WALEdit edit, final boolean writeToWAL)
            throws UnsupportedEncodingException {
        inputSolr(put);
    }

    public void inputSolr(Put put) {
        try {
            solrServer.add(TestSolrMain.getInputDoc(put));
        } catch (Exception ex) {
            log.error(ex.getMessage());
        }
    }
}
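The post does not show how the observer is attached to the table. Below is a minimal sketch of my own, not taken from the original, assuming the coprocessor jar has already been uploaded to HDFS (the path and jar name are hypothetical); it registers the observer when the table is created. The same registration can also be done from the HBase shell via an alter command that sets the coprocessor table attribute.

// A sketch, not from the original post: registering the observer at table
// creation time. The HDFS jar path below is a hypothetical location.
HBaseAdmin admin = new HBaseAdmin(config);
HTableDescriptor tableDescriptor = new HTableDescriptor("angelHBase");
tableDescriptor.addFamily(new HColumnDescriptor("wanganqi"));
tableDescriptor.addCoprocessor("solrHbase.test.SolrIndexCoprocessorObserver",
        new Path("hdfs:///user/coprocessors/solr-coprocessor.jar"),
        Coprocessor.PRIORITY_USER, null);
admin.createTable(tableDescriptor);
admin.close();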
NOTE: getInputDoc is the essence of this HBase coprocessor; it translates the contents of an HBase Put into the values that Solr needs. In the statement String fieldName = key.substring(key.indexOf(columnFamily) + 3, key.indexOf("")).trim(); the argument to the second indexOf is a special character that appeared garbled in the original post and cannot be shown here, so please take note.
public static SolrInputDocument getInputDoc(Put put) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("test_id", Bytes.toString(put.getRow()));
    for (KeyValue c : put.getFamilyMap().get(Bytes.toBytes(columnFamily))) {
        String key = Bytes.toString(c.getKey());
        String value = Bytes.toString(c.getValue());
        if (value.isEmpty()) {
            continue;
        }
        // the second indexOf argument below is a special character that was
        // garbled in the original post
        String fieldName = key.substring(key.indexOf(columnFamily) + 3,
                key.indexOf("")).trim();
        doc.addField(fieldName, value);
    }
    return doc;
}
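The substring parsing above depends on that garbled delimiter character and on the layout of the flattened key. As an alternative, here is a sketch of my own (not the author's code) that reads the column qualifier directly from each KeyValue, avoiding the indexOf parsing entirely:

// A sketch, not from the original post: derive the Solr field name from the
// qualifier bytes instead of parsing the flattened key string.
public static SolrInputDocument getInputDocByQualifier(Put put) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("test_id", Bytes.toString(put.getRow()));
    for (KeyValue c : put.getFamilyMap().get(Bytes.toBytes(columnFamily))) {
        String value = Bytes.toString(c.getValue());
        if (value.isEmpty()) {
            continue;
        }
        String fieldName = Bytes.toString(c.getQualifier()).trim();
        doc.addField(fieldName, value);
    }
    return doc;
}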
II. Writing the test program's entry point (main)
This code sends a table-creation request to HBase and then continuously submits simulated data to HBase, recording timestamps along the way to measure insert performance.
/*
 * Copyright: Wang Anqi
 * Description: Tests HBase insert performance.
 * Modified: 2014-05-27
 * Content: new
 */
package solrHbase.test;

import hbaseInput.HBaseInsert;
import ***;

public class TestHBaseMain {

    private static Configuration config;
    private static String tableName = "angelHBase";
    private static HTable table = null;
    private static final String columnFamily = "wanganqi";

    /**
     * @param args
     */
    public static void main(String[] args) {
        config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "192.103.101.104");
        HBaseInsert.createTable(config, tableName, columnFamily);
        try {
            table = new HTable(config, Bytes.toBytes(tableName));
            for (int k = 0; k < 1; k++) {
                Thread t = new Thread() {
                    public void run() {
                        for (int i = 0; i < 100000; i++) {
                            HBaseInsert.inputData(table,
                                    PutCreater.createPuts(1000, columnFamily));
                            Calendar c = Calendar.getInstance();
                            String dateTime = c.get(Calendar.YEAR) + "-"
                                    + c.get(Calendar.MONTH) + "-"
                                    + c.get(Calendar.DATE) + "T"
                                    + c.get(Calendar.HOUR) + ":"
                                    + c.get(Calendar.MINUTE) + ":"
                                    + c.get(Calendar.SECOND) + ":"
                                    + c.get(Calendar.MILLISECOND)
                                    + "Z write: " + i * 1000;
                            System.out.println(dateTime);
                        }
                    }
                };
                t.start();
            }
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}
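One caveat about the code above: the outer loop is written as if the thread count might be raised past one, but HTable is not thread-safe, so in that case each writer thread should build its own HTable from the shared Configuration. A minimal sketch of the per-thread variant (my own, not from the post):

// Sketch: one HTable per thread instead of the shared static field above.
Thread t = new Thread() {
    public void run() {
        try {
            HTable myTable = new HTable(config, Bytes.toBytes(tableName));
            for (int i = 0; i < 100000; i++) {
                HBaseInsert.inputData(myTable,
                        PutCreater.createPuts(1000, columnFamily));
            }
            myTable.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
};
t.start();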
Below, the HBase-related operations are encapsulated in a class of their own; only the code for creating the table and inserting data is shown.
/*
 * Copyright: Wang Anqi
 * Description: HBase-related operations: creating tables and inserting data.
 * Modified: 2014-05-27
 * Content: new
 */
package hbaseInput;

import ***;
import org.apache.hadoop.hbase.client.Put;

public class HBaseInsert {

    public static void createTable(Configuration config, String tableName,
            String columnFamily) {
        HBaseAdmin hBaseAdmin;
        try {
            hBaseAdmin = new HBaseAdmin(config);
            if (hBaseAdmin.tableExists(tableName)) {
                return;
            }
            HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
            tableDescriptor.addFamily(new HColumnDescriptor(columnFamily));
            hBaseAdmin.createTable(tableDescriptor);
            hBaseAdmin.close();
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void inputData(HTable table, ArrayList<Put> puts) {
        try {
            table.put(puts);
            table.flushCommits();
            puts.clear();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
III. Constructing the simulated data (Put)
Writing data into HBase requires constructing Puts; here is how I construct the simulated data. To generate the strings, I randomly read some words from the MMSeg dictionary file words.dic and join them into a sentence. That generation code is not shown by the post, but it is very easy; just create whatever data you need (a sketch of the missing helpers follows the snippet below).
public static Put createPut(String columnFamily) {
    String ss = getSentence();
    byte[] family = Bytes.toBytes(columnFamily);
    byte[] rowKey = Bytes.toBytes("" + Math.abs(r.nextLong()));
    Put put = new Put(rowKey);
    put.add(family, Bytes.toBytes("..."), // qualifier garbled in the original post
            Bytes.toBytes("" + Math.abs(r.nextInt())));
    // ****** (further columns elided in the original post)
    put.add(family, Bytes.toBytes("company_mmsegsm"), Bytes.toBytes(ss));
    return put;
}
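For completeness, here are minimal sketches of the two helpers referenced above but never shown in the post: createPuts, which just batches createPut calls, and getSentence, which joins randomly chosen words loaded from words.dic. Both are my own reconstructions of what the text describes, not the author's code, and the words.dic location is an assumption.

// Sketches of the helpers the post leaves out; 'r' is the Random instance
// that createPut above already uses. The words.dic path is an assumption.
private static final Random r = new Random();
private static List<String> words;

static {
    try {
        words = Files.readAllLines(Paths.get("words.dic"),
                Charset.forName("UTF-8"));
    } catch (IOException e) {
        throw new RuntimeException("failed to load words.dic", e);
    }
}

public static ArrayList<Put> createPuts(int count, String columnFamily) {
    ArrayList<Put> puts = new ArrayList<Put>(count);
    for (int i = 0; i < count; i++) {
        puts.add(createPut(columnFamily));
    }
    return puts;
}

public static String getSentence() {
    // Join a random number of randomly chosen dictionary words.
    StringBuilder sb = new StringBuilder();
    int n = 5 + r.nextInt(15);
    for (int i = 0; i < n; i++) {
        sb.append(words.get(r.nextInt(words.size())));
    }
    return sb.toString();
}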
Of course, before running this program you need to configure the required fields in Solr. HBase and Solr installation, configuration, and their basic usage will be described in later articles. Here, Solr's field configuration matches the column qualifiers used in createPut above; you can of course also use dynamic fields.
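As an illustration of that configuration (my own sketch, not taken from the post; the text_mmseg field type and the commit interval are assumptions), a dynamic field in schema.xml can match qualifiers like the company_mmsegsm used in createPut, and the autoCommit mentioned in section I lives in solrconfig.xml:

<!-- schema.xml: a dynamic field matching qualifiers such as company_mmsegsm;
     the text_mmseg field type is an assumption, define it as you like -->
<dynamicField name="*_mmsegsm" type="text_mmseg" indexed="true" stored="true"/>

<!-- solrconfig.xml: autoCommit, so ConcurrentUpdateSolrServer's batches
     actually become visible; 15 seconds is an example value -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>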
IV. Testing Solr performance directly
If you do not want to test the combination of HBase and Solr and only want to test Solr's performance alone, that is much simpler: you can reuse the code snippets above with just a little assembly, for example:
private static void sendConcurrentUpdateSolrServer(final String url,
        final int count) throws SolrServerException, IOException {
    SolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 20);
    for (int i = 0; i < count; i++) {
        solrServer.add(getInputDoc(PutCreater.createPut(columnFamily)));
    }
}
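It can then be invoked like this (the URL is the core from section I; the document count is an arbitrary example):

// Example invocation; the count is a placeholder value.
sendConcurrentUpdateSolrServer("http://192.1.11.108:80/solr/core1", 1000000);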
I hope this can help you. This article carries a bit more code than usual, but code is the best language for explaining ideas. What I advocate is minimizing comments in code: simplify the code itself until it is clear enough to understand on its own, even close to pseudocode, which is also what the book "Refactoring" advocates.
Original link: http://www.cnblogs.com/wgp13x/p/3927979.html