HBase can forward requests to Solr through a coprocessor, so that Solr stays synchronized with the incoming data: additions, deletions, and index updates. Storing data and indexing it on different machines is practically a must in big-data architectures, yet many people are unfamiliar with this approach and quite new to the idea; it is definitely a good direction, they just do not know how to start studying it.
A friend left me a message on the blog saying that CDH can also do this kind of thing. I have not tried it yet; he also asked me for the related code, so I have organized it a little, and it forms the main content of this article. As for CDH, I will try it as soon as possible; if you know about it, please leave me a message.
What follows is mainly the code, with explanations, that I wrote while testing the performance of the HBase-and-Solr combination: an HBase coprocessor forwards data to Solr as it is inserted into HBase.
I. Writing the HBase coprocessor
As soon as postPut fires on a piece of data, the corresponding update is pushed to the Solr core immediately. I use ConcurrentUpdateSolrServer, which is one of the guarantees of Solr's write performance; when using it, do not forget to configure autoCommit on the Solr side.
/*
 * Copyright: Wang Anqi
 * Description: Monitors HBase; when a Put triggers postPut, the data is sent
 *              on to Solr. This class is attached to HBase as a coprocessor
 *              (trigger).
 * Modified: 2014-05-27
 * Content: new
 */
package solrHbase.test;

import java.io.UnsupportedEncodingException;
import ***;

public class SolrIndexCoprocessorObserver extends BaseRegionObserver {

    private static final Logger log = LoggerFactory
            .getLogger(SolrIndexCoprocessorObserver.class);
    private static final String solrUrl = "http://192.1.11.108:80/solr/core1";
    private static final SolrServer solrServer = new ConcurrentUpdateSolrServer(
            solrUrl, 10000, 20);

    /**
     * Builds the Solr index.
     *
     * @throws UnsupportedEncodingException
     */
    @Override
    public void postPut(final ObserverContext<RegionCoprocessorEnvironment> e,
            final Put put, final WALEdit edit, final boolean writeToWAL)
            throws UnsupportedEncodingException {
        inputSolr(put);
    }

    public void inputSolr(Put put) {
        try {
            solrServer.add(TestSolrMain.getInputDoc(put));
        } catch (Exception ex) {
            log.error(ex.getMessage());
        }
    }
}
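The post does not show how the observer is attached to the table. Below is a minimal sketch of my own, not taken from the original, assuming the coprocessor jar has already been uploaded to HDFS (the path and jar name are hypothetical); it registers the observer when the table is created. The same registration can also be done from the HBase shell via an alter command that sets the coprocessor table attribute.

// A sketch, not from the original post: registering the observer at table
// creation time. The HDFS jar path below is a hypothetical location.
HBaseAdmin admin = new HBaseAdmin(config);
HTableDescriptor tableDescriptor = new HTableDescriptor("angelHBase");
tableDescriptor.addFamily(new HColumnDescriptor("wanganqi"));
tableDescriptor.addCoprocessor("solrHbase.test.SolrIndexCoprocessorObserver",
        new Path("hdfs:///user/coprocessors/solr-coprocessor.jar"),
        Coprocessor.PRIORITY_USER, null);
admin.createTable(tableDescriptor);
admin.close();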
NOTE: getInputDoc is the essence of this HBase coprocessor; it translates the contents of an HBase Put into the values that Solr needs. In the statement String fieldName = key.substring(key.indexOf(columnFamily) + 3, key.indexOf("")).trim(); the argument to the second indexOf is a special character that appeared garbled in the original post and cannot be shown here, so please take note.
public static SolrInputDocument getInputDoc(Put put) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("test_id", Bytes.toString(put.getRow()));
    for (KeyValue c : put.getFamilyMap().get(Bytes.toBytes(columnFamily))) {
        String key = Bytes.toString(c.getKey());
        String value = Bytes.toString(c.getValue());
        if (value.isEmpty()) {
            continue;
        }
        // the second indexOf argument below is a special character that was
        // garbled in the original post
        String fieldName = key.substring(key.indexOf(columnFamily) + 3,
                key.indexOf("")).trim();
        doc.addField(fieldName, value);
    }
    return doc;
}
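The substring parsing above depends on that garbled delimiter character and on the layout of the flattened key. As an alternative, here is a sketch of my own (not the author's code) that reads the column qualifier directly from each KeyValue, avoiding the indexOf parsing entirely:

// A sketch, not from the original post: derive the Solr field name from the
// qualifier bytes instead of parsing the flattened key string.
public static SolrInputDocument getInputDocByQualifier(Put put) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("test_id", Bytes.toString(put.getRow()));
    for (KeyValue c : put.getFamilyMap().get(Bytes.toBytes(columnFamily))) {
        String value = Bytes.toString(c.getValue());
        if (value.isEmpty()) {
            continue;
        }
        String fieldName = Bytes.toString(c.getQualifier()).trim();
        doc.addField(fieldName, value);
    }
    return doc;
}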
II. Writing the test program's entry point (main)
This code sends a table-creation request to HBase and then continuously submits simulated data to HBase, recording timestamps along the way to measure insert performance.
/*
 * Copyright: Wang Anqi
 * Description: Tests HBase insert performance.
 * Modified: 2014-05-27
 * Content: new
 */
package solrHbase.test;

import hbaseInput.HBaseInsert;
import ***;

public class TestHBaseMain {

    private static Configuration config;
    private static String tableName = "angelHBase";
    private static HTable table = null;
    private static final String columnFamily = "wanganqi";

    /**
     * @param args
     */
    public static void main(String[] args) {
        config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "192.103.101.104");
        HBaseInsert.createTable(config, tableName, columnFamily);
        try {
            table = new HTable(config, Bytes.toBytes(tableName));
            for (int k = 0; k < 1; k++) {
                Thread t = new Thread() {
                    public void run() {
                        for (int i = 0; i < 100000; i++) {
                            HBaseInsert.inputData(table,
                                    PutCreater.createPuts(1000, columnFamily));
                            Calendar c = Calendar.getInstance();
                            String dateTime = c.get(Calendar.YEAR) + "-"
                                    + c.get(Calendar.MONTH) + "-"
                                    + c.get(Calendar.DATE) + "T"
                                    + c.get(Calendar.HOUR) + ":"
                                    + c.get(Calendar.MINUTE) + ":"
                                    + c.get(Calendar.SECOND) + ":"
                                    + c.get(Calendar.MILLISECOND)
                                    + "Z write: " + i * 1000;
                            System.out.println(dateTime);
                        }
                    }
                };
                t.start();
            }
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}
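One caveat about the code above: the outer loop is written as if the thread count might be raised past one, but HTable is not thread-safe, so in that case each writer thread should build its own HTable from the shared Configuration. A minimal sketch of the per-thread variant (my own, not from the post):

// Sketch: one HTable per thread instead of the shared static field above.
Thread t = new Thread() {
    public void run() {
        try {
            HTable myTable = new HTable(config, Bytes.toBytes(tableName));
            for (int i = 0; i < 100000; i++) {
                HBaseInsert.inputData(myTable,
                        PutCreater.createPuts(1000, columnFamily));
            }
            myTable.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
};
t.start();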
Below, the HBase-related operations are encapsulated in a class of their own; only the code for creating the table and inserting data is shown.
/*
 * Copyright: Wang Anqi
 * Description: HBase-related operations: creating tables and inserting data.
 * Modified: 2014-05-27
 * Content: new
 */
package hbaseInput;

import ***;
import org.apache.hadoop.hbase.client.Put;

public class HBaseInsert {

    public static void createTable(Configuration config, String tableName,
            String columnFamily) {
        HBaseAdmin hBaseAdmin;
        try {
            hBaseAdmin = new HBaseAdmin(config);
            if (hBaseAdmin.tableExists(tableName)) {
                return;
            }
            HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
            tableDescriptor.addFamily(new HColumnDescriptor(columnFamily));
            hBaseAdmin.createTable(tableDescriptor);
            hBaseAdmin.close();
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void inputData(HTable table, ArrayList<Put> puts) {
        try {
            table.put(puts);
            table.flushCommits();
            puts.clear();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
III. Constructing the simulated data (Put)
Writing data into HBase requires constructing Puts; here is how I construct the simulated data. To generate the strings, I randomly read some words from the MMSeg dictionary file words.dic and join them into a sentence. That generation code is not shown by the post, but it is very easy; just create whatever data you need (a sketch of the missing helpers follows the snippet below).
public static Put createPut(String columnFamily) {
    String ss = getSentence();
    byte[] family = Bytes.toBytes(columnFamily);
    byte[] rowKey = Bytes.toBytes("" + Math.abs(r.nextLong()));
    Put put = new Put(rowKey);
    put.add(family, Bytes.toBytes("..."), // qualifier garbled in the original post
            Bytes.toBytes("" + Math.abs(r.nextInt())));
    // ****** (further columns elided in the original post)
    put.add(family, Bytes.toBytes("company_mmsegsm"), Bytes.toBytes(ss));
    return put;
}
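For completeness, here are minimal sketches of the two helpers referenced above but never shown in the post: createPuts, which just batches createPut calls, and getSentence, which joins randomly chosen words loaded from words.dic. Both are my own reconstructions of what the text describes, not the author's code, and the words.dic location is an assumption.

// Sketches of the helpers the post leaves out; 'r' is the Random instance
// that createPut above already uses. The words.dic path is an assumption.
private static final Random r = new Random();
private static List<String> words;

static {
    try {
        words = Files.readAllLines(Paths.get("words.dic"),
                Charset.forName("UTF-8"));
    } catch (IOException e) {
        throw new RuntimeException("failed to load words.dic", e);
    }
}

public static ArrayList<Put> createPuts(int count, String columnFamily) {
    ArrayList<Put> puts = new ArrayList<Put>(count);
    for (int i = 0; i < count; i++) {
        puts.add(createPut(columnFamily));
    }
    return puts;
}

public static String getSentence() {
    // Join a random number of randomly chosen dictionary words.
    StringBuilder sb = new StringBuilder();
    int n = 5 + r.nextInt(15);
    for (int i = 0; i < n; i++) {
        sb.append(words.get(r.nextInt(words.size())));
    }
    return sb.toString();
}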
Of course, before running this program you need to configure the required fields in Solr. HBase and Solr installation, configuration, and their basic usage will be described in later articles. Here, Solr's field configuration matches the column qualifiers used in createPut above; you can of course also use dynamic fields.
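As an illustration of that configuration (my own sketch, not taken from the post; the text_mmseg field type and the commit interval are assumptions), a dynamic field in schema.xml can match qualifiers like the company_mmsegsm used in createPut, and the autoCommit mentioned in section I lives in solrconfig.xml:

<!-- schema.xml: a dynamic field matching qualifiers such as company_mmsegsm;
     the text_mmseg field type is an assumption, define it as you like -->
<dynamicField name="*_mmsegsm" type="text_mmseg" indexed="true" stored="true"/>

<!-- solrconfig.xml: autoCommit, so ConcurrentUpdateSolrServer's batches
     actually become visible; 15 seconds is an example value -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>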
IV. Testing Solr performance directly
If you do not want to test the combination of HBase and Solr and only want to test Solr's performance alone, that is much simpler: you can reuse the code snippets above with just a little assembly, for example:
private static void sendConcurrentUpdateSolrServer(final String url,
        final int count) throws SolrServerException, IOException {
    SolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 20);
    for (int i = 0; i < count; i++) {
        solrServer.add(getInputDoc(PutCreater.createPut(columnFamily)));
    }
}
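It can then be invoked like this (the URL is the core from section I; the document count is an arbitrary example):

// Example invocation; the count is a placeholder value.
sendConcurrentUpdateSolrServer("http://192.1.11.108:80/solr/core1", 1000000);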
I hope this can help you. This article carries a bit more code than usual, but code is the best language for explaining ideas. What I advocate is minimizing comments in code: simplify the code itself until it is clear enough to understand on its own, even close to pseudocode, which is also what the book "Refactoring" advocates.
Original link: http://www.cnblogs.com/wgp13x/p/3927979.html