Big data architecture: configuring storage and indexing with HBase and SOLR

HBase and SOLR can work together: a coprocessor on HBase forwards every request to SOLR, and SOLR synchronizes the data it receives by adding to, deleting from, and updating its index. Storing and indexing on separate machines is a must for a big data architecture, yet many people are unfamiliar with the approach and find the idea quite new. It is definitely a good direction, and well worth studying.

A friend left a message on my blog saying that CDH can do the same thing. I have not tried it yet; he also asked me for the related code, so I have tidied some of it up as the main content of this article. As for CDH, I will try it as soon as possible; if you know it, please leave me a message.

What follows is mainly the code I wrote, with explanations, when testing the performance of HBase and SOLR together: an HBase coprocessor pushes data to SOLR as it is inserted into HBase.

I. Writing the HBase coprocessor

As soon as postPut fires for a piece of data, the corresponding SOLR core is updated immediately. I use ConcurrentUpdateSolrServer, which is what guarantees SOLR's indexing throughput; when using it, do not forget to configure autoCommit inside SOLR.
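For reference, autoCommit is set in the solrconfig.xml of the target core. A minimal sketch, assuming the stock update handler; both thresholds are illustrative values, not taken from the original post:

<!-- solrconfig.xml of the target core: flush buffered updates automatically;
     the thresholds below are illustrative only -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after this many queued documents -->
    <maxTime>15000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>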

/*
 * Copyright: Wang Anqi
 * Description: Monitors HBase; whenever a postPut fires, the data is sent to
 *              SOLR. This class is added to HBase as a trigger (coprocessor).
 * Modified: 2014-05-27
 * Content: New
 */
package solrHbase.test;

import java.io.UnsupportedEncodingException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SorlIndexCoprocessorObserver extends BaseRegionObserver {

    private static final Logger log = LoggerFactory
            .getLogger(SorlIndexCoprocessorObserver.class);
    private static final String solrUrl = "http://192.1.11.108:80/solr/core1";
    private static final SolrServer solrServer = new ConcurrentUpdateSolrServer(
            solrUrl, 10000, 20);

    /**
     * Builds the SOLR index for every row written to HBase.
     *
     * @throws UnsupportedEncodingException
     */
    @Override
    public void postPut(final ObserverContext<RegionCoprocessorEnvironment> e,
            final Put put, final WALEdit edit, final boolean writeToWAL)
            throws UnsupportedEncodingException {
        inputSolr(put);
    }

    public void inputSolr(Put put) {
        try {
            // convert the Put into a SolrInputDocument and queue it for indexing
            solrServer.add(TestSolrMain.getInputDoc(put));
        } catch (Exception ex) {
            log.error(ex.getMessage());
        }
    }
}

NOTE: getInputDoc is the heart of this HBase coprocessor: it converts the contents of an HBase Put into the field values that SOLR needs. In the line String fieldName = key.substring(key.indexOf(columnFamily) + 3, key.indexOf("...")).trim(); the argument of the second indexOf is a delimiter character that was garbled in the original post and cannot be shown here, so take care when copying it.

public static SolrInputDocument getInputDoc(Put put) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("test_id", Bytes.toString(put.getRow()));
    for (KeyValue c : put.getFamilyMap().get(Bytes.toBytes(columnFamily))) {
        String key = Bytes.toString(c.getKey());
        String value = Bytes.toString(c.getValue());
        if (value.isEmpty()) {
            continue;
        }
        // extract the column qualifier from the KeyValue key; the delimiter
        // passed to the second indexOf was garbled in the original post
        String fieldName = key.substring(key.indexOf(columnFamily) + 3,
                key.indexOf("...")).trim();
        doc.addField(fieldName, value);
    }
    return doc;
}
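The original post does not show how the observer is attached to the table. A minimal sketch of registering it from the HBase shell, assuming the compiled jar has already been uploaded to HDFS; the jar path and the priority value are my assumptions, not the author's:

# upload solrHbase.jar to HDFS first, then attach the observer to the table
hbase> disable 'angelHBase'
hbase> alter 'angelHBase', METHOD => 'table_att', 'coprocessor' => 'hdfs:///user/hbase/solrHbase.jar|solrHbase.test.SorlIndexCoprocessorObserver|1001|'
hbase> enable 'angelHBase'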

II. Writing the test program entry point (main)

This code sends a table-creation request to HBase and then continuously submits simulated data to the table, recording the elapsed time in order to test insert performance.

/*
 * Copyright: Wang Anqi
 * Description: Tests HBaseInsert, i.e. HBase insert performance
 * Modified: 2014-05-27
 * Content: New
 */
package solrHbase.test;

import java.io.IOException;
import java.util.Calendar;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

import hbaseInput.HBaseInsert;

public class TestHBaseMain {

    private static Configuration config;
    private static String tableName = "angelHBase";
    private static HTable table = null;
    private static final String columnFamily = "wanganqi";

    /**
     * @param args
     */
    public static void main(String[] args) {
        config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "192.103.101.104");
        HBaseInsert.createTable(config, tableName, columnFamily);
        try {
            table = new HTable(config, Bytes.toBytes(tableName));
            for (int k = 0; k < 1; k++) {
                Thread t = new Thread() {
                    public void run() {
                        for (int i = 0; i < 100000; i++) {
                            // insert a batch of 1000 simulated rows, then log progress
                            HBaseInsert.inputData(table,
                                    PutCreater.createPuts(1000, columnFamily));
                            Calendar c = Calendar.getInstance();
                            String dateTime = c.get(Calendar.YEAR) + "-"
                                    + c.get(Calendar.MONTH) + "-"
                                    + c.get(Calendar.DATE) + "T"
                                    + c.get(Calendar.HOUR) + ":"
                                    + c.get(Calendar.MINUTE) + ":"
                                    + c.get(Calendar.SECOND) + ":"
                                    + c.get(Calendar.MILLISECOND) + "Z write: "
                                    + i * 1000;
                            System.out.println(dateTime);
                        }
                    }
                };
                t.start();
            }
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}

Below are the HBase-related operations, encapsulated in a class of their own; only the code for creating the table and inserting data is shown.

/*
 * Copyright: Wang Anqi
 * Description: HBase-related operations: creating the table and inserting data
 * Modified: 2014-05-27
 * Content: New
 */
package hbaseInput;

import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class HBaseInsert {

    public static void createTable(Configuration config, String tableName,
            String columnFamily) {
        HBaseAdmin hbaseAdmin;
        try {
            hbaseAdmin = new HBaseAdmin(config);
            if (hbaseAdmin.tableExists(tableName)) {
                return;
            }
            HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
            tableDescriptor.addFamily(new HColumnDescriptor(columnFamily));
            hbaseAdmin.createTable(tableDescriptor);
            hbaseAdmin.close();
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void inputData(HTable table, ArrayList<Put> puts) {
        try {
            table.put(puts);
            table.flushCommits();
            puts.clear();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

III. Constructing the simulated data (Put)

Writing data into HBase requires constructing Put objects. Here is how I construct the simulated data. The strings are generated by randomly reading words from words.dic, the dictionary file that ships with MMSeg, and joining them into a sentence; that code is not shown below (see the sketch after the next snippet), but it is very easy, and you can simply generate whatever data you need.

public static Put createPut(String columnFamily) {
    String ss = getSentence();
    byte[] family = Bytes.toBytes(columnFamily);
    byte[] rowkey = Bytes.toBytes("" + Math.abs(r.nextLong()));
    Put put = new Put(rowkey);
    // the qualifier of this first column was garbled in the original post
    put.add(family, Bytes.toBytes("..."),
            Bytes.toBytes("" + Math.abs(r.nextInt())));
    // ****** (further columns elided in the original)
    put.add(family, Bytes.toBytes("company_mmsegsm"), Bytes.toBytes(ss));
    return put;
}
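The getSentence() helper (and the Random r it uses) is omitted in the original post. A hypothetical sketch of what it might look like, assuming words.dic sits in the working directory with one word per line; the sentence length is an arbitrary choice:

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Random;

// sketch of the omitted helpers; words.dic is assumed to hold one word per line
private static final Random r = new Random();
private static List<String> words;

static {
    try {
        words = Files.readAllLines(Paths.get("words.dic"),
                Charset.forName("UTF-8"));
    } catch (IOException e) {
        throw new RuntimeException("cannot load words.dic", e);
    }
}

public static String getSentence() {
    // join 5 to 14 randomly chosen dictionary words into one sentence
    StringBuilder sb = new StringBuilder();
    int n = 5 + r.nextInt(10);
    for (int i = 0; i < n; i++) {
        sb.append(words.get(r.nextInt(words.size()))).append(' ');
    }
    return sb.toString().trim();
}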

Of course, before running this program you need to configure the corresponding fields in SOLR. Installing and configuring HBase and SOLR, and their basic usage, will be described in later articles. Here SOLR's field configuration must match the column qualifiers you used in createPut; you could of course use dynamic fields instead.
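A minimal sketch of the matching schema.xml entries; the field names come from the code above, but the field types are my assumptions (the author presumably used an MMSeg-based analyzer for the text fields):

<!-- schema.xml: fields matching the qualifiers written by createPut;
     the types are assumptions, not the author's configuration -->
<field name="test_id" type="string" indexed="true" stored="true" required="true" />
<field name="company_mmsegsm" type="text_general" indexed="true" stored="true" />
<!-- or cover a whole family of qualifiers with a dynamic field -->
<dynamicField name="*_mmsegsm" type="text_general" indexed="true" stored="true" />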

IV. Testing SOLR performance directly

If you do not want to test HBase and SOLR in combination but only SOLR's performance on its own, that is much simpler: you can reuse the code snippets above for the test, with just a little assembly.

private static void sendConcurrentUpdateSolrServer(final String url,
        final int count) throws SolrServerException, IOException {
    SolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 20);
    for (int i = 0; i < count; i++) {
        solrServer.add(getInputDoc(PutCreater.createPut(columnFamily)));
    }
}
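For example, it can be driven and timed like this; a sketch only, where the URL is the one from the coprocessor example above and the document count is an arbitrary choice:

// hypothetical driver that times the direct-to-SOLR test; the URL comes from
// the coprocessor example earlier, and the count of 100000 is arbitrary
public static void main(String[] args) throws SolrServerException, IOException {
    long start = System.currentTimeMillis();
    sendConcurrentUpdateSolrServer("http://192.1.11.108:80/solr/core1", 100000);
    System.out.println("Indexed 100000 docs in "
            + (System.currentTimeMillis() - start) + " ms");
}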

I hope this can help you. This article carries a little more code than usual, but code is the best language for explaining ideas. What I advocate is keeping comments in code to a minimum: simplify your code until it is clear enough to read on its own, almost like pseudocode, which is also what the book Refactoring advocates.

Original link: http://www.cnblogs.com/wgp13x/p/3927979.html
