HBase Multi-Conditional query testing based on SOLR

Source: Internet
Author: User
Tags: solr


In a telecommunication project, HBase is used to store user terminal details that must be instantly queryable from the front-end page. HBase undoubtedly has its advantages, but natively it supports millisecond-level retrieval only by Rowkey and is powerless for multi-field combination queries. Several schemes exist for multi-conditional queries against HBase, but they are either too complex or inefficient; this article tests and validates only the SOLR-based scheme.


The principle of SOLR-based HBase multi-conditional querying is simple: index in SOLR the Rowkey plus the fields of the HBase table involved in conditional filtering, then use SOLR's multi-conditional query to quickly obtain the Rowkey values that meet the filter criteria. With those Rowkeys in hand, query HBase directly by Rowkey.
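The two-step lookup described above can be sketched abstractly. In this self-contained example, plain in-memory maps stand in for the SOLR index and the HBase table (all data and names here are hypothetical stand-ins, not the SolrJ/HBase APIs used later in the article):

```java
import java.util.*;

public class TwoStepLookup {
    public static void main(String[] args) {
        // Stand-in for the SOLR index: documents holding only the filter fields plus the Rowkey
        List<Map<String, String>> solrIndex = new ArrayList<>();
        solrIndex.add(Map.of("Rowkey", "rk001", "Time", "201307", "Puid", "102"));
        solrIndex.add(Map.of("Rowkey", "rk002", "Time", "201306", "Puid", "102"));

        // Stand-in for the HBase table: rowkey -> full row
        Map<String, Map<String, String>> hbaseTable = new HashMap<>();
        hbaseTable.put("rk001", Map.of("Time", "201307", "Puid", "102", "Other", "x"));
        hbaseTable.put("rk002", Map.of("Time", "201306", "Puid", "102", "Other", "y"));

        // Step 1: the multi-conditional filter runs against the "index" and yields only rowkeys
        Map<String, String> conditions = Map.of("Time", "201307", "Puid", "102");
        List<String> rowkeys = new ArrayList<>();
        for (Map<String, String> doc : solrIndex) {
            boolean match = true;
            for (Map.Entry<String, String> c : conditions.entrySet()) {
                if (!c.getValue().equals(doc.get(c.getKey()))) { match = false; break; }
            }
            if (match) rowkeys.add(doc.get("Rowkey"));
        }

        // Step 2: point lookups by rowkey against the "table" fetch the full rows
        for (String rk : rowkeys) {
            System.out.println(rk + " -> " + hbaseTable.get(rk));
        }
    }
}
```

Step 1 is what SOLR is fast at (inverted-index filtering); step 2 is what HBase is fast at (point Gets by Rowkey). The scheme simply composes the two.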

Test environment:

SOLR version 4.0.0, running in its bundled jetty servlet container, single node;

An HBase cluster of 10 Linux servers running hbase-0.94.2-cdh4.2.1;

25.12 million rows in HBase, each with 172 fields;

SOLR indexes 1 million of the HBase rows.

Test results:

1. 1 million rows are indexed in SOLR on 8 fields. A SOLR query with up to 8 filter conditions returns the Rowkey values of 51,316 rows in roughly 57-80 milliseconds. Fetching 12 field values for all 51,316 rows from the HBase table by those Rowkeys takes roughly 15 seconds.

2. With the same data volume and the same filter conditions, using SOLR paged queries that fetch 20 rows at a time, SOLR returns 20 Rowkey values in 4-10 milliseconds, and fetching the corresponding 20 rows (12 fields each) from HBase by those Rowkeys takes 6 milliseconds.

The setup of the test environment and the implementation of the relevant code are listed below.

I. Building the SOLR environment

Since the original intention was only to test SOLR, the SOLR runtime used only the bundled jetty rather than the Tomcat most people use, with no SOLR cluster (just a single SOLR server) and no parameter tuning.

1) Download SOLR 4 from the Apache website: http://lucene.apache.org/solr/downloads.html. The version downloaded here is "apache-solr-4.0.0.tgz";

2) Unzip the SOLR tarball in the /opt directory:

cd /opt
tar -xvzf apache-solr-4.0.0.tgz

3) Modify the SOLR configuration file schema.xml and add the fields we need to index (the file is located in "/opt/apache-solr-4.0.0/example/solr/collection1/conf/"):

   <field name= "Rowkey" type= "string" indexed= "true" stored= "true" required= "true" multivalued= "false"/> <fie LD name= "Time" type= "string" indexed= "true" stored= "true" required= "false" multivalued= "false"/> <field name= "Te Bid "Type=" string "indexed=" true "stored=" true "required=" false "multivalued=" false "/> <field name=" Tetid "type= "String" indexed= "true" stored= "true" required= "false" multivalued= "false"/> <field name= "Puid" type= "string" in Dexed= "true" stored= "true" required= "false" multivalued= "false"/> <field name= "Mgcvid" type= "string" indexed= " True "stored=" true "required=" false "multivalued=" false "/> <field name=" Mtcvid "type=" string "indexed=" true "sto Red= "true" required= "false" multivalued= "false"/> <field name= "Smaid" type= "string" indexed= "true" stored= "true "Required=" false "multivalued=" false "/> <field name=" Mtlkid "type=" string "indexed=" true "stored=" true "require D= "false" multivalued= "false"/> 

Another key point is to modify the original uniqueKey; this article sets the Rowkey field of the HBase table as the uniqueKey of the SOLR index:
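The original listing for this snippet was not preserved; a minimal sketch of the schema.xml element it refers to, replacing the stock uniqueKey, would be:

```xml
<uniqueKey>Rowkey</uniqueKey>
```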


The type parameter is the index data type. I set every type to string to avoid failures on data of an unexpected type; normally it should match the actual field type, e.g. an integer field set to int, which is more advantageous for index building and retrieval;

The indexed parameter indicates whether the field is indexed; fields that do not participate in conditional filtering should all be set to false, depending on the actual situation.

The stored parameter indicates whether the field's value is stored. It is recommended to set true only for fields whose values need to be retrieved, to avoid wasting storage. For example, our scenario only needs the Rowkey, so only the Rowkey field is set to true and all others to false;

The required parameter indicates whether the field is required; if a field in the data source may be null, this attribute must be false, otherwise SOLR throws an exception;

The multiValued parameter indicates whether the field allows multiple values; it is usually false and can be set to true based on actual requirements.
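Putting the parameter advice above together, a properly typed, filter-only field could look like the following sketch (this assumes the stock Solr 4 schema.xml, whose fieldTypes include "int"; the choices shown are illustrative, not from the original test):

```xml
<!-- hypothetical example: a numeric filter field typed as int rather than string,
     not stored because only the Rowkey needs to be retrieved -->
<field name="Puid" type="int" indexed="true" stored="false" required="false" multiValued="false"/>
```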

4) We use SOLR's bundled example as the runtime environment. Navigate to the example directory and start the service:

cd /opt/apache-solr-4.0.0/example
java -jar ./start.jar

If the launch succeeds, you can open the SOLR admin page in your browser.

II. Reading data from the HBase source table and indexing it in SOLR

One option is to fetch the data through HBase's ordinary client API and index it. Its drawback is low efficiency, a little over 100 rows per second (multithreading could perhaps improve this):

package com.ultrapower.hbase.solrhbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrIndexer {

    public static void main(String[] args) throws IOException, SolrServerException {
        final Configuration conf;
        // Jetty bundled with SOLR listens on port 8983 by default
        HttpSolrServer solrServer = new HttpSolrServer("http://");
        conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "hb_app_xxxxxx"); // the HBase table name
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("d")); // the column family of the HBase table
        scan.setCaching(500);
        scan.setCacheBlocks(false);
        ResultScanner ss = table.getScanner(scan);

        System.out.println("start ...");
        int i = 0;
        try {
            for (Result r : ss) {
                SolrInputDocument solrDoc = new SolrInputDocument();
                solrDoc.addField("Rowkey", new String(r.getRow()));
                for (KeyValue kv : r.raw()) {
                    String fieldName = new String(kv.getQualifier());
                    String fieldValue = new String(kv.getValue());
                    if (fieldName.equalsIgnoreCase("Time")
                            || fieldName.equalsIgnoreCase("Tebid")
                            || fieldName.equalsIgnoreCase("Tetid")
                            || fieldName.equalsIgnoreCase("Puid")
                            || fieldName.equalsIgnoreCase("Mgcvid")
                            || fieldName.equalsIgnoreCase("Mtcvid")
                            || fieldName.equalsIgnoreCase("Smaid")
                            || fieldName.equalsIgnoreCase("Mtlkid")) {
                        solrDoc.addField(fieldName, fieldValue);
                    }
                }
                solrServer.add(solrDoc);
                solrServer.commit(true, true, true);
                i = i + 1;
                System.out.println("Successfully processed " + i + " rows");
            }
            System.out.println("done!");
        } catch (IOException e) {
            System.out.println("error!");
            e.printStackTrace();
        } finally {
            ss.close();
            table.close();
        }
    }
}

Another option is to use HBase's MapReduce framework, which runs in distributed parallel and is highly efficient, taking only about 5 minutes to process 10 million rows. However, this level of concurrency requires tuning the SOLR server's configuration, otherwise it throws exceptions because the server cannot keep up:

Error: org.apache.solr.common.SolrException: Server at  returned non OK status: 503, message: Service Unavailable

The MapReduce entry program:

package com.ultrapower.hbase.solrhbase;

import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SolrHbaseIndexer {

    private static Configuration conf;

    private static void usage() {
        System.err.println("Input parameters: <config file path> <start row> <end row>");
        System.exit(1);
    }

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException, URISyntaxException {
        if (args.length == 0 || args.length > 3) {
            usage();
        }
        createHbaseConfiguration(args[0]);
        ConfigProperties tutorialProperties = new ConfigProperties(args[0]);
        String tbName = tutorialProperties.getHbTbName();
        String tbFamily = tutorialProperties.getHbFamily();

        Job job = new Job(conf, "SolrHbaseIndexer");
        job.setJarByClass(SolrHbaseIndexer.class);

        Scan scan = new Scan();
        if (args.length == 3) {
            scan.setStartRow(Bytes.toBytes(args[1]));
            scan.setStopRow(Bytes.toBytes(args[2]));
        }
        scan.addFamily(Bytes.toBytes(tbFamily));
        scan.setCaching(500); // cache more rows per RPC to increase efficiency
        scan.setCacheBlocks(false);

        // create the map task
        TableMapReduceUtil.initTableMapperJob(tbName, scan,
                SolrHbaseIndexerMapper.class, null, null, job);

        // no output is required
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    /**
     * Read HBase configuration information from the configuration file.
     *
     * @param propsLocation
     */
    private static void createHbaseConfiguration(String propsLocation) {
        ConfigProperties tutorialProperties = new ConfigProperties(propsLocation);
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", tutorialProperties.getZkQuorum());
        conf.set("hbase.zookeeper.property.clientPort", tutorialProperties.getZkPort());
        conf.set("hbase.master", tutorialProperties.getHbMaster());
        conf.set("hbase.rootdir", tutorialProperties.getHbRootDir());
        conf.set("solr.server", tutorialProperties.getSolrServer());
    }
}

The corresponding Mapper:

package com.ultrapower.hbase.solrhbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrHbaseIndexerMapper extends TableMapper<Text, Text> {

    public void map(ImmutableBytesWritable key, Result hbaseResult,
            Context context) throws InterruptedException, IOException {
        Configuration conf = context.getConfiguration();

        HttpSolrServer solrServer = new HttpSolrServer(conf.get("solr.server"));
        solrServer.setDefaultMaxConnectionsPerHost(100);
        solrServer.setMaxTotalConnections(1000);
        solrServer.setSoTimeout(20000);
        solrServer.setConnectionTimeout(20000);

        SolrInputDocument solrDoc = new SolrInputDocument();
        try {
            solrDoc.addField("Rowkey", new String(hbaseResult.getRow()));
            for (KeyValue rowQualifierAndValue : hbaseResult.list()) {
                String fieldName = new String(rowQualifierAndValue.getQualifier());
                String fieldValue = new String(rowQualifierAndValue.getValue());
                if (fieldName.equalsIgnoreCase("Time")
                        || fieldName.equalsIgnoreCase("Tebid")
                        || fieldName.equalsIgnoreCase("Tetid")
                        || fieldName.equalsIgnoreCase("Puid")
                        || fieldName.equalsIgnoreCase("Mgcvid")
                        || fieldName.equalsIgnoreCase("Mtcvid")
                        || fieldName.equalsIgnoreCase("Smaid")
                        || fieldName.equalsIgnoreCase("Mtlkid")) {
                    solrDoc.addField(fieldName, fieldValue);
                }
            }
            solrServer.add(solrDoc);
            solrServer.commit(true, true, true);
        } catch (SolrServerException e) {
            System.err.println("Update SOLR index exception: "
                    + new String(hbaseResult.getRow()));
        }
    }
}

An auxiliary class for reading the parameter configuration file:

package com.ultrapower.hbase.solrhbase;

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

public class ConfigProperties {

    private static Properties props;
    private String HBASE_ZOOKEEPER_QUORUM;
    private String HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT;
    private String HBASE_MASTER;
    private String HBASE_ROOTDIR;
    private String DFS_NAME_DIR;
    private String DFS_DATA_DIR;
    private String FS_DEFAULT_NAME;
    private String SOLR_SERVER;        // SOLR server address
    private String HBASE_TABLE_NAME;   // the HBase table to index in SOLR
    private String HBASE_TABLE_FAMILY; // the column family of the HBase table

    public ConfigProperties(String propLocation) {
        props = new Properties();
        try {
            File file = new File(propLocation);
            System.out.println("Loading configuration file from: " + file.getAbsolutePath());
            FileReader is = new FileReader(file);
            props.load(is);

            HBASE_ZOOKEEPER_QUORUM = props.getProperty("hbase_zookeeper_quorum");
            HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT = props.getProperty("hbase_zookeeper_property_client_port");
            HBASE_MASTER = props.getProperty("hbase_master");
            HBASE_ROOTDIR = props.getProperty("hbase_rootdir");
            DFS_NAME_DIR = props.getProperty("dfs_name_dir");
            DFS_DATA_DIR = props.getProperty("dfs_data_dir");
            FS_DEFAULT_NAME = props.getProperty("fs_default_name");
            SOLR_SERVER = props.getProperty("solr_server");
            HBASE_TABLE_NAME = props.getProperty("hbase_table_name");
            HBASE_TABLE_FAMILY = props.getProperty("hbase_table_family");
        } catch (IOException e) {
            throw new RuntimeException("Error loading config file");
        } catch (NullPointerException e) {
            throw new RuntimeException("File does not exist");
        }
    }

    public String getZkQuorum() { return HBASE_ZOOKEEPER_QUORUM; }
    public String getZkPort() { return HBASE_ZOOKEEPER_PROPERTY_CLIENT_PORT; }
    public String getHbMaster() { return HBASE_MASTER; }
    public String getHbRootDir() { return HBASE_ROOTDIR; }
    public String getDfsNameDir() { return DFS_NAME_DIR; }
    public String getDfsDataDir() { return DFS_DATA_DIR; }
    public String getFsDefaultName() { return FS_DEFAULT_NAME; }
    public String getSolrServer() { return SOLR_SERVER; }
    public String getHbTbName() { return HBASE_TABLE_NAME; }
    public String getHbFamily() { return HBASE_TABLE_FAMILY; }
}

Parameter configuration file "Config.properties":

hbase_zookeeper_quorum=slave-1,slave-2,slave-3,slave-4,slave-5
hbase_zookeeper_property_client_port=2181
hbase_master=master-1:60000
hbase_rootdir=hdfs:///hbase
dfs_name_dir=/opt/data/dfs/name
dfs_data_dir=/opt/data/d0/dfs2/data
fs_default_name=hdfs://
solr_server=http://
hbase_table_name=hb_app_m_user_te
hbase_table_family=d
III. Multi-conditional queries on HBase data in conjunction with SOLR

The SOLR index can be manipulated directly through HTTP requests in a browser, for example:

Query: (Time:201307 AND Tetid:1 AND Mgcvid:101 AND Smaid:101 AND Puid:102)

Delete all indexes: send <delete><query>*:*</query></delete> to the update handler, e.g. with the request parameters stream.body=<delete><query>*:*</query></delete>&stream.contentType=text/xml;charset=utf-8&commit=true
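For reference, full request URLs for the two operations above might look like the following, assuming the default jetty port 8983 and the default collection1 core (the host name is a placeholder, since the original article elides its server addresses):

```
http://<solr-host>:8983/solr/collection1/select?q=Time:201307 AND Tetid:1 AND Puid:102&fl=Rowkey&wt=json
http://<solr-host>:8983/solr/collection1/update?stream.body=<delete><query>*:*</query></delete>&commit=true
```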

Querying HBase data with a Java client combined with SOLR:

package com.ultrapower.hbase.solrhbase;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class QueryData {

    public static void main(String[] args) throws SolrServerException, IOException {
        final Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "hb_app_m_user_te");
        Get get = null;
        List<Get> list = new ArrayList<Get>();

        String url = "http://";
        SolrServer server = new HttpSolrServer(url);
        SolrQuery query = new SolrQuery("Time:201307 AND Tetid:1 AND Mgcvid:101 AND Smaid:101 AND Puid:102");
        query.setStart(0); // first row to return, used for paging
        query.setRows(10); // number of rows to return, used for paging
        QueryResponse response = server.query(query);
        SolrDocumentList docs = response.getResults();
        System.out.println("Number of documents: " + docs.getNumFound()); // the total hit count is also easily obtained
        System.out.println("Query time: " + response.getQTime());

        for (SolrDocument doc : docs) {
            get = new Get(Bytes.toBytes((String) doc.getFieldValue("Rowkey")));
            list.add(get);
        }
        Result[] res = table.get(list);

        byte[] bt1 = null;
        byte[] bt2 = null;
        byte[] bt3 = null;
        byte[] bt4 = null;
        String str1 = null;
        String str2 = null;
        String str3 = null;
        String str4 = null;
        for (Result rs : res) {
            bt1 = rs.getValue("d".getBytes(), "3mpon".getBytes());
            bt2 = rs.getValue("d".getBytes(), "3mponid".getBytes());
            bt3 = rs.getValue("d".getBytes(), "amarpu".getBytes());
            bt4 = rs.getValue("d".getBytes(), "amarpuid".getBytes());
            // new String(null) throws an exception, so guard against missing values
            if (bt1 != null && bt1.length > 0) { str1 = new String(bt1); } else { str1 = "no data"; }
            if (bt2 != null && bt2.length > 0) { str2 = new String(bt2); } else { str2 = "no data"; }
            if (bt3 != null && bt3.length > 0) { str3 = new String(bt3); } else { str3 = "no data"; }
            if (bt4 != null && bt4.length > 0) { str4 = new String(bt4); } else { str4 = "no data"; }
            System.out.print(new String(rs.getRow()) + " ");
            System.out.print(str1 + "|");
            System.out.print(str2 + "|");
            System.out.print(str3 + "|");
            System.out.println(str4 + "|");
        }
        table.close();
    }
}

The tests show that combining a SOLR index with HBase makes multi-conditional queries on HBase practical, and it solves two pain points at once: paged queries and counting the total number of rows.

Most real-world scenarios are paged queries, which return small amounts of data; with this scheme the front-end page can achieve millisecond-level real-time response. Even with large data interactions such as data export, efficiency remains high: exporting 100,000 rows takes only about 10 seconds.

In addition, if SOLR is adopted, both SOLR and HBase can be optimized further, for example by building a SOLR cluster, or even using SolrCloud's distributed indexing service.

In short, HBase's innate inability to handle multi-conditional filtering queries can be well compensated with SOLR's cooperation. It is no wonder that Internet companies such as Newegg, Gome and Suning, as well as many game companies, use SOLR to support fast queries.


Source article: http://www.cnblogs.com/chenz/articles/3229997.html

