Some solutions for building an HBase secondary index (Solr+HBase, etc.)

Source: Internet
Author: User
HBase's only first-level index is the rowkey, so data can be retrieved quickly only by rowkey. To run combination queries over HBase columns with multiple conditions, we need a secondary indexing scheme.


Some solutions for building HBase secondary indexes
//-------------------------------------------------------------------
Common secondary indexing schemes include the following:
1. MapReduce Solution
2. ITHBase Solution
3. IHBase Solution
4. Coprocessor Solution
5. Solr+HBase Solution




MapReduce Solution


IndexBuilder: build the index with MapReduce.
Pros: indexes can be built concurrently, in batches.
Cons: the index cannot be updated in real time when a row is inserted into HBase.


Example:


Original table:


Row 1    f1:name    Zhangsan
Row 2    f1:name    Lisi
Row 3    f1:name    Wangwu


There are three students with IDs 1, 2 and 3, and we want to find Zhangsan's ID. If the data set is very large, a full table scan is unrealistic, so we build an inverted index table: take the name (Zhangsan) as the rowkey and store the student ID as a column. The record can then be fetched directly by rowkey, so those columns become queryable through the inverted index, as follows:


Index Table:


Row Zhangsan    f1:id    1
Row Lisi        f1:id    2
Row Wangwu      f1:id    3
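The inverted-index idea above can be sketched with plain Java maps as an in-memory stand-in (this is not the HBase API; the class name is illustrative, and the table contents follow the example):

```java
import java.util.HashMap;
import java.util.Map;

public class InvertedIndexSketch {
    public static void main(String[] args) {
        // Original "table": rowkey (student ID) -> f1:name value
        Map<String, String> original = new HashMap<>();
        original.put("1", "Zhangsan");
        original.put("2", "Lisi");
        original.put("3", "Wangwu");

        // Index "table": name as rowkey -> f1:id (the original rowkey)
        Map<String, String> index = new HashMap<>();
        for (Map.Entry<String, String> e : original.entrySet()) {
            index.put(e.getValue(), e.getKey());
        }

        // Point lookup by name instead of a full table scan
        System.out.println(index.get("Zhangsan")); // prints "1"
    }
}
```

The MapReduce program below does exactly this key/value swap, only it reads the original HBase table as input and writes the swapped pairs into the index table.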




Demo: run the program with at least three arguments (table name, column family, one or more qualifiers), for example: $ hbase indexdouble.IndexBuilder studentinfo f1 name

package indexdouble;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.GenericOptionsParser;

public class IndexBuilder {
    private String rootDir;
    private String zkServer;
    private String port;
    private Configuration conf;
    private HConnection hConn = null;

    private IndexBuilder(String rootDir, String zkServer, String port) throws IOException {
        this.rootDir = rootDir;
        this.zkServer = zkServer;
        this.port = port;
        conf = HBaseConfiguration.create();
        conf.set("hbase.rootdir", rootDir);
        conf.set("hbase.zookeeper.quorum", zkServer);
        conf.set("hbase.zookeeper.property.clientPort", port);
        hConn = HConnectionManager.createConnection(conf);
    }

    static class MyMapper extends TableMapper<ImmutableBytesWritable, Put> {
        // Maps each column to be indexed to the name of its index table
        private Map<byte[], ImmutableBytesWritable> indexes =
                new HashMap<byte[], ImmutableBytesWritable>();
        private String familyName;

        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            // Columns of the original table
            Set<byte[]> keys = indexes.keySet();
            // The index table's rowkey is the original table's column value;
            // the index table's column is the original table's rowkey.
            for (byte[] k : keys) {
                // Get the name of the index table for this column
                ImmutableBytesWritable indexTableName = indexes.get(k);
                // Result holds the original row; look up the cell value
                // by column family and qualifier
                byte[] val = value.getValue(Bytes.toBytes(familyName), k);
                if (val != null) {
                    // Index table put: rowkey = original column value
                    Put put = new Put(val);
                    // Column f1:id = original table's rowkey
                    put.add(Bytes.toBytes("f1"), Bytes.toBytes("id"), key.get());
                    context.write(indexTableName, put);
                }
            }
        }

        // Do some preparation before map() actually runs
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            // Get the table name
            String tableName = conf.get("tableName");
            // Get the column family
            familyName = conf.get("columnFamily");
            // Get the columns
            String[] qualifiers = conf.getStrings("qualifiers");
            for (String qualifier : qualifiers) {
                // Build the map: one index table per column,
                // named tableName + "-" + qualifier
                indexes.put(Bytes.toBytes(qualifier),
                        new ImmutableBytesWritable(Bytes.toBytes(tableName + "-" + qualifier)));
            }
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        String rootDir = "hdfs://hadoop1:8020/hbase";
        String zkServer = "hadoop1";
        String port = "2181";

        IndexBuilder conn = new IndexBuilder(rootDir, zkServer, port);
        String[] otherArgs = new GenericOptionsParser(conn.conf, args).getRemainingArgs();
        // Pass at least three arguments: tableName, columnFamily, qualifier...
        if (otherArgs.length < 3) {
            System.exit(-1);
        }
        // Table name
        String tableName = otherArgs[0];
        // Column family
        String columnFamily = otherArgs[1];
        conn.conf.set("tableName", tableName);
        conn.conf.set("columnFamily", columnFamily);

        // There may be several indexed columns
        String[] qualifiers = new String[otherArgs.length - 2];
        for (int i = 0; i < qualifiers.length; i++) {
            qualifiers[i] = otherArgs[i + 2];
        }
        // Set the columns
        conn.conf.setStrings("qualifiers", qualifiers);

        @SuppressWarnings("deprecation")
        Job job = new Job(conn.conf, tableName);
        job.setJarByClass(IndexBuilder.class);
        job.setMapperClass(MyMapper.class);
        // No reduce phase is needed
        job.setNumReduceTasks(0);
        job.setInputFormatClass(TableInputFormat.class);
        job.setOutputFormatClass(MultiTableOutputFormat.class);

        Scan scan = new Scan();
        // How many rows to fetch per batch
        scan.setCaching(1000);
        // Initialize the map-only job
        TableMapReduceUtil.initTableMapperJob(tableName, scan,
                MyMapper.class, ImmutableBytesWritable.class, Put.class, job);

        job.waitForCompletion(true);
    }
}



Create the original table

hbase(main):002:0> create 'studentinfo', 'f1'
0 row(s) in 0.6520 seconds

=> Hbase::Table - studentinfo

hbase(main):003:0> put 'studentinfo', '1', 'f1:name', 'Zhangsan'
0 row(s) in 0.1640 seconds

hbase(main):004:0> put 'studentinfo', '2', 'f1:name', 'Lisi'
0 row(s) in 0.0240 seconds

hbase(main):005:0> put 'studentinfo', '3', 'f1:name', 'Wangwu'
0 row(s) in 0.0290 seconds

hbase(main):006:0> scan 'studentinfo'
ROW                     COLUMN+CELL
 1                      column=f1:name, timestamp=1436262175823, value=Zhangsan
 2                      column=f1:name, timestamp=1436262183922, value=Lisi
 3                      column=f1:name, timestamp=1436262189250, value=Wangwu
3 row(s) in 0.0530 seconds


Create the index table

hbase(main):007:0> create 'studentinfo-name', 'f1'
0 row(s) in 0.7740 seconds

=> Hbase::Table - studentinfo-name

Execution result (after running the IndexBuilder job):
hbase(main):008:0> scan 'studentinfo-name'




ITHBase Solution

//------------------------------------------------------------------------------
Pros: ITHBase (Indexed Transactional HBase) is a transactional indexing extension of HBase.
Cons: it requires modifying HBase, and the project has not been updated for several years.
http://github.com/hbase-trx/hbase-transactional-tableindexed




IHBase Solution
//------------------------------------------------------------------------------
Pros: IHBase (Indexed HBase) is an extension of HBase that provides faster scans.
Cons: it requires modifying HBase, and the version is old.
Principle: when a MemStore is full, IHBase intercepts the flush request and builds an index over the MemStore data; the index is stored in the table much like another column family. At scan time, IHBase combines the markers in the index column to speed up the scan.
http://github.com/ykulbak/ihbase




coprocessor Solutions
//------------------------------------------------------------------------------
hindex– hbase Level Two index from Huawei
Http://github.com/Huawei-Hadoop/hindex


The solution is 100% Java, compatible with Apache HBase 0.94.8, and is open sourced under ASL.


Following capabilities is supported currently.
1.multiple Indexes on table,
2.multi column index,
3.index based on part of a column value,
4.equals and Range condition scans using index, and
5.bulk loading data to indexed table (indexing do with bulk load).




Solr+HBase Solution
//------------------------------------------------------------------------------
Solr is a standalone enterprise search application server that exposes Web-service-like APIs. Users can submit XML documents in a specific format to the search server over HTTP to build the index, or issue lookup requests via HTTP GET and receive the results in XML.
Solr is a high-performance full-text search server developed in Java 5 on top of Lucene. It extends Lucene with a richer query language, is configurable and extensible, is optimized for query performance, and provides a complete administration interface, making it a very good full-text search engine.

HBase undoubtedly has its strengths, but on its own it supports fast, millisecond-level retrieval only by rowkey and is powerless for multi-field combination queries.
The principle of Solr-based multi-condition queries over HBase is simple: index the filter fields and the rowkey of the HBase table in Solr, use Solr's multi-condition query to quickly obtain the rowkeys that satisfy the filter conditions, and then query HBase directly by those rowkeys.
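This two-step flow can be modeled with plain Java collections standing in for Solr and HBase (a real implementation would use a Solr client and the HBase client API; all names and data here are illustrative):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SolrHBaseFlowSketch {
    public static void main(String[] args) {
        // Stand-in for the Solr index: filter condition -> matching rowkeys
        Map<String, List<String>> solrIndex = new HashMap<>();
        solrIndex.put("name:Zhangsan", Arrays.asList("1"));
        solrIndex.put("city:Beijing", Arrays.asList("1", "3"));

        // Stand-in for the HBase table: rowkey -> row data
        Map<String, String> hbaseTable = new HashMap<>();
        hbaseTable.put("1", "Zhangsan,Beijing");
        hbaseTable.put("2", "Lisi,Shanghai");
        hbaseTable.put("3", "Wangwu,Beijing");

        // Step 1: "Solr" multi-condition query returns the matching rowkeys
        Set<String> rowkeys = new HashSet<>(solrIndex.get("city:Beijing"));
        rowkeys.retainAll(solrIndex.get("name:Zhangsan")); // AND of two conditions

        // Step 2: point gets against "HBase" by rowkey
        for (String rk : rowkeys) {
            System.out.println(rk + " -> " + hbaseTable.get(rk));
        }
        // prints: 1 -> Zhangsan,Beijing
    }
}
```

The key design point is that Solr answers only "which rowkeys match", while the row data itself is always fetched from HBase by rowkey, which is the one access path HBase serves in milliseconds.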


This article was reproduced from: http://blog.csdn.net/scgaliguodong123_/article/details/46790381



Research notes on schemes for real-time multi-dimensional condition queries with HBase:
1. MapReduce Solution
Pros: concurrent batch index builds
Cons: cannot build the index in real time
2. ITHBase Solution
Cons: requires modifying HBase; not updated for several years.
3. IHBase Solution
Cons: requires modifying HBase.
4. Coprocessor Solution
Huawei's HBase secondary index (hindex, code open sourced) uses this scheme.
1) index and data are kept in different tables;
2) all the operational logic runs on the server side;
3) the HBase source code must be modified, so the approach is intrusive;
4) queries can automatically use the best index without specifying one.
Cons: the code base is large and complex, so it may be hard to work out the principle at once; hindex is incompatible with our company's HBase version.
5. Solr+HBase Solution
Cons: we are not familiar with Solr.
6. CCIndex
Cons: storage overhead is relatively large, especially when there are many indexed columns; index updates are relatively expensive and affect the system's throughput; indexes cannot be added or modified dynamically once created.
7. 360's HBase secondary index
360's secondary index has the following characteristics:
1) index and rowkey live in the same table;
2) supports optimization of multi-range AND operations;
3) supports index rebuilding.
Cons: not open source; we would have to implement it ourselves following their ideas, the principle is not entirely clear to us, and rebuilding along these lines would also be very time-consuming.
8. Phoenix's secondary index
Pros: open source, ships with secondary indexes.
Status: the company's HBase cluster has limited resources and is currently mainly used by the DMP, just barely sustaining the current service.
Occasionally, under pressure, a few machines go down.
Given current needs, there are only two options: develop an HBase secondary index tool along the lines above, or use Phoenix's built-in secondary index.
Given the current state of the HBase cluster, even a home-grown secondary index could not show its speed advantage while cluster resources are insufficient.
So we can only optimize our application's performance on top of Phoenix with the existing resources, to minimize retrieval time.





