A recent project needs HBase for real-time queries. HBase supports only a primary index: the Rowkey is the only index available for lookups, so multi-condition filter queries are poorly supported. Without a secondary index you can only fall back on the various Filter classes, and query performance with them did not feel good enough, so I started looking at ways to build a secondary index.
After some searching and reading about other people's experience online, I found two options:
1. Use an HBase coprocessor to maintain a secondary index table. When data is written, the index entries for each record are written to the secondary index table. A query first looks up the secondary index table using the filter criteria to get the matching primary-index Rowkeys, then fetches the results from the data table by Rowkey.
2. Use Solr as a search server. When data is written, a secondary index (or even a full-text index) is built in Solr. A query first asks Solr for the set of primary-index Rowkeys that match the filter criteria, then fetches the results from the data table by Rowkey.
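The read and write paths of the first option can be sketched in a few lines. This is only an in-memory illustration, not HBase client or coprocessor code: SortedTable stands in for an HBase table, and the index rowkey layout "<column>=<value>|<data rowkey>", the indexed column names, and the function names are all assumptions made for the example.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Tiny in-memory stand-in for an HBase table: rows kept sorted by rowkey."""

    def __init__(self):
        self._keys = []   # sorted rowkeys, mimicking HBase's ordering
        self._rows = {}   # rowkey -> row data

    def put(self, rowkey, row):
        if rowkey not in self._rows:
            insort(self._keys, rowkey)
        self._rows[rowkey] = row

    def get(self, rowkey):
        return self._rows.get(rowkey)

    def prefix_scan(self, prefix):
        """Yield every rowkey starting with prefix, in sorted order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i]
            i += 1


data_table = SortedTable()
index_table = SortedTable()
INDEXED_COLUMNS = ("city", "status")  # hypothetical columns to index


def put_with_index(rowkey, row):
    """Write a row plus one index entry per indexed column.

    In the real setup a coprocessor hook would do the index write.
    Simplification: stale index entries are not removed when a value
    changes, and values are assumed not to contain '|'.
    """
    data_table.put(rowkey, row)
    for column in INDEXED_COLUMNS:
        if column in row:
            index_table.put(f"{column}={row[column]}|{rowkey}", {})


def query(column, value):
    """Scan the index for matching rowkeys, then fetch rows from the data table."""
    prefix = f"{column}={value}|"
    rowkeys = [key[len(prefix):] for key in index_table.prefix_scan(prefix)]
    return [(rowkey, data_table.get(rowkey)) for rowkey in rowkeys]


put_with_index("r1", {"city": "Paris", "status": "open"})
put_with_index("r2", {"city": "Lyon", "status": "open"})
put_with_index("r3", {"city": "Paris", "status": "closed"})
print([rowkey for rowkey, _ in query("city", "Paris")])  # ['r1', 'r3']
```

Each query costs one prefix scan on the index table plus one get per matching Rowkey on the data table. Combining several conditions would mean intersecting the Rowkey sets returned for each indexed column.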
Comparing the two, the problem with the first option is that the total number of matching records cannot be obtained directly (to get it, you have to maintain a separate counter when writing data and keep it up to date), so paginated display is hard to support. A Solr server, by contrast, is queried in much the same way as with SQL statements: it supports filtering on various conditions, limiting the query range and page size, and returning the total record count directly. So I lean toward the second option.
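To make the pagination point concrete, here is a rough sketch of how a paged Solr query could be built and its response read. The parameters q, fq, fl, start, rows and wt, and the numFound and docs fields of the JSON response, are standard Solr. The core URL, the field names (including a stored field called rowkey) and the sample response are made up for the example, and the sketch does not escape special characters in filter values.

```python
import math
from urllib.parse import urlencode


def build_select_url(core_url, filters, page, page_size):
    """Build a Solr /select URL for one page of matching rowkeys.

    core_url and the 'rowkey' field are assumptions for this example.
    """
    params = [("q", "*:*")]
    # One fq (filter query) per condition; Solr ANDs them together.
    params += [("fq", f"{field}:{value}") for field, value in sorted(filters.items())]
    params += [
        ("fl", "rowkey"),                    # only return the HBase rowkey
        ("start", (page - 1) * page_size),   # offset of the first hit
        ("rows", page_size),                 # page size
        ("wt", "json"),
    ]
    return f"{core_url}/select?{urlencode(params)}"


def parse_page(body, page_size):
    """Return (rowkeys on this page, total hits, total pages) from a Solr JSON response."""
    response = body["response"]
    total = response["numFound"]
    rowkeys = [doc["rowkey"] for doc in response["docs"]]
    return rowkeys, total, math.ceil(total / page_size)


# Hypothetical core URL; no request is actually sent in this sketch.
url = build_select_url(
    "http://localhost:8983/solr/hbase_index",
    {"city": "Paris", "status": "open"},
    page=2,
    page_size=10,
)

# Hand-written sample in the shape of a Solr JSON response.
sample = {"response": {"numFound": 23, "start": 10,
                       "docs": [{"rowkey": "r1"}, {"rowkey": "r3"}]}}
rowkeys, total, pages = parse_page(sample, page_size=10)
print(rowkeys, total, pages)  # ['r1', 'r3'] 23 3
```

The Rowkeys returned for the page are then used for gets against the HBase data table, while numFound gives the total for the pager without any separately maintained counter.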
As for the performance of the two options, testing is still in progress. The detailed deployment steps and code for the second option will be written up in follow-up posts.