Premise: Solr/SolrCloud provides a complete data-retrieval solution, and HBase provides a mature mechanism for storing very large data sets.
Requirements:
1. Structured data added to HBase must be retrievable.
2. The data volume is large, reaching 1 billion to 10 billion records.
3. Retrieval must be close to real time, with second-level index updates.
Description: The following is a system architecture built with Solr and HBase working together.
1.1 One-Time Index Creation
1. Delete the full index
This is highly efficient: shut down Solr and delete the index data files directly.
2. Rebuild the full index
Pull the full data set from HBase and build the index in batches, as sketched below.
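Below is a minimal sketch of such a batch rebuild; the table name, column family cf, field names, and Solr URL are illustrative assumptions, not the author's exact code.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FullReindex {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "angelhbase");
        // Queue up to 10000 documents and flush with 4 background threads.
        ConcurrentUpdateSolrServer solr =
                new ConcurrentUpdateSolrServer("http://solrhost:8080/solr/collection1", 10000, 4);

        Scan scan = new Scan();
        scan.setCaching(1000);  // fetch 1000 rows per RPC to speed up the full scan
        ResultScanner scanner = table.getScanner(scan);

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(1000);
        for (Result r : scanner) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Bytes.toString(r.getRow()));
            // Hypothetical column cf:name; map each indexed column the same way.
            doc.addField("name", Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))));
            batch.add(doc);
            if (batch.size() >= 1000) {
                solr.add(batch);  // queued and sent asynchronously
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }
        solr.commit();  // make the rebuilt index visible to searches
        scanner.close();
        table.close();
        solr.shutdown();
    }
}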
1.2 Incremental Index Creation
1. A trigger sends data to Solr to build the index.
Configure and use HBase's trigger (coprocessor) feature; the configuration is as follows:
alter 'angelhbase', METHOD => 'table_att', 'coprocessor' => '/home/hbase/hbase-0.94.18-security/lib/solrhbase.jar|solrhbase.test.SorlIndexCoprocessorObserver|1073741823|'
alter 'angelhbase', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
Then write SorlIndexCoprocessorObserver extends BaseRegionObserver and override the postPut method. In postPut, read the structure and content of the data written to HBase correctly, then convert it into the corresponding SolrInputDocument. Use ConcurrentUpdateSolrServer to send the SolrInputDocument data to the Solr server; Solr usage and performance comparisons were covered in the earlier articles. A minimal sketch of such an observer follows the notes below.
Note: You need to place the Solr-related jars under HBase's lib directory and remove any jars with conflicting versions (there are many). Restart HBase for an updated jar to take effect.
For concrete performance figures, see the earlier posts introducing Solr usage and comparing performance: http://www.cnblogs.com/wgp13x/p/3742653.html and http://www.cnblogs.com/wgp13x/p/3748764.html
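Here is a minimal sketch of what such an observer could look like; the class body, Solr URL, and the family_qualifier field-naming convention are illustrative assumptions rather than the author's exact code.

package solrhbase.test;

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SorlIndexCoprocessorObserver extends BaseRegionObserver {

    // One shared asynchronous client per region server: documents are queued
    // here and flushed to Solr by background threads.
    private static final ConcurrentUpdateSolrServer SOLR =
            new ConcurrentUpdateSolrServer("http://solrhost:8080/solr/collection1", 10000, 4);

    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> e,
                        Put put, WALEdit edit, boolean writeToWAL) throws IOException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", Bytes.toString(put.getRow()));  // rowkey doubles as the Solr id
        // Map every written cell to a Solr field named family_qualifier;
        // values are assumed to be string-encoded (see the suggestions below).
        for (List<KeyValue> kvs : put.getFamilyMap().values()) {
            for (KeyValue kv : kvs) {
                doc.addField(Bytes.toString(kv.getFamily()) + "_" + Bytes.toString(kv.getQualifier()),
                        Bytes.toString(kv.getValue()));
            }
        }
        try {
            SOLR.add(doc);  // non-blocking: queued for asynchronous delivery
        } catch (Exception ex) {
            throw new IOException(ex);
        }
    }
}

With this approach, second-level visibility of updates depends on the soft-commit settings in solrconfig.xml (e.g. autoSoftCommit) rather than on committing inside postPut.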
2. The trigger sends data to RabbitMQ, and the Solr side pulls the data from RabbitMQ to build the index.
The embedded approach is officially not recommended. Using ConcurrentUpdateSolrServer here performs no differently from the previous approach; a consumer sketch follows.
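Below is a minimal sketch of such a Solr-side consumer; the queue name, hosts, and the parseDocument helper are illustrative assumptions.

package solrhbase.test;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.QueueingConsumer;

public class RabbitIndexConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("mqhost");
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();
        // Durable queue; the HBase trigger publishes one document per message.
        channel.queueDeclare("solr.index", true, false, false, null);

        ConcurrentUpdateSolrServer solr =
                new ConcurrentUpdateSolrServer("http://solrhost:8080/solr/collection1", 10000, 4);

        QueueingConsumer consumer = new QueueingConsumer(channel);
        channel.basicConsume("solr.index", false, consumer);  // manual acks
        while (true) {
            QueueingConsumer.Delivery delivery = consumer.nextDelivery();
            String body = new String(delivery.getBody(), "UTF-8");
            solr.add(parseDocument(body));  // queued and sent asynchronously
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        }
    }

    // Hypothetical helper: turn the message body (e.g. a JSON string)
    // into a SolrInputDocument; the field mapping depends on the schema.
    private static SolrInputDocument parseDocument(String json) {
        SolrInputDocument doc = new SolrInputDocument();
        // ... parse json and call doc.addField(name, value) per field ...
        return doc;
    }
}

Acknowledging each message only after solr.add() succeeds means a crashed consumer re-delivers rather than loses documents.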
3. Suggestions:
Store only one column in HBase, with the value being a PB (Protocol Buffers) or JSON string. (This requires classes and annotations for converting a bean into a SolrInputDocument, as well as the corresponding compression algorithms.)
Alternatively, store all data inserted into HBase as Bytes.toBytes(String); for example, store the Long value 2 as Bytes.toBytes("" + 2), as sketched below. Otherwise postPut() has to know the concrete type of every column to generate a correct SolrInputDocument, because SolrInputDocument requires string data.
For the full postPut method code, leave a comment if needed or contact me directly: http://www.cnblogs.com/wgp13x/
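A minimal sketch of the string-encoding convention, using an illustrative table, family, and column:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class StringEncodedPut {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "angelhbase");
        Put put = new Put(Bytes.toBytes("rowkey-001"));
        long count = 2L;
        // Store the long as its string form so postPut() can always decode
        // values with Bytes.toString(), whatever the original type was.
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes("" + count));
        // Avoid put.add(..., Bytes.toBytes(count)): the coprocessor could not
        // tell an 8-byte long from a string without extra type metadata.
        table.put(put);
        table.close();
    }
}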
1.3 Architecture Design of the HBase and Solr System
Use HBase to build a structured-data storage cloud for holding massive amounts of data, and use a SolrCloud cluster to build a search engine that finds the IDs of the structured data being searched; configure Solr to store only the IDs.
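A sketch of the store-only-IDs idea in Solr's schema.xml, with illustrative field names: searchable fields are indexed but not stored, and only the id field is stored.

<!-- Search fields are indexed but not stored; only id is stored, so Solr
     returns ids and HBase serves the full rows. -->
<field name="id"   type="string"       indexed="true" stored="true" required="true"/>
<field name="name" type="text_general" indexed="true" stored="false"/>
<field name="desc" type="text_general" indexed="true" stored="false"/>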
1. The specific process:
WD stands for the user writing data. Starting from the user's write-data request WD1, the flow passes through WD2, where the data is written either to the MySQL database or to the structured-data storage cloud, and then WD3, where the data is submitted to the Solr cluster to create indexes according to business requirements.
RD stands for the user reading data. Starting from the user's read-data request RD1, the flow passes through RD2, which either reads from MySQL directly or requests the search service from the Solr cluster; in RD3, the Solr cluster is asked for the IDs of the matching results, the data is then fetched from the structured-data storage cloud by those IDs, and finally the result is returned to the user. A sketch of this read path follows.
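A minimal sketch of this search-then-get read path; the ZooKeeper ensemble, collection, table, and query are illustrative assumptions.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SearchThenGet {
    public static void main(String[] args) throws Exception {
        // RD3, step 1: ask the SolrCloud cluster for the ids of matching documents.
        CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("collection1");
        SolrQuery query = new SolrQuery("name:foo");
        query.setFields("id");  // only ids are stored, so only ids come back
        query.setRows(20);
        QueryResponse rsp = solr.query(query);

        // RD3, step 2: use the ids as HBase rowkeys and fetch the full records.
        HTable table = new HTable(HBaseConfiguration.create(), "angelhbase");
        List<Get> gets = new ArrayList<Get>();
        for (SolrDocument doc : rsp.getResults()) {
            gets.add(new Get(Bytes.toBytes((String) doc.getFieldValue("id"))));
        }
        Result[] rows = table.get(gets);  // multi-get returns the matching rows
        // ... assemble rows into the response returned to the user ...
        table.close();
        solr.shutdown();
    }
}

Because Solr stores only the IDs, the full records always come from HBase, which keeps the Solr index small while HBase does what it is best at: random reads by rowkey.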