(i) Hive and Solr overview
As the offline data warehouse of the Hadoop ecosystem, Hive makes it easy to analyze huge volumes of historical data offline with SQL and to act on the results, for example to serve report and statistics queries.
Solr is a high-performance search server that provides fast, powerful full-text retrieval.
(ii) Why integrate Hive with Solr?
Sometimes we need to store the results of a Hive analysis in Solr for full-text search. For example, one of our businesses analyzes the search logs of our e-commerce website with Hive and stores the results in Solr for report queries. Because search keywords are involved, that field must support both tokenized (word-segmented) and untokenized queries; the tokenized queries let us chart how related products trend over a given period. At other times we need to load data from Solr into Hive and run join analyses in SQL. The two directions complement each other and fit our business needs better. There are open-source projects on the web that integrate Hive with Solr, but they target older versions and do not run on the new ones. After reworking the code, it runs on the latest versions.
(iii) How is Hive integrated with Solr?
The so-called integration simply reimplements some components of Hadoop's MapReduce (MR) programming interface. The MR programming interfaces are flexible and highly abstract: MR can load data not only from HDFS but also from any non-HDFS system, provided we write custom versions of the following components:
InputFormat
OutputFormat
RecordReader
RecordWriter
InputSplit
This takes some effort, but it really does make it possible to load data from anywhere, including MySQL, SQL Server, Oracle, MongoDB, Solr, Elasticsearch, Redis, and so on.
That covers customizing Hadoop's MR programming interface. For Hive, two more pieces are required in addition to the components above: a SerDe, and a StorageHandler that assembles everything. SerDe in Hive stands for Serializer and Deserializer, that is, serialization and deserialization. Hive uses the SerDe together with the file input and output formats to read and write the rows of a Hive table.
The read path:
HDFS files / any source --> InputFileFormat --> <key, value> --> Deserializer --> Row object
The write path:
Row object --> Serializer --> <key, value> --> OutputFileFormat --> HDFS files / any source
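To make the two flows a little more concrete, here is a minimal Python sketch of the same pipeline. The real components are Java classes inside Hadoop and Hive; every function name below is a hypothetical stand-in chosen for illustration, not Hive's API.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# A <key, value> pair as handed over by an input format (hypothetical model).
Record = Tuple[str, Dict[str, str]]


def input_format(source: List[Dict[str, str]], key_field: str) -> Iterator[Record]:
    """Stands in for InputFormat + RecordReader: source documents -> <key, value> pairs."""
    for doc in source:
        yield doc[key_field], doc


def deserialize(record: Record, columns: List[str]) -> List[str]:
    """Stands in for the Deserializer: a <key, value> pair -> a row in column order."""
    _, value = record
    return [value.get(col, "") for col in columns]


def serialize(row: List[str], columns: List[str], key_field: str) -> Record:
    """Stands in for the Serializer: a row -> a <key, value> pair."""
    value = dict(zip(columns, row))
    return value[key_field], value


def output_format(records: Iterable[Record], sink: List[Dict[str, str]]) -> None:
    """Stands in for OutputFormat + RecordWriter: write each value to the target store."""
    for _, value in records:
        sink.append(value)


# Read path: source -> <key, value> -> row
columns = ["rowkey", "sname"]
source = [{"rowkey": "1", "sname": "hive"}, {"rowkey": "2", "sname": "solr"}]
rows = [deserialize(rec, columns) for rec in input_format(source, "rowkey")]

# Write path: row -> <key, value> -> target
sink: List[Dict[str, str]] = []
output_format((serialize(row, columns, "rowkey") for row in rows), sink)
```

The point of the sketch is only the division of labor: the input and output formats deal with the external system, while the SerDe converts between its records and Hive rows.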
(iv) What does integrating Hive with Solr make possible?
(1) Reading Solr data into Hive, where Hive's SQL syntax can be used for aggregations, statistics, analysis, joins, and so on.
(2) Building Solr indexes: a single SQL statement can index large-scale data as an MR job.
(v) How to install, deploy, and use it
The source code is not pasted here; it has been uploaded to GitHub. If you need it, run git clone https://github.com/qindongliang/hive-solr, make a few adjustments to the pom file, and then run
mvn clean package
to build the jar package. Then copy the jar into Hive's lib directory.
Examples follow:
(1) Reading Solr data from Hive
Create the table:
-- Drop the table if it already exists
DROP TABLE IF EXISTS solr;
-- Create an external table
CREATE EXTERNAL TABLE solr (
  -- Define the fields; they must match the fields in Solr
  rowkey STRING,
  sname STRING
)
-- Storage handler for the table
STORED BY "com.easy.hive.store.SolrStorageHandler"
-- Solr properties
TBLPROPERTIES ('solr.url' = 'http://192.168.1.28:8983/solr/a',
  'solr.query' = '*:*',
  'solr.cursor.batch.size' = '10000',
  'solr.primary_key' = 'rowkey'
);
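The solr.cursor.batch.size property suggests that the handler pages through Solr results in batches. The handler's code is not shown in this post, so the following is only a sketch of how such paging typically works with Solr's cursorMark parameter, not the project's actual implementation. The fetch_page callable is a hypothetical stand-in for the HTTP request to Solr's /select handler, and iter_solr_docs is a name made up for this example.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# fetch_page(params) -> (docs, next_cursor_mark)
# Hypothetical stand-in for an HTTP call to Solr's /select handler.
FetchPage = Callable[[Dict[str, str]], Tuple[List[dict], str]]


def iter_solr_docs(
    fetch_page: FetchPage, query: str, primary_key: str, batch_size: int
) -> Iterator[dict]:
    """Yield every matching document, one batch of `batch_size` at a time."""
    cursor = "*"  # Solr's initial cursorMark value
    while True:
        params = {
            "q": query,
            "rows": str(batch_size),
            # Cursor paging requires a sort that includes the unique key field.
            "sort": f"{primary_key} asc",
            "cursorMark": cursor,
        }
        docs, next_cursor = fetch_page(params)
        yield from docs
        # Solr signals the end of the result set by returning the same cursor.
        if next_cursor == cursor:
            break
        cursor = next_cursor
```

With the table properties above, the arguments would be query="*:*", primary_key="rowkey" and batch_size=10000. A cursor avoids the cost of deep start/rows offsets, which matters when a full Solr collection is scanned by an MR job.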
Run the bin/hive command to open the Hive command-line terminal, then execute:
-- Query all data
SELECT * FROM solr LIMIT 5;
-- Query a specific field
SELECT rowkey FROM solr;
-- Aggregate the Solr data as an MR job
SELECT sname, count(*) AS c FROM solr GROUP BY sname ORDER BY c DESC;
(2) Building a Solr index with Hive
First create the data source table:
-- Drop the table if it already exists
DROP TABLE IF EXISTS index_source;
-- Create the data table
CREATE TABLE index_source (id STRING, yname STRING, sname STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
-- Load local data into the data source table
LOAD DATA LOCAL INPATH '/root/server/hive/test_solr' INTO TABLE index_source;
Next, create the table associated with Solr:
-- Drop the table if it already exists
DROP TABLE IF EXISTS index_solr;
-- Create the table associated with Solr
CREATE EXTERNAL TABLE index_solr (
  id STRING,
  yname STRING,
  sname STRING
)
-- Storage handler for the table
STORED BY "com.easy.hive.store.SolrStorageHandler"
-- Solr service properties
TBLPROPERTIES ('solr.url' = 'http://192.168.1.28:8983/solr/b',
  'solr.query' = '*:*',
  'solr.cursor.batch.size' = '10000',
  'solr.primary_key' = 'id'
);
Finally, run the following SQL to build a Solr index from the data in the source table:
-- Register the hive-solr jar; otherwise the job will not start properly when it runs in MR mode
ADD JAR /root/server/hive/lib/hive-solr.jar;
-- Run the insert
INSERT OVERWRITE TABLE index_solr SELECT * FROM index_source;
-- After it succeeds, check the result in Solr's web interface or with the following query in Hive
SELECT * FROM index_solr LIMIT 10;
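Conceptually, the INSERT hands each Hive row to the storage handler's writer, which turns it into a Solr document and sends documents to Solr. As a rough illustration only, the Python sketch below parses comma-delimited lines shaped like the index_source data and groups them into JSON arrays, the form Solr's JSON update handler accepts. The helper names and the batching strategy are assumptions made for this example; the project may buffer and send documents differently.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Column order of the index_source table defined above.
COLUMNS = ["id", "yname", "sname"]


def parse_line(line: str) -> Dict[str, str]:
    """Split one comma-delimited line into a Solr document keyed by column name."""
    values = line.rstrip("\n").split(",")
    return dict(zip(COLUMNS, values))


def json_batches(lines: Iterable[str], batch_size: int) -> Iterator[str]:
    """Group documents into JSON arrays of at most `batch_size` documents each."""
    batch: List[Dict[str, str]] = []
    for line in lines:
        batch.append(parse_line(line))
        if len(batch) == batch_size:
            yield json.dumps(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield json.dumps(batch)
```

Each yielded string could be posted to the collection's /update handler. Sending documents in batches rather than one by one keeps the number of HTTP round trips manageable for large tables.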
(vi) Can they also be integrated with other frameworks?
Of course. As independent open-source frameworks, they can be combined in many ways: Hive can be integrated with Elasticsearch or with MongoDB, and Solr can be integrated with Spark or with Pig. Each combination requires us to write the relevant custom components, and the approach is roughly the same as in this project.
(vii) Basic environment for this test
Apache Hadoop 2.7.1
Apache Hive 1.2.1
Apache Solr 5.1.0
(viii) Thanks and references:
https://github.com/mongodb/mongo-hadoop/tree/master/hive/src/main/java/com/mongodb/hadoop/hive
https://github.com/lucidworks/hive-solr
https://github.com/chimpler/hive-solr
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HowtoWriteYourOwnSerDe
This article is from the "7936494" blog; please keep this source attribution: http://7946494.blog.51cto.com/7936494/1752156