Solr is a Lucene-based search server. It lets you build a search service quickly and ships with many practical components, such as highlighting (highlight), spell checking (spellcheck), and similar-document matching (morelikethis). Here I share some practices I have picked up at work. (I am currently using Solr 3.4 on Tomcat 7.0.21.)
(If you run Solr on Tomcat and your query requests contain Chinese characters, you also need to edit tomcat_home/conf/server.xml and set <Connector ... URIEncoding="UTF-8"/> so that request URIs are decoded as UTF-8; see uri_charset_config and HTTP.)
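To see why the connector encoding matters, here is a minimal Python sketch. It assumes the client percent-encodes the query as UTF-8 and that the server would otherwise decode it as ISO-8859-1 (the Tomcat 7 default); the sample string is an arbitrary one chosen for illustration, not taken from the requests in this article.

```python
from urllib.parse import quote, unquote

term = "中文"  # arbitrary sample string, chosen only for illustration

# A client typically percent-encodes the UTF-8 bytes of the query string.
encoded = quote(term, encoding="utf-8")

# Decoding with the matching charset restores the original text.
decoded_utf8 = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 turns every byte into one character.
decoded_latin1 = unquote(encoded, encoding="latin-1")

print(encoded)                 # %E4%B8%AD%E6%96%87
print(decoded_utf8 == term)    # True
print(decoded_latin1 == term)  # False
```

With the mismatched charset the server sees six unrelated characters instead of two Chinese ones, so the query matches nothing.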
Highlighting (highlight)
We use search engines all the time. For example, if you search for Java on Google, the parts of each result that match the keyword are shown in red so that they stand out from the rest of the content.
Solr's default configuration already includes the highlight component (for details, see solr_home/conf/solrconfig.xml). Usually I only need to request http://localhost:8080/solr/select?q=name:Wang Machang&start=0&rows=10&hl=true&hl.fl=name. Compared with a plain query there are two extra parameters, "hl=true" and "hl.fl=name". "hl=true" enables highlighting, and "hl.fl=name" tells Solr to highlight the name field (to highlight several fields, list them separated by commas, for example "hl.fl=name,name2,name3").
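When the request is built from code, the parameters should be URL-encoded rather than pasted together by hand. The following is a small sketch using only the Python standard library; the helper name highlight_url is hypothetical, and the base URL and query simply mirror the request above.

```python
from urllib.parse import urlencode

def highlight_url(base, query, fields, start=0, rows=10):
    """Build a select URL with highlighting enabled for the given fields."""
    params = [
        ("q", query),
        ("start", start),
        ("rows", rows),
        ("hl", "true"),               # turn highlighting on
        ("hl.fl", ",".join(fields)),  # comma-separated fields to highlight
    ]
    # urlencode percent-encodes reserved characters such as ":" and ","
    return base + "?" + urlencode(params)

url = highlight_url("http://localhost:8080/solr/select",
                    "name:Wang Machang", ["name"])
print(url)
```

Passing ["name", "name2", "name3"] as the field list produces the multi-field form of hl.fl described above.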
The query result is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
    <lst name="params">
      <str name="hl">true</str>
      <str name="hl.fl">name</str>
      <str name="q">name:Wang Machang</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <!-- Here is the ordinary result -->
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">4</str>
      <str name="name">Wang Machang's diligence and simplicity</str>
    </doc>
  </result>
  <!-- Here is the highlighting result -->
  <lst name="highlighting">
    <!-- id = 4 -->
    <lst name="4">
      <!-- The highlighted field, name -->
      <arr name="name">
        <!-- The value below is XML-escaped; the actual content is "<em>Wang Machang</em>'s diligence and simplicity" -->
        <str>&lt;em&gt;Wang Machang&lt;/em&gt;'s diligence and simplicity</str>
      </arr>
    </lst>
  </lst>
</response>
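Because the highlighted snippets come back in a separate block keyed by document id, client code has to look them up by id. Here is a hedged sketch with Python's xml.etree.ElementTree; the SAMPLE string is a trimmed, hand-written stand-in for a response of the shape shown above, and highlights_by_id is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of the response above (illustration only).
SAMPLE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">4</str>
      <str name="name">Wang Machang's diligence and simplicity</str>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="4">
      <arr name="name">
        <str>&lt;em&gt;Wang Machang&lt;/em&gt;'s diligence and simplicity</str>
      </arr>
    </lst>
  </lst>
</response>"""

def highlights_by_id(xml_text):
    """Return {doc id: {field name: [snippet, ...]}} from a Solr XML response."""
    root = ET.fromstring(xml_text)
    result = {}
    for doc_lst in root.findall("./lst[@name='highlighting']/lst"):
        fields = {}
        for arr in doc_lst.findall("arr"):
            # The parser undoes the XML escaping, so snippets contain real <em> tags.
            fields[arr.get("name")] = [s.text for s in arr.findall("str")]
        result[doc_lst.get("name")] = fields
    return result

snippets = highlights_by_id(SAMPLE)
print(snippets["4"]["name"][0])
```

The printed snippet contains literal <em> and </em> tags, ready to be placed into an HTML page.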
The highlighted content is the part that matches the keyword. By default it is wrapped in "<em>" and "</em>". To customize the tags placed before and after the highlighted text, add two request parameters, "hl.simple.pre" and "hl.simple.post", for example http://localhost:8080/solr/select?q=name:Wang Machang&start=0&rows=10&hl=true&hl.fl=name&hl.simple.pre=<b>&hl.simple.post=</b>. Alternatively, modify the highlight searchComponent in the solrconfig.xml configuration file.
(For more highlight request parameters, refer to HighlightingParameters.)
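Custom tags contain characters such as "<" and "/" that are best percent-encoded when the request is sent from code. A minimal sketch, assuming the same host and query as the example request:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

params = {
    "q": "name:Wang Machang",
    "hl": "true",
    "hl.fl": "name",
    "hl.simple.pre": "<b>",    # tag inserted before each match
    "hl.simple.post": "</b>",  # tag inserted after each match
}
url = "http://localhost:8080/solr/select?" + urlencode(params)
print(url)

# Decoding the query string again recovers the original tag values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["hl.simple.pre"], decoded["hl.simple.post"])
```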
Spell checking (spellcheck)
First, configure solrconfig.xml. The file may already contain the following two elements (add them if it does not); adjust them to suit your system environment:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- Spell checking is based on the index of one field; here it is the field called name -->
    <str name="field">name</str>
    <!-- Directory of the spell-check index -->
    <str name="spellcheckIndexDir">spellchecker</str>
    <!-- Build the spell-check index on commit (spell checking only works once the index has been built) -->
    <!-- You can also build it on optimize instead: replace "buildOnCommit" with "buildOnOptimize" -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <!-- Default parameters -->
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">false</str>
    <!-- Number of spell-check suggestions to return (increase it as needed) -->
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
After changing the configuration, you must rebuild the index for it to take effect. Then we request http://localhost:8080/solr/spell?q=name:Wang Ma Zi&spellcheck=true
The query result is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="Wang Ma Zi">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <str>Wang Machang</str>
        </arr>
      </lst>
    </lst>
  </lst>
</response>
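A client usually wants a simple mapping from each misspelled term to its suggestions. The sketch below extracts that mapping with Python's xml.etree.ElementTree; SAMPLE is a hand-written stand-in for a response of the shape shown above, and suggestions is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of the spellcheck response above.
SAMPLE = """<response>
  <result name="response" numFound="0" start="0"/>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="Wang Ma Zi">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <str>Wang Machang</str>
        </arr>
      </lst>
    </lst>
  </lst>
</response>"""

def suggestions(xml_text):
    """Return {original term: [suggested replacement, ...]}."""
    root = ET.fromstring(xml_text)
    out = {}
    path = "./lst[@name='spellcheck']/lst[@name='suggestions']/lst"
    for term in root.findall(path):
        words = term.findall("./arr[@name='suggestion']/str")
        out[term.get("name")] = [w.text for w in words]
    return out

print(suggestions(SAMPLE))
```

An empty dictionary means Solr had no suggestion to offer, which is the normal case when the query is spelled correctly.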
Sometimes we need spell checking based on several fields, but the configuration above accepts only one field. To get the same effect we have to work around the limit with copyField. For example, suppose schema.xml defines:
<field name="A" .../>
<field name="B" .../>
To spell-check fields A and B together, we can add one more field:
<field name="AB" multiValued="true" .../>
Then add two copyField elements:
<copyField source="A" dest="AB"/>
<copyField source="B" dest="AB"/>
The complete configuration is as follows (the spellchecker's field setting should then point at AB):
<field name="A" .../>
<field name="B" .../>
<field name="AB" multiValued="true" .../>
<copyField source="A" dest="AB"/>
<copyField source="B" dest="AB"/>
(For more details, refer to SpellCheckComponent.)
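Conceptually, the two copyField rules collect the values of A and B into the multi-valued field AB at index time. The following Python sketch is only a toy model of that behaviour, not Solr's implementation; apply_copy_fields and the sample document are made up for illustration.

```python
def apply_copy_fields(doc, rules):
    """Toy model: copy each source value into a multi-valued destination field."""
    out = dict(doc)  # leave the caller's document untouched
    for source, dest in rules:
        if source in doc:
            # The destination collects one value per matching source field,
            # which is why it has to be declared multiValued in the schema.
            out.setdefault(dest, []).append(doc[source])
    return out

rules = [("A", "AB"), ("B", "AB")]
indexed = apply_copy_fields({"A": "first value", "B": "second value"}, rules)
print(indexed["AB"])
```

A spellchecker built on AB therefore sees the terms of both source fields.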
Similarity matching (morelikethis)
This component is used to search for similar documents.
First, configure the MoreLikeThisHandler in solrconfig.xml:
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
</requestHandler>
Then I can request http://localhost:8080/solr/mlt?q=id:7&mlt=true&mlt.fl=name&mlt.mintf=1&mlt.mindf=1
The request above looks up the document whose id is 7 and returns other documents that are similar to it in the name field. Note that the fields listed in mlt.fl need termVectors set to true in schema.xml:
<field name="name" termVectors="true" .../>
Of course, mlt.fl can also list several fields, separated by commas.
(For details, refer to MoreLikeThis and MoreLikeThisHandler.)
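As with the other handlers, the request can be assembled from code. This is a small sketch using the Python standard library; mlt_url is a hypothetical helper, and the host, handler path, and parameter values mirror the example request in this section.

```python
from urllib.parse import urlencode

def mlt_url(base, doc_query, fields, mintf=1, mindf=1):
    """Build a MoreLikeThis request for the document matched by doc_query."""
    params = [
        ("q", doc_query),              # selects the source document
        ("mlt", "true"),
        ("mlt.fl", ",".join(fields)),  # fields used to judge similarity
        ("mlt.mintf", mintf),          # minimum term frequency in the source document
        ("mlt.mindf", mindf),          # minimum number of documents containing a term
    ]
    return base + "?" + urlencode(params)

url = mlt_url("http://localhost:8080/solr/mlt", "id:7", ["name"])
print(url)
```

Passing more than one field name produces the comma-separated mlt.fl form mentioned above.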