Getting started with SOLR

Search is a core requirement of the portal community, and a good search engine noticeably improves the user experience. There are currently several common approaches to implementing site search:

  • Implement site search with a custom wrapper around Lucene

The workload is large and scalability is hard to achieve, so this approach is not used.

  • Call the Google or Baidu APIs to implement site search

This binds the site too rigidly to a third-party search engine and makes it hard to meet future business expansion needs.

  • Implement site search based on Compass + Lucene

This approach is well suited to indexing database-driven application data, in particular as a replacement for traditional like '%expression%' matching on varchar and clob fields, and it is a worthwhile way to implement site search. However, you still need to build a certain amount of distributed processing and interface wrapping yourself.

  • Implement site search based on SOLR

SOLR offers a more complete solution with better encapsulation and scalability. The portal community therefore uses SOLR, with the Compass approach to be added later.

1. SOLR introduction

SOLR is a Lucene-based search server written in Java. It provides faceted search, hit highlighting, and multiple output formats (including XML/XSLT and JSON). It is easy to install and configure, and comes with an HTTP-based administration interface. SOLR is already used by many large websites and is relatively mature and stable. SOLR wraps and extends Lucene, so it largely follows Lucene's terminology. More importantly, indexes created by SOLR are fully compatible with the Lucene search engine library. With appropriate configuration (and, in some cases, some coding), SOLR can read and use indexes built by other Lucene applications. In addition, many Lucene tools (such as Nutch and Luke) can use indexes created by SOLR.

2. Installing and configuring SOLR on Tomcat

SOLR is written in Java, so it can be deployed on both Windows and Linux. However, SOLR ships with shell scripts for testing, management, and maintenance, so we recommend installing it on Linux for production deployments and using Windows only for testing.

The following describes how to install and configure SOLR on Linux; the steps on Windows are similar.

    wget http://apache.mirror.phpchina.com/tomcat/tomcat-6/v6.0.16/bin/apache-tomcat-6.0.16.zip
    unzip apache-tomcat-6.0.16.zip
    mv apache-tomcat-6.0.16 /opt/tomcat
    chmod 755 /opt/tomcat/bin/*
    wget http://apache.mirror.phpchina.com/lucene/solr/1.2/apache-solr-1.2.0.tgz
    tar zxvf apache-solr-1.2.0.tgz

The trickiest part of installing and configuring SOLR is understanding and setting solr.solr.home. It can be set in any of the following ways:

  • Based on the current path

    cp apache-solr-1.2.0/dist/apache-solr-1.2.0.war /opt/tomcat/webapps/solr.war
    mkdir /opt/solr-tomcat
    cp -r apache-solr-1.2.0/example/solr/ /opt/solr-tomcat/
    cd /opt/solr-tomcat
    /opt/tomcat/bin/startup.sh

In this case (neither the solr.solr.home environment variable nor JNDI is set), SOLR looks for ./solr, so you must change to /opt/solr-tomcat before starting Tomcat.

  • Based on the solr.solr.home environment variable

Add the following to the current user's environment (.bash_profile) or to /opt/tomcat/bin/catalina.sh:

    export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr-tomcat/solr"

  • Based on JNDI configuration

    mkdir -p /opt/tomcat/conf/Catalina/localhost

Create /opt/tomcat/conf/Catalina/localhost/solr.xml (for example with touch) with the following content:

    <Context docBase="/opt/tomcat/webapps/solr.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String" value="/opt/solr-tomcat/solr" override="true"/>
    </Context>
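
The three options above amount to a lookup order for the SOLR home directory. The sketch below models that order in Python. It rests on an assumption that the text above does not state: the JNDI entry solr/home is checked first, then the solr.solr.home property, and ./solr under the working directory is the fallback. The function name and its dict arguments are hypothetical stand-ins; SOLR performs the real lookup inside the Java web application.

```python
import posixpath


def resolve_solr_home(jndi_env, system_props, cwd):
    """Return the SOLR home directory using the ASSUMED lookup order.

    jndi_env and system_props are plain dicts standing in for the JNDI
    environment and the JVM system properties (illustrative only).
    """
    # 1. JNDI entry, as configured in conf/Catalina/localhost/solr.xml
    home = jndi_env.get("solr/home")
    if home:
        return home
    # 2. Property passed to the JVM with -Dsolr.solr.home=...
    home = system_props.get("solr.solr.home")
    if home:
        return home
    # 3. Fallback: ./solr relative to the directory Tomcat was started from
    return posixpath.join(cwd, "solr")


# Nothing configured: the start directory decides where SOLR looks.
print(resolve_solr_home({}, {}, "/opt/solr-tomcat"))
```

This is why the first option requires changing to /opt/solr-tomcat before running startup.sh: with nothing else configured, the working directory determines the result.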

Then access the SOLR administration interface, for example at http://localhost:8080/solr/admin

3. How SOLR works

SOLR exposes standard HTTP interfaces for adding, deleting, updating, and querying indexed data. You index and search by sending HTTP requests to the SOLR web application deployed in the servlet container. SOLR accepts the request, determines the appropriate SolrRequestHandler, and processes the request. The response is returned over HTTP in the same way. By default SOLR returns its standard XML response; you can also configure alternative response formats.

You can send four different index requests to the SOLR index servlet:

  • add/update adds documents to SOLR or updates existing ones. These additions and updates cannot be searched until they are committed.
  • commit tells SOLR to make all changes since the last commit searchable.
  • optimize restructures the Lucene files to improve search performance. It is usually best to optimize once indexing is complete. If updates are frequent, schedule optimization for periods of low usage. An index works normally without being optimized, and optimization is a time-consuming process.
  • delete can be specified by ID or by query. Deleting by ID removes the document with the specified ID; deleting by query removes all documents that the query returns.
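
As a rough illustration of the commit, optimize, and delete requests listed above, the following Python sketch builds the XML bodies that would be posted to the update URL. The helper names are hypothetical, and the code only constructs strings; it does not contact a server.

```python
import xml.etree.ElementTree as ET


def commit_message():
    """XML body asking SOLR to make pending changes searchable."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


def optimize_message():
    """XML body asking SOLR to restructure the index files."""
    return ET.tostring(ET.Element("optimize"), encoding="unicode")


def delete_by_id(doc_id):
    """XML body deleting the document with the given ID."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """XML body deleting every document matched by the query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_id("vs1gb400c3"))
print(delete_by_query("manu:Corsair"))
```

Building the messages with an XML library rather than string concatenation means special characters in IDs or queries (such as & or <) are escaped automatically.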

A typical add request message:

    <add>
      <doc>
        <field name="id">TWINX2048-3200PRO</field>
        <field name="name">Corsair XMS 2 GB (2x1 GB) 184-pin DDR SDRAM unbuffered DDR 400 (PC 3200) dual channel kit system memory - Retail</field>
        <field name="manu">Corsair Microsystems Inc.</field>
        <field name="cat">electronics</field>
        <field name="cat">memory</field>
        <field name="features">CAS latency 2, 2-3-3-6 timing, 2.75 V, unbuffered, heat-spreader</field>
        <field name="price">185</field>
        <field name="popularity">5</field>
        <field name="inStock">true</field>
      </doc>
      <doc>
        <field name="id">VS1GB400C3</field>
        <field name="name">Corsair ValueSelect 1 GB 184-pin DDR SDRAM unbuffered DDR 400 (PC 3200) system memory - Retail</field>
        <field name="manu">Corsair Microsystems Inc.</field>
        <field name="cat">electronics</field>
        <field name="cat">memory</field>
        <field name="price">74.99</field>
        <field name="popularity">7</field>
        <field name="inStock">true</field>
      </doc>
    </add>
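
An add message like the one above can also be generated from application data. The sketch below is one possible way to do that in Python; build_add_message is a hypothetical helper, and the field names are borrowed from the sample message. A list value is written as repeated field elements, which is how the multi-valued cat field appears in the message.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build an <add> message from a list of dicts (one dict per document)."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list value becomes several <field> elements with the same name.
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                if isinstance(item, bool):
                    # Write booleans in lowercase, as in the sample message.
                    field.text = "true" if item else "false"
                else:
                    field.text = str(item)
    return ET.tostring(add, encoding="unicode")


message = build_add_message([
    {
        "id": "vs1gb400c3",
        "cat": ["electronics", "memory"],
        "price": 74.99,
        "inStock": True,
    },
])
print(message)
```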

A typical search result message:

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">6</int>
        <lst name="params">
          <str name="rows">10</str>
          <str name="start">0</str>
          <str name="fl">*,score</str>
          <str name="hl">true</str>
          <str name="q">content:"Faceted browsing"</str>
        </lst>
      </lst>
      <result name="response" numFound="1" start="0" maxScore="1.058217">
        <doc>
          <float name="score">1.058217</float>
          <arr name="all">
            <str>http://localhost/myblog/solr-rocks-again.html</str>
            <str>SOLR is great</str>
            <str>SOLR, Lucene, enterprise, search, greatness</str>
            <str>SOLR has some really great features, like faceted browsing and replication</str>
          </arr>
          <arr name="content">
            <str>SOLR has some really great features, like faceted browsing and replication</str>
          </arr>
          <date name="creationDate">2007-01-07T05:04:00.000Z</date>
          <arr name="keywords">
            <str>SOLR, Lucene, enterprise, search, greatness</str>
          </arr>
          <int name="rating">8</int>
          <str name="title">SOLR is great</str>
          <str name="url">http://localhost/myblog/solr-rocks-again.html</str>
        </doc>
      </result>
      <lst name="highlighting">
        <lst name="http://localhost/myblog/solr-rocks-again.html">
          <arr name="content">
            <str>SOLR has some really great features, like <em>faceted</em> <em>browsing</em> and replication</str>
          </arr>
        </lst>
      </lst>
    </response>
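
To show how a client might consume a response of this shape, here is a minimal Python sketch that reads the status, the hit count, and the single-valued fields of each document. The embedded SAMPLE is a shortened copy of the response above, and parse_response is a hypothetical helper, not part of SOLR.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample response, kept small for illustration.
SAMPLE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">6</int>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="1.058217">
    <doc>
      <float name="score">1.058217</float>
      <arr name="keywords"><str>SOLR, Lucene</str></arr>
      <str name="title">SOLR is great</str>
      <str name="url">http://localhost/myblog/solr-rocks-again.html</str>
    </doc>
  </result>
</response>"""


def parse_response(xml_text):
    """Extract status, hit count, and single-valued fields from a response."""
    root = ET.fromstring(xml_text)
    status = root.find("lst[@name='responseHeader']/int[@name='status']")
    result = root.find("result[@name='response']")
    docs = []
    for doc in result.findall("doc"):
        # Multi-valued <arr> fields are skipped to keep the sketch short.
        docs.append({f.get("name"): f.text for f in doc if f.tag != "arr"})
    return {
        "status": int(status.text),
        "num_found": int(result.get("numFound")),
        "docs": docs,
    }


parsed = parse_response(SAMPLE)
print(parsed["num_found"], parsed["docs"][0]["title"])
```

All values are returned as strings here; a real client would convert them according to the element name (int, float, date, and so on).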

For more information about SOLR, see:

http://wiki.apache.org/solr/FrontPage

4. Testing SOLR

The SOLR distribution ships with sample test documents in apache-solr-1.2.0/example/exampledocs

  • Test SOLR using the shell script (curl):

    cd apache-solr-1.2.0/example/exampledocs

Edit post.sh (vi post.sh) and set the URL variable according to Tomcat's IP address and port, for example URL=http://localhost:8080/solr/update

    ./post.sh *.xml

  • Test SOLR using SOLR's Java tool:

View help:

    java -jar post.jar -help

Submit test data:

    java -Durl=http://localhost:8080/solr/update -Ddata=files -jar post.jar *.xml
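
For readers who prefer a script over post.sh or post.jar, the following Python sketch shows one way to post XML files to the update URL and then commit. It is not a port of either tool: the function names are hypothetical, the text/xml content type is an assumption about what the update handler expects, and the URL is the example value used above. Only the request object is built when the block runs; post_files is defined but not called.

```python
import urllib.request


def build_update_request(url, xml_body):
    """Create (but do not send) a POST request carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        # Assumed content type for XML update messages.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_files(url, paths):
    """Post each file, then send one <commit/> so the documents become searchable."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            urllib.request.urlopen(build_update_request(url, handle.read())).close()
    urllib.request.urlopen(build_update_request(url, "<commit/>")).close()


request = build_update_request("http://localhost:8080/solr/update", "<commit/>")
print(request.get_method(), request.full_url)
```

With a running server, post_files("http://localhost:8080/solr/update", ["mem.xml"]) would send the file and commit; the file name here is only an example.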

 

The following uses two fields, liangchuan and url, as an example of how to use the index commands in SOLR.

1) Modify the SOLR schema to describe the index fields:

vi /opt/solr-tomcat/solr/conf/schema.xml and add the following inside <fields>:

    <field name="liangchuan" type="string" indexed="true" stored="true"/>
    <field name="url" type="string" indexed="true" stored="true"/>

2) Create an XML test file containing the add request

touch /root/apache-solr-1.2.0/example/exampledocs/liangchuan.xml, with the following content:

    <add>
      <doc>
        <field name="id">liangchuan000</field>
        <field name="name">SOLR, the Enterprise Search server</field>
        <field name="manu">Apache Software Foundation</field>
        <field name="liangchuan">liangchuan's SOLR "Hello, world" test</field>
        <field name="url">http://www.google.com</field>
      </doc>
    </add>

3) Submit the index request

    cd apache-solr-1.2.0/example/exampledocs
    ./post.sh liangchuan.xml
    

4) Query

Query through the SOLR administration interface at http://localhost:8080/solr/admin

Or test with curl:

       export URL="http://localhost:8080/solr/select/"
       curl "$URL?indent=on&q=liangchuan&fl=*,score"
 
5. SOLR query parameters
 
Each parameter is listed with its description and an example:

  • q: the query used for searching in SOLR. Sort information can be included by appending a semicolon followed by the name of an indexed, untokenized field. The default sort is score desc, that is, descending order of score.
    Example: q=myfield:Java AND otherfield:developerworks; date asc
    This query searches the two specified fields and sorts the results by a date field.

  • start: the initial offset into the result set, used for paging through results. The default value is 0.
    Example: start=15
    Returns results starting from the 15th result.

  • rows: the maximum number of documents to return. The default value is 10.
    Example: rows=25

  • fq: an optional filter query. The results of the main query are restricted to documents that also match the filter query. SOLR caches filter queries, which makes them very useful for speeding up complex queries.
    Example: any valid query that can be passed in the q parameter, excluding sort information.

  • hl: when hl=true, matching snippets are highlighted in the query response. The default value is false. See the highlighting parameters section of the SOLR wiki for more options.
    Example: hl=true

  • fl: a comma-separated list of the fields to return in the document results. The default value is "*", meaning all fields. "score" indicates that the score should also be returned.
    Example: fl=*,score
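
As a small worked example of combining these parameters, the Python sketch below assembles a select URL and converts a page number into a start offset. The helper names are hypothetical, and the base URL is the example address used in this article.

```python
from urllib.parse import urlencode


def page_to_start(page, rows):
    """Convert a 1-based page number into the start offset."""
    return (page - 1) * rows


def build_select_url(base_url, q, start=0, rows=10, fq=None, hl=False, fl="*"):
    """Assemble a select URL from the parameters described above."""
    params = [("q", q), ("start", start), ("rows", rows), ("fl", fl)]
    if fq:
        params.append(("fq", fq))
    if hl:
        params.append(("hl", "true"))
    # urlencode percent-encodes reserved characters such as "*" and ",".
    return base_url + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8080/solr/select/",
    "liangchuan",
    start=page_to_start(2, 25),
    rows=25,
    fl="*,score",
)
print(url)
```

Page 2 with 25 rows per page gives start=25, so the request skips the first 25 results and returns up to the next 25.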

For more information about SOLR query parameters, see:
   http://wiki.apache.org/solr/CommonQueryParameters

The syntax of the q parameter is the same as Lucene's query syntax. For details, see:

http://lucene.apache.org/java/docs/queryparsersyntax.html

6. SOLR usage patterns in the portal community

The portal community uses SOLR in the following ways:

  • When the original system already holds existing data, or the volume of data to index is large

Submitting this much data through SOLR's regular HTTP update interface is inefficient. Instead, use SOLR's built-in CSV support (http://wiki.apache.org/solr/UpdateCSV): export the data to CSV format, then call the SOLR CSV interface at http://localhost:8080/solr/update/csv

  • When new data is added to the system

First assemble the data to be indexed into XML format, then use HttpClient to submit it to SOLR's HTTP interface, for example http://localhost:8080/solr/update

You can also refer to the implementation of SimplePostTool in post.jar:
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/util/SimplePostTool.java?view=co
  • Chinese word segmentation

Use Paoding (Paoding Jieniu) as the default Chinese word segmentation solution for SOLR (Lucene).

Project home: http://code.google.com/p/paoding/

Google Groups: http://groups.google.com/group/paoding

JavaEye group: http://analysis.group.javaeye.com/

  • Integration with Nutch

http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html

  • Embedded SOLR

http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

  • Distributed indexing

http://wiki.apache.org/solr/CollectionDistribution
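
For the bulk-loading case in the first item of this list, the sketch below shows one way to export records to CSV text and prepare a request for the CSV interface. It makes several assumptions that the text does not confirm: the first CSV line names the fields, the body is sent as text/plain, and a separate commit follows. The helper names and the sample record are illustrative only, and nothing is sent when the block runs.

```python
import csv
import io
import urllib.request


def rows_to_csv(fieldnames, rows):
    """Serialize dict rows to CSV text with a header line naming the fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_csv_request(url, csv_text):
    """Create (but do not send) a POST request for the CSV update interface."""
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        # Assumed content type for CSV uploads.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


csv_text = rows_to_csv(
    ["id", "name", "url"],
    [{
        "id": "liangchuan000",
        "name": "SOLR, the Enterprise Search server",
        "url": "http://www.google.com",
    }],
)
csv_request = build_csv_request("http://localhost:8080/solr/update/csv", csv_text)
print(csv_text)
```

The csv module quotes the name value because it contains a comma, so the field boundaries stay unambiguous.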
