Getting Started with Solr

Last Update: 2018-12-03 Source: Internet

Author: User



Portal community sites have many features that depend on search, and a good search engine noticeably improves the user experience. There are currently several common ways to implement site search:

Implement site search by wrapping Lucene directly.

This involves a large workload and scales poorly, so it was not used.

Call the Google or Baidu APIs to implement site search.

Binding the site to a third-party search engine is too rigid to meet future business expansion needs.

Implement site search with Compass + Lucene.

This approach is well suited to indexing data in database-driven applications, in particular for replacing the traditional LIKE '%expression%' queries on fields such as VARCHAR and CLOB, and it is a worthwhile way to implement site search. However, you still have to build distributed processing and interface wrappers yourself to some extent.

Implement site search with Solr.

This approach provides a complete solution with better encapsulation and scalability, so it was adopted for the portal community, with the Compass approach to be added later.
1. Introduction to Solr
Solr is a Lucene-based search server written in Java. It provides faceted search, hit highlighting, and multiple output formats (including XML/XSLT and JSON). It is easy to install and configure, and it comes with an HTTP-based administration interface. Solr is used by many large websites and is relatively mature and stable. Because Solr wraps and extends Lucene, it largely uses Lucene's terminology. More importantly, indexes created by Solr are fully compatible with the Lucene search library. With appropriate configuration (and, in some cases, some coding), Solr can read and use indexes built by other Lucene applications. In addition, many Lucene tools (such as Nutch and Luke) can use indexes created by Solr.
2. Installing and Configuring Solr on Tomcat
Solr is written in Java, so it can be deployed on both Windows and Linux. However, Solr ships shell scripts for testing, management, and maintenance, so we recommend Linux for production deployment and Windows for testing.
The following describes how to install and configure Solr on Linux. The steps on Windows are similar.
    wget http://apache.mirror.phpchina.com/tomcat/tomcat-6/v6.0.16/bin/apache-tomcat-6.0.16.zip
    unzip apache-tomcat-6.0.16.zip
    mv apache-tomcat-6.0.16 /opt/tomcat
    chmod 755 /opt/tomcat/bin/*
    wget http://apache.mirror.phpchina.com/lucene/solr/1.2/apache-solr-1.2.0.tgz
    tar zxvf apache-solr-1.2.0.tgz
The trickiest part of installing and configuring Solr is understanding and setting solr.solr.home, the Solr home directory. You can set it in any of the following ways.

Based on the current working directory

    cp apache-solr-1.2.0/dist/apache-solr-1.2.0.war /opt/tomcat/webapps/solr.war
    mkdir /opt/solr-tomcat
    cp -r apache-solr-1.2.0/example/solr /opt/solr-tomcat/
    cd /opt/solr-tomcat
    /opt/tomcat/bin/startup.sh

In this case, neither the solr.solr.home property nor JNDI is set, so Solr looks for ./solr under the current working directory. You must therefore switch to /opt/solr-tomcat before starting Tomcat.

Setting solr.solr.home through an environment variable

Add the following to the current user's environment (.bash_profile) or to /opt/tomcat/bin/catalina.sh:

    export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr-tomcat/solr"

Configuration based on JNDI

    mkdir -p /opt/tomcat/conf/Catalina/localhost
    touch /opt/tomcat/conf/Catalina/localhost/solr.xml

Give solr.xml the following content:
      <Context docBase="/opt/tomcat/webapps/solr.war" debug="0" crossContext="true" >
                 <Environment name="solr/home" type="java.lang.String" value="/opt/solr-tomcat/solr" override="true" />
    </Context>
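At startup, Solr checks the options above in a fixed order. The sketch below mirrors that lookup order as we understand it from the Solr 1.2 source. It tries the JNDI name `java:comp/env/solr/home` first (the `solr/home` entry above), then the `solr.solr.home` system property, and finally falls back to `solr/` relative to the working directory. The class `SolrHomeResolver` is hypothetical and is not part of Solr's API.

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical helper (not a Solr class) that mirrors the order in which
// Solr 1.x looks for its home directory.
final class SolrHomeResolver {
    static final String JNDI_NAME = "java:comp/env/solr/home";
    static final String SYSTEM_PROPERTY = "solr.solr.home";
    static final String DEFAULT_HOME = "solr/";

    static String resolveSolrHome() {
        // 1. JNDI, e.g. the <Environment name="solr/home"> entry in Tomcat's solr.xml
        String home = lookupJndi();
        // 2. The -Dsolr.solr.home system property, e.g. set through JAVA_OPTS
        if (home == null) {
            home = System.getProperty(SYSTEM_PROPERTY);
        }
        // 3. Fall back to ./solr relative to the current working directory
        if (home == null) {
            home = DEFAULT_HOME;
        }
        return home;
    }

    private static String lookupJndi() {
        try {
            Object value = new InitialContext().lookup(JNDI_NAME);
            return (value instanceof String) ? (String) value : null;
        } catch (NamingException e) {
            // Outside a servlet container there is usually no JNDI context
            return null;
        }
    }
}
```

A plain JVM normally has no JNDI context. In that case, the lookup falls through to the system property and then to the default. This is why the first option depends on the directory from which Tomcat is started.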
Then open the Solr admin interface at http://localhost:8080/solr/admin

3. How Solr Works

Solr provides standard HTTP interfaces for adding, deleting, updating, and querying indexed data. To index or search, you send an HTTP request to the Solr web application deployed in a servlet container. Solr receives the request, selects the appropriate SolrRequestHandler, processes the request, and returns the response over HTTP. By default, the response uses Solr's standard XML format, but you can also configure alternative response formats.

You can send four kinds of index requests to Solr's update servlet:

Add/update: adds documents to Solr or updates existing ones. Additions and updates are not visible to searches until they are committed.
Commit: tells Solr to make all changes made since the last commit searchable.
Optimize: restructures the Lucene index files to improve search performance. It is usually best to optimize once indexing is complete. If updates are frequent, optimize during periods of low usage. An index works correctly without optimization, and optimizing is time-consuming.
Delete: can be specified by ID or by query. Delete by ID removes the document with the specified ID. Delete by query removes all documents that match the query.
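The commit, optimize, and delete requests are also small XML messages posted to the update handler. The following Java sketch builds them as strings. The class and method names are hypothetical. The element names (`<commit/>`, `<optimize/>`, `<delete><id>`, `<delete><query>`) follow the Solr XML update format described on the Solr wiki.

```java
// Hypothetical helper that builds Solr XML update messages as strings.
final class SolrUpdateMessages {
    // Makes all changes since the last commit visible to searches.
    static String commit() {
        return "<commit/>";
    }

    // Restructures the index; time-consuming, so run it during low usage.
    static String optimize() {
        return "<optimize/>";
    }

    // Deletes the document whose unique key equals id.
    static String deleteById(String id) {
        return "<delete><id>" + escapeXml(id) + "</id></delete>";
    }

    // Deletes every document matched by a Lucene-syntax query.
    static String deleteByQuery(String query) {
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }

    // Escapes characters that must not appear literally in XML text.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, `SolrUpdateMessages.deleteByQuery("cat:memory")` yields `<delete><query>cat:memory</query></delete>`. You would post that message to the update URL and follow it with a commit.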

A typical add request message:

    <add>
      <doc>
        <field name="id">TWINX2048-3200PRO</field>
        <field name="name">Corsair XMS 2GB (2x1GB) 184-pin DDR SDRAM unbuffered DDR 400 (PC 3200) dual channel kit system memory - Retail</field>
        <field name="manu">Corsair Microsystems Inc.</field>
        <field name="cat">electronics</field>
        <field name="cat">memory</field>
        <field name="features">CAS latency 2, 2-3-3-6 timing, 2.75V, unbuffered, heat-spreader</field>
      </doc>
      <doc>
        <field name="name">Corsair ValueSelect 1GB 184-pin DDR SDRAM unbuffered DDR 400 (PC 3200) system memory - Retail</field>
        <field name="manu">Corsair Microsystems Inc.</field>
        <field name="cat">electronics</field>
        <field name="cat">memory</field>
      </doc>
    </add>
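An add message like the one above can also be generated from application data. This Java sketch (the class `SolrAddMessage` is hypothetical) writes each document as a `<doc>` of `<field>` elements. For multi-valued fields such as `cat`, it repeats the field name once per value. It escapes XML special characters, but it does not check the fields against `schema.xml`.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper that renders documents in Solr's XML <add> format.
final class SolrAddMessage {
    // Each document maps a field name to one or more values; the map's
    // iteration order decides the order of the <field> elements.
    static String build(List<Map<String, List<String>>> docs) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, List<String>> doc : docs) {
            sb.append("<doc>");
            for (Map.Entry<String, List<String>> field : doc.entrySet()) {
                for (String value : field.getValue()) {
                    // Multi-valued fields repeat the same field name
                    sb.append("<field name=\"").append(escape(field.getKey())).append("\">")
                      .append(escape(value)).append("</field>");
                }
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    // Replace & first so that the other entities are not double-escaped.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Use a `LinkedHashMap` for each document if the field order in the output matters to you. Solr itself does not depend on that order.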

A typical search response message (abbreviated; ... marks omitted parts):

    <response>
      <lst name="responseHeader">
        ...
        <lst name="params">
          ...
          <str name="fl">*,score</str>
          ...
          <str name="q">content:"faceted browsing"</str>
        </lst>
      </lst>
      <result name="response" ...>
        <doc>
          ...
          <arr name="...">
            <str>http://localhost/myblog/solr-rocks-again.html</str>
            <str>Solr is great</str>
            <str>Solr, Lucene, enterprise, search, greatness</str>
            <str>Solr has some really great features, like faceted browsing and replication</str>
          </arr>
          <arr name="...">
            <str>Solr has some really great features, like faceted browsing and replication</str>
          </arr>
          ...
          <arr name="...">
            <str>Solr, Lucene, enterprise, search, greatness</str>
          </arr>
          ...
          <str name="title">Solr is great</str>
          <str name="url">http://localhost/myblog/solr-rocks-again.html</str>
        </doc>
      </result>
      <lst name="highlighting">
        <lst name="...">
          <arr name="...">
            <str>Solr has some really great features, like <em>faceted</em> <em>browsing</em> and replication</str>
          </arr>
        </lst>
      </lst>
    </response>
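A client can read a response like this with any XML parser. The sketch below uses the JDK's DOM parser to extract one named single-valued `<str>` field (such as `title` or `url`) from each `<doc>` in the result. The class name is hypothetical. A real client would also need to handle the other typed elements (`<arr>`, `<int>`, `<float>`, and so on).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts a single-valued <str name="..."> field
// from every <doc> in a Solr XML response.
final class SolrResponseReader {
    static List<String> strField(String responseXml, String fieldName) {
        Document dom;
        try {
            dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(responseXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML response", e);
        }
        List<String> values = new ArrayList<>();
        NodeList docs = dom.getElementsByTagName("doc");
        for (int i = 0; i < docs.getLength(); i++) {
            // Only direct children of <doc>, so <str> elements nested in <arr> are skipped
            for (Node n = docs.item(i).getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element
                        && ((Element) n).getTagName().equals("str")
                        && fieldName.equals(((Element) n).getAttribute("name"))) {
                    values.add(n.getTextContent());
                }
            }
        }
        return values;
    }
}
```

For the response above, `strField(xml, "title")` would return a list containing `Solr is great`.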

For more information about Solr, see:

http://wiki.apache.org/solr/FrontPage

4. Testing Solr

The Solr distribution includes sample test documents in apache-solr-1.2.0/example/exampledocs.

Testing Solr with the shell script (curl):

    cd apache-solr-1.2.0/example/exampledocs

Edit post.sh (for example with vi) and set the URL variable to match Tomcat's IP address and port, for example URL=http://localhost:8080/solr/update. Then post the sample files:

    ./post.sh *.xml

Testing Solr with Solr's Java tool, post.jar:

View the help:

    java -jar post.jar -help

Submit the test data:

    java -Durl=http://localhost:8080/solr/update -Ddata=files -jar post.jar *.xml

The following example uses a custom liangchuan field and a url field to show how to use the indexing commands in Solr.

1) Modify the Solr schema to describe the index fields:

Open /opt/solr-tomcat/solr/conf/schema.xml in an editor such as vi and add definitions for the liangchuan and url fields inside the <fields> element.

2) Create an XML test file containing the add request:

    touch /root/apache-solr-1.2.0/example/exampledocs/liangchuan.xml

The file's content is as follows:

    <add>
      <doc>
        <field name="id">liangchuan000</field>
        <field name="name">Solr, the Enterprise Search Server</field>
        <field name="manu">Apache Software Foundation</field>
        <field name="liangchuan">liangchuan's Solr "Hello, World" test</field>
        <field name="url">http://www.google.com</field>
      </doc>
    </add>

3) Submit the index request:

    cd apache-solr-1.2.0/example/exampledocs
    ./post.sh liangchuan.xml

4) Query

Query through the Solr admin interface at http://localhost:8080/solr/admin

or test with curl:

       export URL="http://localhost:8080/solr/select/"

       curl "$URL?indent=on&q=liangchuan&fl=*,score"
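You can issue the same query from Java. This sketch (the class `SolrSelectUrl` is hypothetical) builds the select URL used in the curl example. It URL-encodes each parameter value so that characters such as `,`, `:`, `"` and spaces are transmitted safely. It assumes the Tomcat address `http://localhost:8080/solr/select/` used throughout this article and requires Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds a Solr select URL from query parameters.
final class SolrSelectUrl {
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(encode(p.getKey()) + "=" + encode(p.getValue()));
        }
        return baseUrl + "?" + query;
    }

    // Form encoding: spaces become "+", and "," becomes "%2C"
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

With the parameters indent=on, q=liangchuan and fl=*,score (inserted in that order into a `LinkedHashMap`), this produces `http://localhost:8080/solr/select/?indent=on&q=liangchuan&fl=*%2Cscore`, which the servlet container decodes back to the same values.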

5. Solr Query Parameters

Parameter	Description	Example
q	The query to search for. You can include sort information by appending a semicolon, followed by the name of an indexed, non-tokenized field and a sort direction. The default sort is score desc, i.e. by score in descending order.	q=myfield:Java AND otherfield:developerWorks; date asc searches the two specified fields and sorts the results by a date field.
start	The zero-based offset into the result set at which to start returning results; used for paging. The default is 0.	start=15 skips the first 15 results, so the response begins with the 16th result.
rows	The maximum number of documents to return. The default is 10.	rows=25
fq	An optional filter query. Results are restricted to documents that also match the filter query. Solr caches filter queries, which makes them very useful for speeding up complex queries.	Any valid query that can be passed in the q parameter, excluding sort information.
hl	When hl=true, highlighted snippets are included in the query response. The default is false. See the section on highlighting parameters in the Solr wiki for more options.	hl=true
fl	A comma-separated list of the fields to return for each document in the results. The default is *, meaning all fields. Include score to also return each document's score.	fl=*,score
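Together, the `start` and `rows` parameters implement paging. Because `start` is a zero-based offset, page n (counting from 1) with a page size of `rows` begins at `start = (n - 1) * rows`. The number of pages follows from the total hit count (the `numFound` attribute of the `<result>` element). Here is a small Java sketch of that arithmetic (the class name is hypothetical):

```java
// Hypothetical helper that turns page numbers into Solr start/rows values.
final class SolrPaging {
    // Zero-based offset of the first result on a 1-based page.
    static int start(int page, int rows) {
        if (page < 1 || rows < 1) {
            throw new IllegalArgumentException("page and rows must be >= 1");
        }
        return (page - 1) * rows;
    }

    // Number of pages needed to show numFound results, rows per page.
    static long pageCount(long numFound, int rows) {
        if (rows < 1) {
            throw new IllegalArgumentException("rows must be >= 1");
        }
        return (numFound + rows - 1) / rows; // ceiling division
    }
}
```

For example, with rows=25 the third page is requested with start=50&rows=25, and 101 hits fill 5 pages.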

For more information about Solr query parameters, see:

   http://wiki.apache.org/solr/CommonQueryParameters

The syntax of Solr's q parameter is the same as Lucene's query syntax. For details, see:

http://lucene.apache.org/java/docs/queryparsersyntax.html
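When q is built from user input, characters that have a special meaning in Lucene query syntax must be escaped with a backslash. The following Java sketch escapes the characters listed on the query parser syntax page (`+ - && || ! ( ) { } [ ] ^ " ~ * ? : \`). It escapes `&` and `|` individually, which the query parser treats as ordinary escaped characters. This is a hand-written illustration, not Lucene's own escaping utility.

```java
// Hand-written sketch: backslash-escapes Lucene query syntax characters
// so that user input is searched literally.
final class LuceneEscaper {
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() * 2);
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix special characters with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Spaces are not escaped, so a multi-word input such as `myfield:` + `LuceneEscaper.escape(userInput)` still produces several terms unless you wrap the escaped value in double quotes.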

6. How Solr Is Used in the Portal Community

The portal community uses Solr as follows:

If the original system already holds data, or the volume of data to index is large

Calling Solr's HTTP interface document by document is inefficient. Instead, use Solr's built-in CSV support (http://wiki.apache.org/solr/UpdateCSV): export the data to CSV format, then submit it to Solr's CSV interface at http://localhost:8080/solr/update/csv.
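Exporting data for the CSV handler mainly involves quoting values correctly. The Java sketch below (the class `SolrCsvWriter` is hypothetical) writes a header line of field names followed by one line per record. It uses a comma separator and double-quote encapsulation, which we understand to be the UpdateCSV defaults. Check the UpdateCSV wiki page for the parameters your Solr version supports, for example for splitting multi-valued fields.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that renders records as CSV for Solr's /update/csv handler.
final class SolrCsvWriter {
    // First line: field names; each following line: one record.
    static String write(List<String> fieldNames, List<List<String>> records) {
        StringBuilder sb = new StringBuilder(line(fieldNames));
        for (List<String> record : records) {
            if (record.size() != fieldNames.size()) {
                throw new IllegalArgumentException("wrong number of values: " + record);
            }
            sb.append(line(record));
        }
        return sb.toString();
    }

    private static String line(List<String> values) {
        return values.stream().map(SolrCsvWriter::quote).collect(Collectors.joining(",")) + "\n";
    }

    // Quotes a value only when needed and doubles any embedded quotes.
    private static String quote(String v) {
        if (v.contains(",") || v.contains("\"") || v.contains("\n") || v.contains("\r")) {
            return "\"" + v.replace("\"", "\"\"") + "\"";
        }
        return v;
    }
}
```

The resulting file is then posted to the CSV interface mentioned above, as described on the UpdateCSV wiki page.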

Adding new data to the system

First, assemble the data to be indexed into Solr's XML format. Then use HttpClient to submit it to Solr's HTTP update interface, for example:

http://localhost:8080/solr/update

You can also refer to the implementation of SimplePostTool in post.jar:

http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/util/SimplePostTool.java?view=co
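The article recommends Commons HttpClient. The sketch below uses the JDK's built-in `java.net.http.HttpClient` (Java 11 or later) instead, so it needs no extra libraries. Like post.sh, it posts an XML update message to the update URL with a `text/xml` content type. The class and method names are hypothetical, and the commit is sent as a separate request.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client that posts XML update messages to Solr over HTTP.
final class SolrXmlPoster {
    private final HttpClient client = HttpClient.newHttpClient();
    private final URI updateUri;

    SolrXmlPoster(String updateUrl) {
        this.updateUri = URI.create(updateUrl);
    }

    // Builds the POST request without sending it.
    HttpRequest buildRequest(String xml) {
        return HttpRequest.newBuilder(updateUri)
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(xml)) // UTF-8 body
                .build();
    }

    // Sends one message and returns Solr's XML response body.
    String post(String xml) throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(xml), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Solr returned HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }

    // Adds documents, then commits so that they become searchable.
    String addAndCommit(String addXml) throws IOException, InterruptedException {
        post(addXml);
        return post("<commit/>");
    }
}
```

With Solr running, `new SolrXmlPoster("http://localhost:8080/solr/update").addAndCommit(xml)` indexes the documents in `xml` and makes them visible to searches. For large batches, consider committing once at the end rather than after every message.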

Chinese word segmentation

Use Paoding as the default Chinese word segmentation solution for Solr (Lucene).

Project home: http://code.google.com/p/paoding/

Google Group: http://groups.google.com/group/paoding

JavaEye group: http://analysis.group.javaeye.com/

Integration with Nutch

http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html

Embedded Solr

http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

Distributed indexing

http://wiki.apache.org/solr/CollectionDistribution

7. References

http://wiki.apache.org/solr/

http://www.ibm.com/developerworks/cn/java/j-solr1/

http://www.ibm.com/developerworks/cn/java/j-solr2/

http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html?page=1

http://lucene.apache.org/java/docs/queryparsersyntax.html

http://www.blogjava.net/RongHao/archive/2007/11/06/158621.html

This article is an English version of an article originally written in Chinese on aliyun.com.
