Search is a requirement of many features in the portal community, and a good search engine improves the user experience. Several solutions for implementing search are currently available:
- Implement intra-site search with our own wrapper around Lucene.
The workload is large and scalability is hard to achieve, so this option was not adopted.
- Call the Google or Baidu APIs to implement intra-site search.
Binding to a third-party search engine is too rigid to meet future business expansion needs.
- Implement intra-site search based on Compass + Lucene.
This is suitable for indexing the data of database-driven applications, especially for replacing traditional like '%expression%' matching on fields such as varchar and clob, and it is a worthwhile solution for intra-site search. However, distributed processing and the interface layer still need to be wrapped to some extent.
- Implement intra-site search based on Solr.
This option offers a fairly complete solution with better encapsulation and scalability. It is therefore the one used in the portal community, with the Compass solution to be added at a later stage.
1. Solr introduction
Solr is a Lucene-based Java search engine server. It provides faceted search, hit highlighting, and multiple output formats (including XML/XSLT and JSON). It is easy to install and configure, and comes with an HTTP-based administration interface. Solr is already used by many large websites and is relatively mature and stable. Solr wraps and extends Lucene, so it largely follows Lucene's terminology. More importantly, indexes created by Solr are fully compatible with the Lucene search engine library. With appropriate configuration (and, in some cases, some coding), Solr can read and use indexes built by other Lucene applications. In addition, many Lucene tools (such as Nutch and Luke) can use indexes created by Solr.
2. Installing and configuring Solr on Tomcat
Solr is developed in Java, so it can be deployed and used on both Windows and Linux. However, Solr ships shell scripts for testing, management, and maintenance, so we recommend installing it on Linux for production deployment; Windows can be used for testing.
The following describes how to install and configure Solr on Linux. The procedure on Windows is similar.
wget http://apache.mirror.phpchina.com/tomcat/tomcat-6/v6.0.16/bin/apache-tomcat-6.0.16.zip
unzip apache-tomcat-6.0.16.zip
mv apache-tomcat-6.0.16 /opt/tomcat
chmod 755 /opt/tomcat/bin/*
wget http://apache.mirror.phpchina.com/lucene/solr/1.2/apache-solr-1.2.0.tgz
tar zxvf apache-solr-1.2.0.tgz
The trickiest part of installing and configuring Solr is understanding and configuring solr.solr.home, which can be done in three ways:
- Based on the current path
cp apache-solr-1.2.0/dist/apache-solr-1.2.0.war /opt/tomcat/webapps/solr.war
mkdir /opt/solr-tomcat
cp -r apache-solr-1.2.0/example/solr/ /opt/solr-tomcat/
cd /opt/solr-tomcat
/opt/tomcat/bin/startup.sh
In this case (neither the solr.solr.home environment variable nor JNDI is set), Solr looks for ./solr by default, so you must switch to /opt/solr-tomcat before starting Tomcat.
- Based on the solr.solr.home environment variable
Add the following to the current user's environment (.bash_profile) or to /opt/tomcat/bin/catalina.sh:
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr-tomcat/solr"
- Based on JNDI configuration
mkdir -p /opt/tomcat/conf/Catalina/localhost
touch /opt/tomcat/conf/Catalina/localhost/solr.xml, with the following content:
<Context docBase="/opt/tomcat/webapps/solr.war" debug="0" crossContext="true" >
<Environment name="solr/home" type="java.lang.String" value="/opt/solr-tomcat/solr" override="true" />
</Context>
Then access the Solr administration interface at http://localhost:8080/solr/admin
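The three configuration methods above amount to a lookup order for the Solr home directory. The Python sketch below only models that order for illustration; it is not Solr's code. The function name resolve_solr_home and its parameters are hypothetical, and the precedence of the JNDI entry over the system property is an assumption rather than something stated in this article.

```python
import posixpath


def resolve_solr_home(jndi_value=None, system_props=None, cwd="/"):
    """Return (path, source) for the Solr home directory.

    Hypothetical helper: it models the lookup order only and does not
    talk to Tomcat or Solr.
    """
    system_props = system_props or {}
    # 1. JNDI entry solr/home from the Tomcat context file (assumed to win).
    if jndi_value:
        return jndi_value, "jndi"
    # 2. JVM system property passed through JAVA_OPTS (-Dsolr.solr.home=...).
    prop = system_props.get("solr.solr.home")
    if prop:
        return prop, "system property"
    # 3. Fallback: ./solr relative to the directory Tomcat was started from.
    return posixpath.join(cwd, "solr"), "current directory"


print(resolve_solr_home(cwd="/opt/solr-tomcat"))
```

The fallback branch is why the first method requires changing to /opt/solr-tomcat before running startup.sh.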
3. How Solr works
Solr exposes standard HTTP interfaces for adding, deleting, updating, and querying indexed data. You index and search by sending HTTP requests to the Solr web application deployed in a servlet container. Solr accepts the request, determines the appropriate SolrRequestHandler, and then processes the request. The response is returned over HTTP in the same way. By default Solr returns its standard XML response; alternative response formats can also be configured.
You can send four different index requests to the Solr index servlet:
- add/update adds documents to Solr or updates existing ones. Added and updated documents cannot be found by searches until a commit is performed.
- commit tells Solr to make all changes since the last commit searchable.
- optimize restructures the Lucene files to improve search performance. It is usually best to optimize once indexing is complete. If updates are frequent, schedule optimization for periods of low usage. An index works normally without being optimized, and optimization is a time-consuming process.
- delete can be specified by ID or by query. Deleting by ID removes the document with the specified ID; deleting by query removes all documents matched by the query.
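Apart from add, the requests listed above are very small XML messages. As a rough illustration, the Python sketch below builds the commit, optimize, and delete messages with the standard library. The helper names are hypothetical and the element names follow Solr's XML update format as I understand it; treat this as a sketch, not as Solr's own client code.

```python
import xml.etree.ElementTree as ET


def commit_message():
    """Message that makes pending changes searchable."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


def optimize_message():
    """Message that asks Solr to restructure the index files."""
    return ET.tostring(ET.Element("optimize"), encoding="unicode")


def delete_by_id(doc_id):
    """Message that deletes the document with the given ID."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Message that deletes every document matched by the query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_id("TWINX2048-3200PRO"))
print(delete_by_query("cat:memory"))
```

Building the messages with ElementTree rather than string concatenation means special characters in IDs or queries are escaped automatically.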
A typical add request message:
<add>
<doc>
<field name="id">TWINX2048-3200PRO</field>
<field name="name">CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail</field>
<field name="manu">Corsair Microsystems Inc.</field>
<field name="cat">electronics</field>
<field name="cat">memory</field>
<field name="features">CAS latency 2, 2-3-3-6 timing, 2.75v, unbuffered, heat-spreader</field>
<field name="price">185</field>
<field name="popularity">5</field>
<field name="inStock">true</field>
</doc>
<doc>
<field name="id">VS1GB400C3</field>
<field name="name">CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail</field>
<field name="manu">Corsair Microsystems Inc.</field>
<field name="cat">electronics</field>
<field name="cat">memory</field>
<field name="price">74.99</field>
<field name="popularity">7</field>
<field name="inStock">true</field>
</doc>
</add>
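An add message like the one above can be generated from application data instead of being written by hand. The following Python sketch is one possible way to do it; build_add_message is a hypothetical helper, and the convention that a list value becomes a repeated field (as with cat above) is my own choice for the example.

```python
import xml.etree.ElementTree as ET


def _to_text(value):
    # Solr expects lowercase true/false for boolean fields.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_add_message(docs):
    """Build an <add> message from a list of dicts (field name -> value).

    A list or tuple value is written as one <field> element per item,
    which is how multi-valued fields appear in the message.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", {"name": name})
                field.text = _to_text(item)
    return ET.tostring(add, encoding="unicode")


print(build_add_message([
    {"id": "VS1GB400C3", "cat": ["electronics", "memory"], "inStock": True},
]))
```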
A typical search result message:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">6</int>
<lst name="params">
<str name="rows">10</str>
<str name="start">0</str>
<str name="fl">*,score</str>
<str name="hl">true</str>
<str name="q">content:"faceted browsing"</str>
</lst>
</lst>
<result name="response" numFound="1" start="0" maxScore="1.058217">
<doc>
<float name="score">1.058217</float>
<arr name="all">
<str>http://localhost/myblog/solr-rocks-again.html</str>
<str>Solr is great</str>
<str>solr,lucene,enterprise,search,greatness</str>
<str>Solr has some really great features, like faceted browsing
and replication</str>
</arr>
<arr name="content">
<str>Solr has some really great features, like faceted browsing
and replication</str>
</arr>
<date name="creationDate">2007-01-07T05:04:00.000Z</date>
<arr name="keywords">
<str>solr,lucene,enterprise,search,greatness</str>
</arr>
<int name="rating">8</int>
<str name="title">Solr is great</str>
<str name="url">http://localhost/myblog/solr-rocks-again.html</str>
</doc>
</result>
<lst name="highlighting">
<lst name="http://localhost/myblog/solr-rocks-again.html">
<arr name="content">
<str>Solr has some really great features, like <em>faceted</em>
<em>browsing</em> and replication</str>
</arr>
</lst>
</lst>
</response>
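A client usually needs only the hit count and the document fields from such a response. The Python sketch below shows one way to extract them with the standard library. The parse_response helper and the shortened SAMPLE document are made up for this example, and all values are kept as strings rather than converted by type.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical response in the same shape as the one above.
SAMPLE = """<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">6</int></lst>
<result name="response" numFound="1" start="0" maxScore="1.058217">
<doc>
<float name="score">1.058217</float>
<arr name="keywords"><str>solr,lucene</str></arr>
<int name="rating">8</int>
<str name="title">Solr is great</str>
</doc>
</result>
</response>"""


def parse_response(xml_text):
    """Return (numFound, docs), where docs is a list of dicts."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    docs = []
    for doc_el in result.findall("doc"):
        doc = {}
        for child in doc_el:
            if child.tag == "arr":
                # Multi-valued field: collect the text of every item.
                doc[child.get("name")] = [item.text for item in child]
            else:
                doc[child.get("name")] = child.text
        docs.append(doc)
    return int(result.get("numFound")), docs


found, docs = parse_response(SAMPLE)
print(found, docs[0]["title"])
```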
For more information about Solr, see
http://wiki.apache.org/solr/FrontPage
4. Testing Solr
The Solr distribution includes test samples under apache-solr-1.2.0/example/exampledocs.
- Test Solr using the shell script (curl):
cd apache-solr-1.2.0/example/exampledocs
vi post.sh: set the URL variable according to Tomcat's IP address and port, for example URL=http://localhost:8080/solr/update
./post.sh *.xml
- Test Solr using Solr's Java tool:
View help: java -jar post.jar -help
Submit test data:
java -Durl=http://localhost:8080/solr/update -Ddata=files -jar post.jar *.xml
The following uses the liangchuan and url fields as an example to show how to use the index commands in Solr.
1) Modify the Solr schema to describe the index fields:
vi /opt/solr-tomcat/solr/conf/schema.xml and add the following inside <fields>:
<field name="liangchuan" type="string" indexed="true" stored="true"/>
<field name="url" type="string" indexed="true" stored="true"/>
2) Create an XML test file for the add request:
touch /root/apache-solr-1.2.0/example/exampledocs/liangchuan.xml, with the following content:
<add>
<doc>
<field name="id">liangchuan000</field>
<field name="name">Solr, the Enterprise Search Server</field>
<field name="manu">Apache Software Foundation</field>
<field name="liangchuan">liangchuan's Solr "Hello, world" test</field>
<field name="url">http://www.google.com</field>
</doc>
</add>
3) Submit the index request:
cd apache-solr-1.2.0/example/exampledocs
./post.sh liangchuan.xml
4) Query
Query through the Solr administration interface at http://localhost:8080/solr/admin
Or test with curl:
export URL="http://localhost:8080/solr/select/"
curl "$URL?indent=on&q=liangchuan&fl=*,score"
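The same query URL can be assembled in a program rather than typed by hand. The Python sketch below is a minimal illustration; build_select_url is a hypothetical helper, and the default base URL is simply the one used in the curl test above.

```python
from urllib.parse import urlencode


def build_select_url(q, base="http://localhost:8080/solr/select/", **params):
    """Build a Solr select URL for query q plus optional extra parameters."""
    query = {"indent": "on", "q": q}
    query.update(params)
    # Keep '*' and ',' readable so that fl=*,score looks like the curl example.
    return base + "?" + urlencode(query, safe="*,")


print(build_select_url("liangchuan", fl="*,score"))
```

Other characters, such as spaces or colons in a field query, are percent-encoded by urlencode, which the hand-written curl line does not do.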
5. Solr query parameter description
Parameter | Description | Example |
q | The query used for searching in Solr. Sorting information can be included by appending a semicolon followed by the name of an indexed, untokenized field. The default sort is score desc, that is, by score in descending order. | q=myField:Java AND otherField:developerWorks; date asc This query searches the two specified fields and sorts the results by a date field. |
start | The initial offset into the result set, used for paging through results. The default value is 0. | start=15 returns results starting at offset 15 (zero-based). |
rows | The maximum number of documents returned. The default value is 10. | rows=25 |
fq | An optional filter query. The query results are restricted to the results returned by the filter query. Solr caches filter queries, which makes them very useful for speeding up complex queries. | Any valid query that can be passed in the q parameter, excluding sorting information. |
hl | When hl=true, highlighted snippets are included in the query response. The default value is false. See the Solr wiki section on highlighting parameters for more options. | hl=true |
fl | A comma-separated list of the fields to return in document results. The default value is "*", meaning all fields. "score" indicates that the score should also be returned. | *,score |
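Paging with start and rows comes down to a little arithmetic: page n (counting from 1) begins at offset (n - 1) * rows. The Python sketch below combines that with the other parameters in the table. The page_params helper and its defaults are hypothetical and chosen only for this example.

```python
from urllib.parse import urlencode


def page_params(q, page, rows=10, fq=None, fl="*,score", hl=False):
    """Return an encoded query string for the given 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    params = [
        ("q", q),
        ("start", (page - 1) * rows),  # zero-based offset of the first hit
        ("rows", rows),
        ("fl", fl),
    ]
    if fq:
        params.append(("fq", fq))      # optional filter query
    if hl:
        params.append(("hl", "true"))  # request highlighted snippets
    return urlencode(params)


# Third page of 25 results each starts at offset 50.
print(page_params("solr", page=3, rows=25))
```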
For more information about Solr query parameters, see:
http://wiki.apache.org/solr/CommonQueryParameters
The syntax of Solr's q parameter is the same as Lucene's query syntax. For details, see:
http://lucene.apache.org/java/docs/queryparsersyntax.html
6. Solr usage patterns in the portal community
Solr is used in the portal community in the following ways:
- When the original system already holds data, or the volume of data to be indexed is large
Submitting this data through Solr's regular HTTP interface is inefficient. Instead, use Solr's built-in CSV support (http://wiki.apache.org/solr/UpdateCSV): export the data to CSV format, then call the Solr CSV interface http://localhost:8080/solr/update/csv
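As an illustration of the export step, the Python sketch below turns rows of existing data into CSV text with a header line of field names. The rows_to_csv helper and the sample row are hypothetical, and the assumption that the first CSV line names the Solr fields should be checked against the UpdateCSV page linked above.

```python
import csv
import io


def rows_to_csv(fieldnames, rows):
    """Serialize a list of dicts to CSV text, header line first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


text = rows_to_csv(
    ["id", "name"],
    [{"id": "liangchuan000", "name": "Solr, the Enterprise Search Server"}],
)
print(text)
```

Values that contain commas are quoted by the csv module, so they are not split into extra columns.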
In other cases, first assemble the data to be indexed and queried into XML format, then use HttpClient to submit it to Solr's HTTP interface, for example
http://localhost:8080/solr/update
You can also refer to the implementation of SimplePostTool in post.jar:
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/util/SimplePostTool.java?view=co
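The article submits XML with HttpClient from Java. As a language-neutral illustration of the same request, the sketch below uses Python's standard urllib as a stand-in. The function names are hypothetical, the Content-Type value is an assumption about what the update handler expects, and post_update only works when a Solr instance is actually listening at the URL.

```python
import urllib.request

UPDATE_URL = "http://localhost:8080/solr/update"  # URL used in this article


def make_update_request(xml_text, url=UPDATE_URL):
    """Build (but do not send) a POST request carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_update(xml_text, url=UPDATE_URL):
    """Send the message and return Solr's response body as text."""
    with urllib.request.urlopen(make_update_request(xml_text, url)) as resp:
        return resp.read().decode("utf-8")


req = make_update_request("<commit/>")
print(req.get_method(), req.full_url)
```

A typical sequence would post one or more add messages and then a commit message so that the new documents become searchable.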
- Chinese word segmentation
Use Paoding (庖丁解牛) as the default Chinese word segmentation solution for Solr (Lucene).
Project site: http://code.google.com/p/paoding/
Google Groups: http://groups.google.com/group/paoding
JavaEye group: http://analysis.group.javaeye.com/
http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html
http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer
http://wiki.apache.org/solr/CollectionDistribution
7. References
http://wiki.apache.org/solr/
http://www.ibm.com/developerworks/cn/java/j-solr1/
http://www.ibm.com/developerworks/cn/java/j-solr2/
http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html?page=1
http://lucene.apache.org/java/docs/queryparsersyntax.html
http://www.blogjava.net/RongHao/archive/2007/11/06/158621.html