SOLR Research Summary (very detailed and comprehensive)



Development type: full-text search development

SOLR version: 4.2

File contents: This document introduces the features of SOLR and the points to watch for when using it. It covers setting up and debugging the environment, the two core configuration files, maintaining and querying indexes, and the features that can be applied in queries: highlighting, spell checking, search suggestions, grouped statistics, and pinyin retrieval.

Version: V1.0
Author/modified by: Gzk
Date: 2013-06-04

1. What is SOLR

SOLR is an open source search server, written in Java and based on Lucene, that is easy to add to a web application. SOLR provides faceted search (that is, statistics over the results) and hit highlighting, and supports multiple output formats (including XML/XSLT and JSON). It is easy to install and configure, and comes with an HTTP-based administration interface. You can use SOLR's basic search functionality as it is, or extend it to meet the needs of your business.

The features of SOLR include:

  • Advanced full-text search capabilities
  • Optimized for high-throughput network traffic
  • Standards-based open interfaces (XML and HTTP)
  • Integrated HTML administration interface
  • Scalability: an index can be replicated efficiently to other SOLR search servers
  • Flexible and adaptable through XML configuration
  • Extensible plug-in system

2. What is Lucene

Lucene is a Java-based full-text information retrieval toolkit. It is not a complete search application; instead, it provides indexing and search capabilities for your own application. Lucene is an open source project of the Apache Software Foundation (it was originally part of the Apache Jakarta family), and it is the most popular Java-based open source full-text search toolkit. Many applications are already built on Lucene, such as the search function of the Eclipse help system. Lucene indexes text data, so you can index and search any document as long as you first convert the data you want to index into text.

3. SOLR vs Lucene

SOLR and Lucene are not competitors. On the contrary, SOLR depends on Lucene, because the core technology of SOLR is implemented with Lucene. The essential differences between them come down to three points: search server, enterprise level, and management. Lucene is a search library, not a standalone application, whereas SOLR is a standalone application. Lucene focuses on the low-level building blocks of search, whereas SOLR focuses on enterprise applications. Lucene does not take care of the management needed to run a search service, whereas SOLR does. In a nutshell: SOLR is an extension of Lucene aimed at enterprise search applications.
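Because SOLR is a server rather than a library, a client reaches it over HTTP instead of calling Lucene classes directly. The following is a minimal sketch of building such a request URL, assuming the default address http://localhost:8983/solr and the example core collection1 used elsewhere in this document, together with the standard /select handler and the q and wt parameters. The class and method names are hypothetical and serve only as illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SelectUrlSketch {
    /** Builds a query URL for the /select handler of one core, asking for JSON output. */
    static String buildSelectUrl(String baseUrl, String core, String query) {
        try {
            // URL-encode the query text so that spaces and special characters are safe
            String q = URLEncoder.encode(query, "UTF-8");
            return baseUrl + "/" + core + "/select?q=" + q + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed base URL and core name; adjust to your own installation
        System.out.println(buildSelectUrl("http://localhost:8983/solr", "collection1", "full text"));
    }
}
```

Opening the printed URL in a browser while SOLR is running should return the matching documents in JSON form.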

[Figure: architecture of SOLR and Lucene]

SOLR uses Lucene and extends it:

  • A real data schema, with dynamic fields and unique keys
  • Powerful extensions to the Lucene query language
  • Support for dynamic grouping (faceting) and filtering of results
  • Advanced, configurable text analysis
  • Highly configurable and extensible caching
  • Performance optimizations
  • External configuration via XML
  • An administration interface with logging that can be monitored
  • Fast incremental updates and snapshot distribution

4. Building and debugging SOLR

4.1 Installing the Java virtual machine

SOLR must run on a Java virtual machine of version 1.6 or later. Running the standard SOLR service requires only the JRE, but if you need to extend its functionality or compile the source, you need the JDK. You can download the JDK or JRE from the following addresses:

  • OpenJDK (http://java.sun.com/j2se/downloads.html)
  • Sun (http://java.sun.com/j2se/downloads.html)
  • IBM (http://www.ibm.com/developerworks/java/jdk/)
  • Oracle (http://www.oracle.com/technology/products/jrockit/index.html)

Please refer to the corresponding help documentation for the installation procedure.

4.2 Downloading SOLR

This document is based on SOLR 4.2, and everything described below applies to that version. If you use a newer version of SOLR, refer to the official website. SOLR download address: http://lucene.apache.org/solr/

4.3 Downloading and setting up Apache Ant

The SOLR source is built with Ant, a Java-based build tool. In concept it is similar to Maven, or to make for C. After downloading and extracting Ant, set the environment variables:

ANT_HOME: E:\Work\apache-ant\1.9.1 (the directory you extracted Ant into)
Path: %ANT_HOME%\bin (so that ant can be run from any command-line window)

To check the installation, type ant in a command-line window. If Ant answers that the build file build.xml does not exist, the installation succeeded: by default Ant runs the build.xml file in the current directory, and we have not created one there.

Now it is time to build the SOLR source code. In the command-line window, change to your SOLR source directory and type ant; this time Ant prints the usage help of the build.xml found there.

We can ignore most of the listed targets and simply generate project files for the IDE we use. If you use Eclipse, type ant eclipse on the command line. If you use IntelliJ IDEA, type ant idea. The project can then be opened and built in the IDE.

In my case the console window reported that the build had failed. The cause turned out to be a jar missing from the downloaded Ant: Apache Ivy (download address: http://ant.apache.org/ivy/). Ivy manages jar dependencies for Ant. During the first build, Ivy automatically downloads the dependencies that are missing, so the first build takes a long time.

Download the Ivy jar, put it in Ant's lib directory (E:\Work\apache-ant\1.9.1\lib), and run ant again; this time it succeeds. SOLR is now ready for code debugging.

4.4 Configuring and running the SOLR code

Whichever IDE you use, set the SOLR home in the IDE's JVM settings. In the VM arguments, the usual value is -Dsolr.solr.home=solr/example/solr. You can also use an absolute path.

SOLR uses the StartSolrJetty class as the entry point for debugging the code. In it you can set the port used by the server and the webapps directory of SOLR. Usually nothing needs to be changed, and the defaults are enough for debugging. The SOLR home can also be set in code: System.setProperty("solr.solr.home", "E:\\WORK\\SOLR-4.2.0-SRC-IDEA\\SOLR\\EXAMPLE\\SOLR");
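The VM argument and the System.setProperty call both end up in the same place, the solr.solr.home system property. The sketch below only illustrates how such a property can be read and turned into an absolute path; it does not reproduce SOLR's own lookup logic, and the class name, method name, and fallback parameter are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class SolrHomeLookup {
    static final String PROPERTY = "solr.solr.home";

    /** Returns the configured home as an absolute path, or the fallback when the property is unset. */
    static Path resolve(String fallback) {
        String configured = System.getProperty(PROPERTY);
        String chosen = (configured == null || configured.isEmpty()) ? fallback : configured;
        // A relative value such as solr/example/solr is resolved against the working directory
        return Paths.get(chosen).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Same effect as passing -Dsolr.solr.home=solr/example/solr on the command line
        System.setProperty(PROPERTY, "solr/example/solr");
        System.out.println(resolve("solr"));
    }
}
```

Printing the resolved path this way is a quick check that a relative SOLR home points where you expect when the IDE's working directory is not the source root.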

Here the example directory is used as the root of the SOLR configuration. If you have another SOLR configuration directory, you can point the SOLR home at it instead. Click Run (Debug also works). If nothing else is wrong, the server starts. Watch for messages about the port used by the servlet container, such as:

FAILED SocketConnector@0.0.0.0:8983: java.net.BindException: Address already in use: JVM_Bind

This message indicates that the port is already in use; you can change it. If the server starts without errors, enter http://localhost:8983/solr/ in the browser and the SOLR administration interface appears.

At this point SOLR has been configured and is running. To debug the startup code, set a breakpoint in the initialize() method of the Initializer class. To debug a request sent from the browser, set a breakpoint in the doFilter method of SolrDispatchFilter.

Note: the interface has a bug in IE9 compatibility mode, so IE9 must be set to non-compatibility mode.

5. SOLR basics

Because SOLR wraps and extends Lucene, the two share much of their terminology. More importantly, an index created by SOLR is fully compatible with the Lucene search engine library. With proper configuration (and, in some cases, some coding), SOLR can read and use indexes built by other Lucene applications. In both SOLR and Lucene, an index is built from one or more Documents. A Document consists of one or more Fields. A Field consists of a name, content, and metadata that tells SOLR how to handle the content.

For example, a Field can contain a string, a number, a Boolean value, a date, or any other type you care to add; you only need to configure it in the SOLR configuration file. A Field can be described with a number of options that tell SOLR what to do with its content during indexing and searching.

Table 1 lists a subset of the important attributes:

Table 1. Field attributes

Attribute | Description
indexed | An indexed Field can be searched and sorted. You can also run the SOLR analysis process on an indexed Field, which modifies the content to improve or change the results.
stored | The content of a stored Field is saved in the index. This is useful for retrieving and highlighting content, but is not required for the search itself. For example, many applications store pointers to the location of the content rather than the actual file contents.
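The difference between the two attributes can be pictured with a toy in-memory index. This is only a sketch with hypothetical class and method names, not how Lucene stores data: indexed content goes into a term lookup table that search consults, while stored content is kept verbatim so that it can be returned with a hit. For brevity, the lookup table is shared by all fields.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyIndex {
    // term -> ids of the documents whose indexed fields contain the term
    private final Map<String, Set<Integer>> inverted = new HashMap<>();
    // document id -> field name -> original value of each stored field
    private final Map<Integer, Map<String, String>> storedFields = new HashMap<>();

    void addField(int docId, String name, String value, boolean indexed, boolean stored) {
        if (indexed) {
            // Very rough analysis: lowercase and split on non-word characters
            for (String term : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    inverted.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
                }
            }
        }
        if (stored) {
            storedFields.computeIfAbsent(docId, d -> new LinkedHashMap<>()).put(name, value);
        }
    }

    /** Finds documents by term; only indexed content can match. */
    Set<Integer> search(String term) {
        return inverted.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
    }

    /** Returns the original value of a stored field, or null if it was not stored. */
    String stored(int docId, String name) {
        Map<String, String> fields = storedFields.get(docId);
        return fields == null ? null : fields.get(name);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.addField(1, "body", "Full text search", true, false);  // searchable, not retrievable
        index.addField(1, "path", "/docs/a.txt", false, true);       // retrievable, not searchable
        System.out.println(index.search("text") + " " + index.stored(1, "path"));
    }
}
```

A field that is indexed but not stored can be found but not displayed, which is the pattern the table describes for applications that keep only a pointer to the real content.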

5.1 Schema configuration: schema.xml

The schema.xml configuration file is located in the \solr\example\solr\collection1\conf directory of the downloaded SOLR package. It is the file that defines the SOLR schema. Open it and you will find detailed comments. The schema is organized into three important sections.

5.1.1. types section

The types section holds common, reusable definitions of how SOLR (and Lucene) handle a Field. These are the types, such as int, text, and date, that the fields added to the index refer to.

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
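The index-time chain of text_general (tokenizer, then stop-word filter, then lowercase filter) can be pictured with the following simplified sketch. It is not the Lucene implementation: the real StandardTokenizer follows Unicode text segmentation rules, whereas this sketch simply splits on anything that is not a letter or digit, and the stop-word set passed in stands in for stopwords.txt. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerSketch {
    /** Runs text through a tokenizer, a case-insensitive stop filter, and a lowercase filter. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        // Tokenizer step: split on runs of characters that are neither letters nor digits
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            // Stop filter step (ignoreCase="true"): compare in lowercase and drop stop words
            if (stopWords.contains(lower)) {
                continue;
            }
            // Lowercase filter step: emit the lowercased token
            tokens.add(lower);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        System.out.println(analyze("The Quick brown fox", stopWords));
    }
}
```

The query-time chain differs only in the extra synonym step, which is why the same text normally matches itself at index time and at query time.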

Parameter description:

Attribute | Description
name | Identifier of the field type.
class | Together with the other attributes, determines the actual behavior of this fieldType.
sortMissingLast | When set to true, documents that have no value for the field are sorted after documents that have one, regardless of the sort order requested. The default is false.
sortMissingFirst | The reverse of the above: documents with no value for the field are sorted first. The default is false.
analyzer | The analyzer (tokenizer and filters) used by the field type.
type | The operation the analyzer applies to. index means the analyzer is used when the index is built; query means it is used when a query is processed.
tokenizer | The tokenizer class.
filter | The filters applied after the tokenizer, invoked in the order in which they are configured.
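The sortMissingLast behavior can be illustrated with plain Java comparators. This is a sketch of the ordering only, not of SOLR's implementation; a null entry stands in for a document that has no value in the sort field, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortMissingLastSketch {
    /** Sorts the values in the requested direction while always keeping missing (null) values last. */
    static List<Integer> sort(List<Integer> values, boolean descending) {
        Comparator<Integer> order = descending
                ? Comparator.<Integer>reverseOrder()
                : Comparator.<Integer>naturalOrder();
        List<Integer> sorted = new ArrayList<>(values);
        // nullsLast wraps the chosen order, so the direction does not affect where nulls go
        sorted.sort(Comparator.nullsLast(order));
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, null, 1);
        System.out.println(sort(values, false));
        System.out.println(sort(values, true));
    }
}
```

Replacing nullsLast with nullsFirst gives the ordering described for sortMissingFirst.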

5.1.2. fields section

The fields section declares the names of the fields that you add to the index. Each declaration refers to a type defined in the types section.

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="path" type="text_smartcn" indexed="false" stored="true" multiValued="false" termVectors="true"/>
<field name="Content" type="text_smartcn" indexed="false" stored="true" multiValued="false" termVectors="true"/>
<field name="text" type="text_ik" indexed="true" stored="false" multiValued="true"/>
<field name="Pinyin" type="text_pinyin" indexed="true" stored="false" multiValued="false"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>

field: a fixed field definition.

dynamicField: a dynamic field definition, used for fields that are added later without being declared one by one. The * is a wildcard. For example, test_i is a dynamic field of type int.
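The wildcard matching can be sketched as a suffix test over the three dynamicField patterns declared above. This is a simplified illustration with hypothetical names; SOLR's own resolution rules (for example, when several patterns match the same name) are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DynamicFieldSketch {
    // Pattern -> field type, in the order the dynamicField elements are declared
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("*_i", "int");
        RULES.put("*_l", "long");
        RULES.put("*_s", "string");
    }

    /** Returns the type of the first pattern matching the field name, or null if none matches. */
    static String typeFor(String fieldName) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            String suffix = rule.getKey().substring(1); // drop the leading * wildcard
            if (fieldName.endsWith(suffix)) {
                return rule.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("test_i -> " + typeFor("test_i"));
        System.out.println("title  -> " + typeFor("title"));
    }
}
```

A name that matches no pattern and no fixed field, such as title here, is what SOLR rejects as an unknown field.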

There is also a special element, copyField, commonly used to build a combined search field, so that searching needs to target only one analyzed, indexed field. If the dest field of a copyField has more than one source, it must be declared with multiValued="true"; otherwise an error is raised.

<copyField source="Content" dest="Pinyin"/>
<copyField source="Content" dest="text"/>
<copyField source="Pinyin" dest="text"/>
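The effect of the copyField rules on an incoming document can be sketched as follows. The sketch assumes that each rule copies the value supplied for its source field in the incoming document (copies are not chained), and it uses hypothetical class and method names; it is an illustration, not SOLR code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CopyFieldSketch {
    /** Applies {source, dest} rules to a document and returns every field with its list of values. */
    static Map<String, List<String>> apply(Map<String, String> doc, String[][] rules) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        // Start with the values supplied in the incoming document
        for (Map.Entry<String, String> field : doc.entrySet()) {
            result.computeIfAbsent(field.getKey(), k -> new ArrayList<>()).add(field.getValue());
        }
        // Each rule copies the incoming value of its source field into the dest field
        for (String[] rule : rules) {
            String value = doc.get(rule[0]);
            if (value != null) {
                result.computeIfAbsent(rule[1], k -> new ArrayList<>()).add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("Content", "solr search");
        String[][] rules = { {"Content", "Pinyin"}, {"Content", "text"}, {"Pinyin", "text"} };
        System.out.println(apply(doc, rules));
    }
}
```

Whenever two rules feed the same dest field for one document, that field ends up with two values, which is why text is declared with multiValued="true" in the schema above.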

Field Property Description:
