Configuring the Solr full-text search engine

Source: Internet
Author: User
Tags: solr

Description:

Implementing high-speed full-text indexing in a Linux environment

I. Current environment:

CentOS (Linux) 6.3 bit

II. Required software

1. Java JDK

2. Solr, latest stable version: Solr 4.5

3. Tomcat, latest stable version: Tomcat 7.0.42

4. IK Analyzer, latest stable version of the word breaker: IKAnalyzer2012

III. Tomcat installation

1. Install the JDK

#yum -y install java-1.6.0-openjdk java-1.6.0-openjdk-devel

2. Download Tomcat

http://mirror.bit.edu.cn/apache/tomcat/tomcat-7/v7.0.42/bin/apache-tomcat-7.0.42.tar.gz

(If the file is not found, go to the mirror site http://mirror.bit.edu.cn/ and click "Apache" to select an appropriate version to download.)

3. Unzip the Tomcat and place it in the specified directory

#tar zxvf apache-tomcat-7.0.42.tar.gz

#mv apache-tomcat-7.0.42/ /usr/local/tomcat

4. Remove useless files under Tomcat to avoid security issues

#cd /usr/local/tomcat/webapps/

#rm -rf *

(To test whether the installation succeeded, create a new ROOT directory here and put an index.html file in it.)

5. Start Tomcat

#/usr/local/tomcat/bin/startup.sh
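
A quick way to confirm that Tomcat came up is to poll its HTTP port. The sketch below assumes Tomcat listens on its default port 8080 and that curl is installed; wait_for_http is a hypothetical helper name, not part of Tomcat. Because webapps/ was emptied in the previous step, Tomcat may answer with a 404, so any HTTP response is treated as "up".

```shell
# Poll a URL until the server answers with any HTTP response.
# wait_for_http is a hypothetical helper; the URL and retry count are arguments.
wait_for_http() {
    url=$1
    tries=${2:-10}
    i=1
    while [ "$i" -le "$tries" ]; do
        # No -f flag on purpose: a 404 from an empty webapps/ still proves
        # that Tomcat is listening and answering requests.
        if curl -sS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage (not run here; port 8080 is assumed to be unchanged):
#   wait_for_http http://localhost:8080/ 30 && echo "Tomcat is up"
```

If the function returns non-zero, /usr/local/tomcat/logs/catalina.out is the first place to look.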

IV. Solr configuration

1. Download Solr

http://apache.fayea.com/apache-mirror/lucene/solr/4.5.0/solr-4.5.0.tgz

2. Unzip and configure Solr

#tar zxvf solr-4.5.0.tgz

#mkdir /home/solr

#cd solr-4.5.0/

#cp -R -p dist/ /home/solr/

#cd example/

#cp webapps/solr.war /usr/local/tomcat/webapps/

#cd multicore/

#rm -rf exampledocs/ README.txt

#cp -rp * /home/solr/

#cd ../

#cd lib/

#cp * /usr/local/tomcat/lib/

#cp ext/* /usr/local/tomcat/lib/

(Note: the files inside the ext directory must also end up in Tomcat's lib directory. A plain cp * skips subdirectories, so ext/ is copied separately.)

#cd ../

#cp -r -p logs/ /home/solr/
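
After the copy steps above, it can help to confirm that /home/solr has the layout a multicore setup needs: a solr.xml at the top level and, for each core, conf/schema.xml and conf/solrconfig.xml. This is a minimal sketch; check_solr_home is a hypothetical helper, and the expected layout is assumed to follow the multicore example that was just copied.

```shell
# Verify that a Solr home directory has solr.xml and at least one complete core.
# check_solr_home is a hypothetical helper name.
check_solr_home() {
    home=$1
    [ -f "$home/solr.xml" ] || { echo "missing $home/solr.xml" >&2; return 1; }
    found=0
    # Every subdirectory that has a conf/ directory is treated as a core;
    # directories such as dist/ and logs/ are skipped automatically.
    for conf in "$home"/*/conf; do
        [ -d "$conf" ] || continue
        for f in schema.xml solrconfig.xml; do
            [ -f "$conf/$f" ] || { echo "missing $conf/$f" >&2; return 1; }
        done
        found=$((found + 1))
    done
    [ "$found" -gt 0 ] || { echo "no core directories under $home" >&2; return 1; }
    echo "$found core(s) look complete under $home"
}

# Usage (not run here):
#   check_solr_home /home/solr
```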

3. Create a new solr.xml file under /usr/local/tomcat/conf/Catalina/localhost/ with the following contents:

<Context docBase="/usr/local/tomcat/webapps/solr.war" debug="0" privileged="true" allowLinking="true" crossContext="true">
    <Environment name="solr/home" type="java.lang.String" value="/home/solr" override="true"></Environment>
</Context>

4. Restart Tomcat. After the restart, a new solr directory appears under /usr/local/tomcat/webapps/.

Modify WEB-INF/web.xml under that solr directory.

Adjust it to match the current configuration, as follows:

<env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>/home/solr</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

At this point Solr and Tomcat are essentially integrated; the next step is to configure word segmentation.

V. Configuring the IK Analyzer Chinese word breaker

About word breakers:

Common open-source Chinese word breakers include Paoding and mmseg4j, in addition to IK Analyzer. Earlier projects used mmseg4j, whose drawback is that customizing the dictionary is rather troublesome. Of the three, IK Analyzer was chosen because customizing its dictionary is simple and efficient.

1. Download the word breaker

https://code.google.com/p/ik-analyzer/downloads/list

The version chosen here is IKAnalyzer2012_u1.zip (other, newer versions showed problems in testing).

2. Unzip and put the files in place

#unzip IKAnalyzer2012_u1.zip

The archive contains only three files: a jar, stopword.dic, and the configuration file IKAnalyzer.cfg.xml.

A. Put the jar in WEB-INF/lib of the solr webapp under Tomcat

#cp IKAnalyzer2012FF_u1.jar /usr/local/tomcat/webapps/solr/WEB-INF/lib/

B. Create a classes directory under WEB-INF of the solr webapp under Tomcat

Then put stopword.dic and the configuration file IKAnalyzer.cfg.xml into it.
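
Steps A and B can be sketched as a small shell function. install_ik is a hypothetical helper; its two arguments (the directory where the archive was unpacked and the webapp's WEB-INF directory) and the *.jar glob are assumptions made for illustration, not something the IK Analyzer distribution provides.

```shell
# Copy the IK Analyzer jar, stop-word dictionary and configuration file
# into a webapp's WEB-INF directory. install_ik is a hypothetical helper.
install_ik() {
    src=$1      # directory holding the unpacked IK Analyzer files
    webinf=$2   # e.g. /usr/local/tomcat/webapps/solr/WEB-INF
    mkdir -p "$webinf/lib" "$webinf/classes" || return 1
    # Step A: the analyzer jar goes into WEB-INF/lib.
    cp "$src"/*.jar "$webinf/lib/" || return 1
    # Step B: the dictionary and the configuration file go into WEB-INF/classes.
    cp "$src/stopword.dic" "$src/IKAnalyzer.cfg.xml" "$webinf/classes/" || return 1
}

# Usage (not run here; the current directory is assumed to hold the unpacked files):
#   install_ik . /usr/local/tomcat/webapps/solr/WEB-INF
```

WEB-INF/classes is on the webapp's classpath, which is presumably why the dictionary and configuration file are placed there.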

C. Modify the schema.xml configuration file of the Solr core to specify the fields that need word segmentation, for example:

<fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

Note that this snippet must be added between <types> and </types>, and that the type named by name="text_ik" must then be referenced by the field definitions further down (for details, search for "Solr schema.xml" configuration).

Note: add type="text_ik" to the fields that need Chinese indexing.
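
Before restarting, a quick grep can confirm that schema.xml both defines the text_ik type and has at least one field using it. check_ik_schema is a hypothetical helper; the patterns assume each element sits on one line and that attributes are written with double quotes and no spaces around "=".

```shell
# Check that a schema.xml defines the text_ik field type and that at least
# one <field> references it. check_ik_schema is a hypothetical helper.
check_ik_schema() {
    schema=$1
    grep -q '<fieldType[^>]*name="text_ik"' "$schema" \
        || { echo "text_ik type is not defined in $schema" >&2; return 1; }
    # "<field " with a trailing space does not match "<fieldType".
    grep -q '<field [^>]*type="text_ik"' "$schema" \
        || { echo "no <field> uses type=\"text_ik\" in $schema" >&2; return 1; }
    echo "text_ik is defined and in use"
}

# Usage (not run here):
#   check_ik_schema /home/solr/core0/conf/schema.xml
```

A field that satisfies the second check could look like <field name="content" type="text_ik" indexed="true" stored="true"/>, where the field name content is only an illustration.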

Reference: http://www.cnblogs.com/likehua/archive/2012/12/26/2834650.html

D. Restart Tomcat and test

I use core0 as the example here. (In a real production environment, the configuration file of every core needs the changes described above.)

VI. Custom dictionary (adding words, excluding words)

In a real production environment you may need to add industry-specific words; the IK Analyzer configuration file handles this well.

1. Add industry words

Open IKAnalyzer.cfg.xml and you will see that the configuration file is self-explanatory: create a dictionary file in the same format as stopword.dic with a name of your choice, such as xxx.dic, place it in the same directory, and reference it in IKAnalyzer.cfg.xml. (Note that the dictionary file must be encoded as UTF-8 without a BOM.)

For example, I created a custom dictionary called yanglei.dic containing a single word, Yang Lei; the segmentation results are then completely different.
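
The BOM requirement is easy to violate when a dictionary is edited on Windows. The sketch below checks whether a file starts with the UTF-8 byte order mark (bytes EF BB BF) and can strip it; has_bom and strip_bom are hypothetical helper names.

```shell
# Succeeds if the file starts with the UTF-8 BOM (EF BB BF).
has_bom() {
    # Dump the first three bytes as hex and compare.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Remove a leading UTF-8 BOM in place; files without a BOM are left untouched.
strip_bom() {
    if has_bom "$1"; then
        tmp=$(mktemp) || return 1
        # tail -c +4 prints the file starting at its fourth byte.
        tail -c +4 "$1" > "$tmp" && cat "$tmp" > "$1"
        status=$?
        rm -f "$tmp"
        return $status
    fi
}

# Usage (not run here; xxx.dic stands for your own dictionary file):
#   has_bom xxx.dic && strip_bom xxx.dic
```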

(Figure: word breaker results before the custom word is configured)

(Figure: word breaker results after the custom word is configured)
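
For illustration, the configuration for the yanglei.dic example might look like the file written by the sketch below. The ext_dict and ext_stopwords keys are the ones used in the IKAnalyzer.cfg.xml that ships with IK Analyzer 2012; the helper name write_ik_cfg and its directory argument are assumptions. Note that the function overwrites any existing IKAnalyzer.cfg.xml in the target directory.

```shell
# Write an IKAnalyzer.cfg.xml that registers yanglei.dic as an extension
# dictionary. write_ik_cfg is a hypothetical helper name.
write_ik_cfg() {
    classes_dir=$1   # e.g. /usr/local/tomcat/webapps/solr/WEB-INF/classes
    cat > "$classes_dir/IKAnalyzer.cfg.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- custom dictionary, located next to this file -->
    <entry key="ext_dict">yanglei.dic;</entry>
    <!-- stop-word dictionary -->
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
EOF
}

# Usage (not run here):
#   write_ik_cfg /usr/local/tomcat/webapps/solr/WEB-INF/classes
```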

2. Add exclusion (stop) words

This is simple: edit stopword.dic directly and add the words you want excluded.
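
Editing stopword.dic by hand works, but a small helper avoids duplicate entries. This is a minimal sketch that assumes the dictionary holds one word per line; add_stopword is a hypothetical helper name.

```shell
# Append a word to a one-word-per-line dictionary unless it is already there.
# add_stopword is a hypothetical helper name.
add_stopword() {
    word=$1
    dic=$2
    # -x: match the whole line, -F: treat the word as a literal string.
    if ! grep -qxF -- "$word" "$dic" 2>/dev/null; then
        # If the file does not end with a newline, add one first so the
        # new word does not get glued onto the last existing entry.
        if [ -s "$dic" ] && [ -n "$(tail -c 1 "$dic")" ]; then
            printf '\n' >> "$dic"
        fi
        printf '%s\n' "$word" >> "$dic"
    fi
}

# Usage (not run here; "example" is a placeholder word):
#   add_stopword example /usr/local/tomcat/webapps/solr/WEB-INF/classes/stopword.dic
```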

References:

For descriptions of the configuration items in the Solr core configuration files (schema.xml, solrconfig.xml), see:

http://www.blogjava.net/conans/articles/379545.html

http://www.cnblogs.com/chenying99/archive/2012/04/19/2457195.html

http://www.360doc.com/content/12/1122/10/11098634_249482489.shtml
