Description:
Implementing high-speed full-text indexing in a Linux environment
First, the current environment:
CentOS (Linux) 6.3, 64-bit
Second, the required software
1. Java JDK
2. Solr, latest stable version: solr-4.5.0
3. Tomcat, latest stable version: tomcat-7.0.42
4. IK Analyzer, latest stable version of the word breaker: IKAnalyzer2012
Third, Tomcat installation
1. Installing the JDK
# yum -y install java-1.6.0-openjdk java-1.6.0-openjdk-devel
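To confirm the JDK is installed and on the PATH before continuing (the exact version string varies by build):
# java -version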
2. Download Tomcat
http://mirror.bit.edu.cn/apache/tomcat/tomcat-7/v7.0.42/bin/apache-tomcat-7.0.42.tar.gz
(If the file is not found, go to the mirror site http://mirror.bit.edu.cn/, click "apache", and pick an appropriate version to download)
3. Unzip Tomcat and move it to the target directory
# tar zxvf apache-tomcat-7.0.42.tar.gz
# mv apache-tomcat-7.0.42 /usr/local/tomcat
4. Remove the unused files under Tomcat's webapps to avoid security issues
# cd /usr/local/tomcat/webapps/
# rm -rf *
(If you want to test whether the installation succeeded, create a new ROOT directory here and put an index.html file in it)
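A minimal sketch of that check (the page content is arbitrary); once Tomcat is started in the next step, http://<server-ip>:8080/ should return this page:
# mkdir /usr/local/tomcat/webapps/ROOT
# echo "tomcat is working" > /usr/local/tomcat/webapps/ROOT/index.html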
5. Start Tomcat
# /usr/local/tomcat/bin/startup.sh
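To confirm Tomcat came up on the default port 8080 (assuming the port has not been changed in server.xml):
# netstat -tlnp | grep 8080
# curl -I http://localhost:8080/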
Fourth, SOLR configuration
1. Download SOLR
http://apache.fayea.com/apache-mirror/lucene/solr/4.5.0/solr-4.5.0.tgz
2. Unzip and configure SOLR
# tar zxvf solr-4.5.0.tgz
# mkdir /home/solr
# cd solr-4.5.0/
# cp -R -P dist/ /home/solr/
# cd example/
# cp webapps/solr.war /usr/local/tomcat/webapps/
# cd multicore/
# rm -rf exampledocs/ README.txt
# cp -rp * /home/solr/
# cd ../
# cd lib/
# cp * /usr/local/tomcat/lib/
(It is important to note that the files inside the ext directory must also end up in Tomcat's lib)
# cp ext/* /usr/local/tomcat/lib/
# cd ../
# cp -r -p logs/ /home/solr/
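If the copies above went through, /home/solr should end up with roughly the following layout (the core names come from the stock Solr 4.5.0 multicore example and may differ if you start from another example directory):
/home/solr/core0/
/home/solr/core1/
/home/solr/solr.xml
/home/solr/zoo.cfg
/home/solr/dist/
/home/solr/logs/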
3. Under /usr/local/tomcat/conf/Catalina/localhost/, create a new solr.xml file with the following contents:
<Context docBase="/usr/local/tomcat/webapps/solr.war" debug="0" privileged="true" allowLinking="true" crossContext="true">
    <Environment name="solr/home" type="java.lang.String" value="/home/solr" override="true"/>
</Context>
4. Restart Tomcat. After the restart, /usr/local/tomcat/webapps/ will contain a new solr directory.
Modify web.xml under that solr directory (solr/WEB-INF/web.xml)
to match the current setup, changing the solr/home entry to the following:
<env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>/home/solr</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
At this point SOLR and Tomcat are basically wired together; the next step is to configure the word breaker.
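A quick way to confirm the integration, assuming Tomcat is still on the default port 8080 (the Solr admin UI should also load in a browser at http://<server-ip>:8080/solr/):
# curl -I http://localhost:8080/solr/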
Fifth, Configuring the IK Analyzer Chinese word breaker
A word on why this word breaker was chosen:
The common open-source Chinese word breakers are Paoding, mmseg4j, and IK Analyzer. The project previously used mmseg4j, and one drawback was that customizing its thesaurus is rather troublesome. Among the three, IK Analyzer was chosen because customizing a thesaurus with it is simple and efficient.
1. Download the word breaker
https://code.google.com/p/ik-analyzer/downloads/list
The version chosen here is IKAnalyzer2012_u1.zip (because the other, newer versions had problems in testing)
2. Unzip it and place the files where they are needed
# unzip IKAnalyzer2012_u1.zip
After extraction there are only three files: a jar, a stopword.dic, and the configuration file IKAnalyzer.cfg.xml
A. Place the jar into WEB-INF/lib of the solr webapp under Tomcat
# cp IKAnalyzer2012FF_u1.jar /usr/local/tomcat/webapps/solr/WEB-INF/lib/
B. Create a classes directory under the solr webapp's WEB-INF,
then place stopword.dic and the configuration file IKAnalyzer.cfg.xml into it, as in the sketch below
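A minimal sketch of those two steps as shell commands (assuming the extracted files are in the current directory):
# mkdir /usr/local/tomcat/webapps/solr/WEB-INF/classes
# cp stopword.dic IKAnalyzer.cfg.xml /usr/local/tomcat/webapps/solr/WEB-INF/classes/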
C. Modify the schema.xml configuration file of the core under the SOLR home to define the field type used for word breaking, for example:
<fieldType name="text_ik" class="solr.TextField">
    <analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    <analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
It is important to note that this snippet needs to be added between <types></types>, and the type name="text_ik" then has to be referenced by the fields defined further down (search for "solr schema.xml" for the detailed configuration)
Note: set type="text_ik" on every field that needs a Chinese index, for example:
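(The field name "content" here is only an illustration, not a field from the original schema.)
<field name="content" type="text_ik" indexed="true" stored="true"/>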
Reference: http://www.cnblogs.com/likehua/archive/2012/12/26/2834650.html
D. Restart Tomcat and test
I use core0 as the example here (in a real production environment, the configuration file of every core needs the changes above)
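One way to test the analyzer is the Analysis page of the Solr admin UI, e.g. http://localhost:8080/solr/#/core0/analysis, selecting the text_ik field type. From the command line, assuming the core's solrconfig.xml still registers the stock /analysis/field request handler (an assumption about the example config, not something set up above), a request like this should return the token stream:
# curl "http://localhost:8080/solr/core0/analysis/field?analysis.fieldtype=text_ik&analysis.fieldvalue=%E4%B8%AD%E5%8D%8E%E4%BA%BA%E6%B0%91%E5%85%B1%E5%92%8C%E5%9B%BD&wt=json"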
Sixth, Custom thesaurus (adding words, adding exclusion words)
In a real production environment it may be necessary to add some industry-specific terms; the IK Analyzer configuration file solves this problem nicely.
1. Adding industry terms
Open IKAnalyzer.cfg.xml and you will see that the configuration file is written very clearly: simply create a custom dictionary in the same format as stopword.dic, name it something like xxx.dic, place it in the same directory, and point to it in IKAnalyzer.cfg.xml. (It is important to note that the dictionary file must be encoded as UTF-8 without a BOM header.)
For example, I created a custom dictionary called yanglei.dic containing a single term, "Yang Lei"; with it in place the word-breaking results are completely different.
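A minimal sketch of the edited IKAnalyzer.cfg.xml, assuming the stock 2012 layout with the ext_dict / ext_stopwords entry keys (check the comments in your own copy of the file for the exact key names):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- custom dictionary file(s); multiple files are separated by ";" -->
    <entry key="ext_dict">yanglei.dic;</entry>
    <!-- custom stop-word dictionary file(s) -->
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>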
(Screenshot: word-breaker result before the custom dictionary is configured)
(Screenshot: word-breaker result after the custom dictionary is configured)
2. Adding exclusion words
This is simple: edit stopword.dic directly and append the words to be excluded.
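For instance (the word "的" is just an illustration; remember the UTF-8 without BOM requirement, and restart Tomcat afterwards so IK Analyzer reloads its dictionaries):
# echo "的" >> /usr/local/tomcat/webapps/solr/WEB-INF/classes/stopword.dic
# /usr/local/tomcat/bin/shutdown.sh && /usr/local/tomcat/bin/startup.sh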
Reference:
Descriptions of the configuration items in the SOLR core configuration files (schema.xml, solrconfig.xml) can be found at the following URLs:
http://www.blogjava.net/conans/articles/379545.html
http://www.cnblogs.com/chenying99/archive/2012/04/19/2457195.html
http://www.360doc.com/content/12/1122/10/11098634_249482489.shtml