1. Download Solr, the mmseg4j word segmentation package, and Tomcat, then extract each archive. All three can be found through Google or Baidu.
2. To use Chinese word segmentation, you must set the encoding. Go to the Tomcat installation directory and use vi to edit conf/server.xml:
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8"/>
The added URIEncoding="UTF-8" attribute makes Tomcat decode request URIs as UTF-8.
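To confirm the edit took effect, a small check like the one below can be run against server.xml. This is only a sketch: has_utf8_connector is a hypothetical helper name, and the text match is naive (it does not skip Connector elements that sit inside XML comments).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if server.xml has a <Connector> on the
# given port (default 8080) that carries URIEncoding="UTF-8".
has_utf8_connector() {
    server_xml=$1
    port=${2:-8080}
    # Join lines first so a Connector split over several lines still matches,
    # then isolate each <Connector ...> tag and test both attributes on it.
    tr '\n' ' ' < "$server_xml" |
        grep -Eo '<Connector[^>]*>' |
        grep "port=\"$port\"" |
        grep -q 'URIEncoding="UTF-8"'
}

# Example (path taken from the steps in this article):
#   has_utf8_connector /opt/apache-tomcat-6.0.35/conf/server.xml && echo "UTF-8 is set"
```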
3. Copy apache-solr-*.war from the dist folder of the downloaded Solr package to Tomcat's webapps directory and rename it to solr.war.
cp /opt/apache-solr-3.6.1/dist/apache-solr-3.6.1.war /opt/apache-tomcat-6.0.35/webapps/solr.war
4. Copy the solr folder from the example folder of the downloaded Solr package to the /opt directory.
cp -r /opt/apache-solr-3.6.1/example/solr/ /opt
5. Set the Solr home directory (solr.solr.home). Use vi to edit /etc/profile and add the following line so that the setting is permanent, then reboot.
export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr"
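After logging in again, it can be worth checking that the property really is part of JAVA_OPTS. The sketch below extracts the value; solr_home_from_opts is a hypothetical helper name, and it assumes the path contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the value of -Dsolr.solr.home found in a
# JAVA_OPTS-style string (assumes the path contains no whitespace).
solr_home_from_opts() {
    # $1 is left unquoted on purpose so it is split into separate options.
    for opt in $1; do
        case $opt in
            -Dsolr.solr.home=*) printf '%s\n' "${opt#-Dsolr.solr.home=}" ;;
        esac
    done
}

# Example:
#   home=$(solr_home_from_opts "$JAVA_OPTS")
#   [ -f "$home/conf/solrconfig.xml" ] && echo "Solr home looks valid: $home"
```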
6. Start the Tomcat service and open http://127.0.0.1:8080/solr/
If the Solr page is displayed, the configuration is successful.
7. Configure Chinese word segmentation. Copy the jar from the mmseg4j directory to Solr's WEB-INF/lib/ directory so that Solr can use it. Tomcat must have been started once before this step so that the solr folder has been unpacked under webapps.
cp /opt/mmseg4j/mmseg4j-all-1.8.3.jar /opt/apache-tomcat-6.0.35/webapps/solr/WEB-INF/lib/
8. Configure the Chinese dictionary
# create the dictionary directory first; cp fails if it does not exist
mkdir -p /opt/solr/dic
cp /opt/mmseg4j/data/words.dic /opt/solr/dic/
9. Modify the schema.xml file (/opt/solr/conf/schema.xml) to enable the tokenizer. Add the following field types:
<!--mmseg4j field types-->
<fieldType name="textComplex" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="/opt/solr/dic"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="textMaxWord" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="/opt/solr/dic"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="/opt/solr/dic"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
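Then add the copyField entries. They assume that fields named simple, complex, and text are declared in schema.xml:
<!--mmseg4j copyField-->
<copyField source="simple" dest="text"/>
<copyField source="complex" dest="text"/>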
10. Open http://127.0.0.1:8080/solr/admin/analysis.jsp
Click Analyze to view the segmentation result.
11. Configure the MySQL database
First add the MySQL JDBC driver jar, mysql-connector-java-5.1.7-bin.jar.
I put it in Tomcat's lib directory.
touch /opt/solr/conf/mysql.xml
Write the following content into mysql.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://10.10.150.116/travel_main"
              user="new_travel_u" password="123045"/>
  <document name="user_core">
    <entity name="user_core" pk="userId"
            query="select * from user_core"
            deltaQuery="select userId from user_core where editTime > '${dataimporter.last_index_time}'">
      <field column="userId" name="id" />
      <field column="nickname" name="nickname" />
    </entity>
  </document>
</dataConfig>
The <fields> element of schema.xml must contain a <field> child for each database column that is indexed. Here I add a new nickname field:
<field name="nickname" type="string" indexed="true" stored="true" />
Edit the solrconfig.xml file and add the following node under the <config> node:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/opt/solr/conf/mysql.xml</str>
  </lst>
</requestHandler>
Change the dir paths of the <lib> elements to the actual Solr location; otherwise the corresponding jar packages may not be found.
<lib dir="/opt/apache-solr-3.6.1/dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="/opt/apache-solr-3.6.1/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/opt/apache-solr-3.6.1/dist/" regex="apache-solr-clustering-\d.*\.jar" />
<lib dir="/opt/apache-solr-3.6.1/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="/opt/apache-solr-3.6.1/dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
<lib dir="/opt/apache-solr-3.6.1/contrib/dataimporthandler/lib/" regex=".*\.jar" />
<lib dir="/opt/apache-solr-3.6.1/dist/" regex="apache-solr-langid-\d.*\.jar" />
<lib dir="/opt/apache-solr-3.6.1/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="/opt/apache-solr-3.6.1/dist/" regex="apache-solr-velocity-\d.*\.jar" />
<lib dir="/opt/apache-solr-3.6.1/contrib/velocity/lib" regex=".*\.jar" />
12. Restart Tomcat
/opt/apache-tomcat-6.0.35/bin/shutdown.sh
/opt/apache-tomcat-6.0.35/bin/startup.sh
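Tomcat needs a moment to redeploy Solr after a restart, so requests sent right away may fail. A minimal sketch for waiting until the admin page answers is shown below; wait_for_http is a hypothetical helper name, and it assumes curl is installed.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL once per second until it returns
# HTTP 200, giving up after the given number of attempts (default 30).
wait_for_http() {
    url=$1
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -w '%{http_code}' prints only the status code; the body is discarded.
        status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
        if [ "$status" = "200" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example:
#   wait_for_http http://127.0.0.1:8080/solr/admin/ 60 || echo "Solr did not come up"
```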
13. Enter the following URL in the browser to build a full index:
http://127.0.0.1:8080/solr/dataimport?command=full-import
Then open
http://127.0.0.1:8080/solr/admin/
and run a query to check the results.
Incremental (delta) indexing can be run on a regular schedule with:
http://127.0.0.1:8080/solr/dataimport?command=delta-import
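One way to schedule the incremental import is a small script called from cron. The following is a sketch under assumptions: the function names, the script path /opt/solr/bin/delta-import.sh, and the 10-minute interval are all illustrative and not part of the original setup, and curl is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: build a DataImportHandler URL for a command such as
# full-import or delta-import. The base URL defaults to the one used here.
dih_url() {
    base=${2:-http://127.0.0.1:8080/solr}
    printf '%s/dataimport?command=%s\n' "$base" "$1"
}

# Hypothetical helper: trigger a delta import; intended to be run by cron.
run_delta_import() {
    curl -s "$(dih_url delta-import)"
}

# Example crontab entry (assumed path and interval), where the script
# sources the functions above and calls run_delta_import:
#   */10 * * * * /opt/solr/bin/delta-import.sh >/dev/null 2>&1
```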
This article is from http://my.oschina.net/eatsuger/blog/82192?from=20121014