Configuring and Using the ANSJ Word Segmenter in Solr


The previous article, "Compiling the Solr plugin for ANSJ," described how to compile the interfaces that let the ANSJ word segmenter run in the Solr (Lucene) environment. This article describes how to use ANSJ in Solr. The steps are: download or compile the ansj_seg and nlp-lang jar packages; configure the related field type in the schema; deploy the ansj_seg, nlp-lang and plugin jars to Solr; and test how ANSJ segments text.

I. Download or compile the ansj_seg and nlp-lang jar packages.

1. You can download the relevant jar packages from http://maven.ansj.org/org/ansj/ansj_seg/ and http://maven.ansj.org/org/nlpcn/.
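If you download by hand, note that both paths follow the standard Maven repository layout, so a jar's URL can be derived from its group ID, artifact ID and version. The Python sketch below assumes the repository still serves that layout. The group IDs (org.ansj, org.nlpcn) and the ansj_seg artifact ID are read off the URLs above; the nlp-lang artifact ID is an assumption, and the version "1.0" is a hypothetical placeholder.

```python
# Sketch: derive jar download URLs from the standard Maven repository layout:
#   <repo>/<groupId with dots as slashes>/<artifactId>/<version>/<artifactId>-<version>.jar
REPO = "http://maven.ansj.org"  # repository base taken from the article


def maven_jar_url(repo: str, group_id: str, artifact_id: str, version: str) -> str:
    """Build the download URL of a jar stored in a Maven-layout repository."""
    group_path = group_id.replace(".", "/")
    return f"{repo.rstrip('/')}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


if __name__ == "__main__":
    # "1.0" is a hypothetical version; pick a real one from the repository listing.
    print(maven_jar_url(REPO, "org.ansj", "ansj_seg", "1.0"))
    print(maven_jar_url(REPO, "org.nlpcn", "nlp-lang", "1.0"))  # artifact ID assumed
```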

The ansj_seg jar packages are shown in the following figure:

    

nlp-lang is the natural language processing utility library used by the ansj_seg segmenter, and it offers a powerful set of features:

   

2. Download the source code and compile it yourself.

This is more involved, but it is worth doing if you plan to use ANSJ over the long term; a segmenter this good deserves careful study.

GitHub address: https://github.com/NLPchina/ansj_seg

Git client download: http://git-scm.com/download/

Command to download the source with git: git clone https://github.com/NLPchina/ansj_seg.git

The file structure after download is as follows:

   

As you can see, the code is managed with Maven. Maven is installed and configured roughly as described in an earlier article; the main steps are:

Download the Maven distribution and unzip it.

Set the environment variable M2_HOME to C:\apache-maven-3.2.1.

Append %M2_HOME%\bin; to the PATH environment variable.

Common mvn commands: mvn clean install removes previous build output, downloads the dependency jars and installs the built artifact into the local repository (add -DskipTests=true to skip unit tests); mvn eclipse:clean removes the Eclipse project files generated by mvn; mvn eclipse:eclipse generates an Eclipse project from pom.xml.
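The environment setup above can be checked before running any mvn command. This Python sketch (check_maven_env is an illustrative name, not part of Maven) verifies that M2_HOME is set and that M2_HOME\bin appears on PATH, comparing paths case-insensitively as Windows does. It checks the expanded PATH value, so a literal %M2_HOME% entry is resolved by the time Python sees it.

```python
import ntpath
import os
from typing import List, Mapping


def _norm(path: str) -> str:
    # Normalise separators, trailing slashes and case, as Windows path matching does.
    return ntpath.normcase(ntpath.normpath(path))


def check_maven_env(env: Mapping[str, str], sep: str = ";") -> List[str]:
    """Return a list of problems with M2_HOME / PATH; an empty list means OK."""
    m2_home = env.get("M2_HOME")
    if not m2_home:
        return ["M2_HOME is not set"]
    wanted = _norm(ntpath.join(m2_home, "bin"))
    entries = [_norm(p.strip()) for p in env.get("PATH", "").split(sep) if p.strip()]
    if wanted not in entries:
        return [f"{ntpath.join(m2_home, 'bin')} is not on PATH"]
    return []


if __name__ == "__main__":
    # os.environ holds the expanded PATH of the current process.
    problems = check_maven_env(os.environ, os.pathsep)
    print("\n".join(problems) if problems else "M2_HOME and PATH look OK")
```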

Steps:

In the source root directory, run mvn clean install -DskipTests=true; the jar package is generated in the target directory.

    

Target directory:

    

In the same way, you can compile the nlp-lang jar package from its source at https://github.com/NLPchina/nlp-lang
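Building both projects from source can be scripted. This sketch assumes git and mvn are on PATH and that each project leaves its jar in its own target directory, as described above. The helper names (build_plan, find_built_jar, build_all) are hypothetical, and skipping -sources, -javadoc and -tests jars is a guess at what a build may produce alongside the main jar.

```python
import glob
import os
import shutil
import subprocess
from typing import Dict, List, Optional, Tuple

# Repository URLs from the article, with the usual ".git" clone suffix.
REPOS = {
    "ansj_seg": "https://github.com/NLPchina/ansj_seg.git",
    "nlp-lang": "https://github.com/NLPchina/nlp-lang.git",
}


def build_plan(workdir: str) -> List[Tuple[List[str], str]]:
    """Return (command, working directory) pairs: clone each repo, then build it."""
    plan = []
    for name, url in REPOS.items():
        plan.append((["git", "clone", url, name], workdir))
        plan.append((["mvn", "clean", "install", "-DskipTests=true"], os.path.join(workdir, name)))
    return plan


def find_built_jar(project_dir: str) -> Optional[str]:
    """Return the main jar in project_dir/target, skipping sources/javadoc/tests jars."""
    jars = [
        path
        for path in glob.glob(os.path.join(project_dir, "target", "*.jar"))
        if not path.endswith(("-sources.jar", "-javadoc.jar", "-tests.jar"))
    ]
    return sorted(jars)[0] if jars else None


def build_all(workdir: str) -> Dict[str, Optional[str]]:
    """Run the plan and return {project name: path of the built jar}."""
    os.makedirs(workdir, exist_ok=True)
    for cmd, cwd in build_plan(workdir):
        exe = shutil.which(cmd[0])  # also resolves mvn.cmd on Windows
        if exe is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run([exe] + cmd[1:], cwd=cwd, check=True)
    return {name: find_built_jar(os.path.join(workdir, name)) for name in REPOS}
```

For example, build_all("ansj_build") clones into ansj_build/ansj_seg and ansj_build/nlp-lang, builds each one, and returns the jar paths; git clone fails if those directories already exist.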

II. Configure the ANSJ field type in Solr's schema.xml.

1. Create the ANSJ type.

Open schema.xml and add the ANSJ field type text_ansj:

    <!-- ansj start -->
    <fieldType name="text_ansj" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="org.ansj.solr.AnsjTokenizerFactory" isQuery="false"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="org.ansj.solr.AnsjTokenizerFactory"/>
        </analyzer>
    </fieldType>
    <!-- ansj end -->

org.ansj.solr.AnsjTokenizerFactory is the tokenizer factory from the ANSJ Lucene plugin compiled in the previous article.
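To see how the definition fits together, this sketch parses it with Python's standard xml.etree.ElementTree and reports the tokenizer used in each analyzer phase. It only inspects the XML: both phases use org.ansj.solr.AnsjTokenizerFactory, and only the index phase passes isQuery="false". What that flag changes inside the plugin is not covered here.

```python
import xml.etree.ElementTree as ET

# The text_ansj field type definition from schema.xml.
FIELD_TYPE_XML = """
<fieldType name="text_ansj" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="org.ansj.solr.AnsjTokenizerFactory" isQuery="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.ansj.solr.AnsjTokenizerFactory"/>
  </analyzer>
</fieldType>
"""


def tokenizers_by_phase(xml_text: str) -> dict:
    """Map each analyzer type ('index'/'query') to its tokenizer's attributes."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for analyzer in root.findall("analyzer"):
        tokenizer = analyzer.find("tokenizer")
        result[analyzer.get("type")] = dict(tokenizer.attrib) if tokenizer is not None else {}
    return result


if __name__ == "__main__":
    for phase, attrs in tokenizers_by_phase(FIELD_TYPE_XML).items():
        print(phase, attrs)
```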

2. Configure the fields that need to be indexed.

    <!-- ansj_test fields -->
    <field name="poi_oid" type="string" indexed="false" stored="true"/>
    <field name="poi_name" type="text_ansj" indexed="true" stored="false"/>
    <field name="poi_name_suggest" type="string" indexed="false" stored="true"/>
    <field name="poi_address" type="text_ansj" indexed="true" stored="false"/>
    <field name="poi_address_suggest" type="string" indexed="false" stored="true"/>
    <field name="poi_phone" type="string" indexed="true" stored="true"/>
    <field name="poi_type" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="poi_url" type="string" indexed="false" stored="true"/>
    <field name="poi_dianping" type="string" indexed="true" stored="true"/>
    <field name="poi_brand" type="string" indexed="true" stored="true"/>
    <field name="poi_city" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="poi_tag" type="text_ansj" indexed="true" stored="true"/>
    <field name="poi_lat" type="double" indexed="false" stored="true"/>
    <field name="poi_lon" type="double" indexed="false" stored="true"/>
    <field name="poi_data_type" type="string" indexed="true" stored="false"/>
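With several field types in play, it can help to list which fields are actually segmented by ANSJ. This sketch runs over a subset of the definitions above, wrapped in a hypothetical fields root element so it parses on its own, and reports for each indexed text_ansj field whether Solr also stores the original value. A field with stored="false" can be searched, but its original value cannot be returned in results.

```python
import xml.etree.ElementTree as ET

# A subset of the field definitions above, wrapped in a root element so it parses.
FIELDS_XML = """
<fields>
  <field name="poi_name" type="text_ansj" indexed="true" stored="false"/>
  <field name="poi_name_suggest" type="string" indexed="false" stored="true"/>
  <field name="poi_address" type="text_ansj" indexed="true" stored="false"/>
  <field name="poi_phone" type="string" indexed="true" stored="true"/>
  <field name="poi_tag" type="text_ansj" indexed="true" stored="true"/>
</fields>
"""


def segmented_fields(xml_text: str, type_name: str = "text_ansj") -> dict:
    """Map each indexed field of the given type to whether its original value is stored."""
    root = ET.fromstring(xml_text.strip())
    return {
        field.get("name"): field.get("stored") == "true"
        for field in root.iter("field")
        if field.get("type") == type_name and field.get("indexed") == "true"
    }


if __name__ == "__main__":
    for name, stored in segmented_fields(FIELDS_XML).items():
        print(f"{name}: segmented by ANSJ, stored={stored}")
```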

III. Configure ANSJ in the Solr environment.

Put the compiled ansj_seg, nlp-lang and ansj_lucene4_plug jars into the lib directory (WEB-INF/lib) of the Solr war package.

  

  

Next, locate ANSJ's dictionaries and configuration files in the ANSJ source directory:

  

Put these three configuration files into the WEB-INF/classes directory of the Solr web application; if the classes directory does not exist, create it manually.
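The copying in this step can be scripted. This sketch assumes Tomcat has already unpacked the Solr war into a directory (for example webapps/solr under the Tomcat home) and takes the jar and configuration file paths as arguments, since their exact names depend on the versions you built. deploy_ansj is a hypothetical helper name.

```python
import os
import shutil
from typing import Iterable, List


def deploy_ansj(webapp_dir: str, jars: Iterable[str], config_files: Iterable[str]) -> List[str]:
    """Copy jars into WEB-INF/lib and config files into WEB-INF/classes; return the new paths."""
    lib_dir = os.path.join(webapp_dir, "WEB-INF", "lib")
    classes_dir = os.path.join(webapp_dir, "WEB-INF", "classes")
    # lib/ normally exists in an unpacked Solr war; classes/ may not, so create both if needed.
    os.makedirs(lib_dir, exist_ok=True)
    os.makedirs(classes_dir, exist_ok=True)
    copied = []
    for src in jars:
        copied.append(shutil.copy2(src, lib_dir))  # copy2 also preserves timestamps
    for src in config_files:
        copied.append(shutil.copy2(src, classes_dir))
    return copied
```

Call it with the unpacked Solr directory, the three jar paths and the three configuration file paths; it returns where each file was copied.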

    

IV. Test ANSJ segmentation.

Once ANSJ is configured, start the Tomcat instance that hosts Solr and check the results on the Solr admin page:

1. Test segmentation of "Nanjing Yangtze River Bridge"

  

Note: Enter "Nanjing Yangtze River Bridge" in the text box and click the blue "Analyse Values" button on the right.
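The admin page's analysis screen uses Solr's field analysis request handler, so the same check can be run from a script. This sketch assumes Solr is deployed in Tomcat at http://localhost:8080/solr, a core named collection1, and a /analysis/field handler registered in solrconfig.xml (the Solr 4 example configuration includes one); all three are assumptions to adjust. It returns the raw JSON response without interpreting it.

```python
import json
import urllib.parse
import urllib.request


def analysis_url(base: str, core: str, text: str, field_type: str = "text_ansj") -> str:
    """Build a /analysis/field request that analyzes `text` with the given field type."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,  # urlencode applies UTF-8 percent-encoding
        "wt": "json",
    }
    return f"{base.rstrip('/')}/{core}/analysis/field?{urllib.parse.urlencode(params)}"


def analyse(base: str, core: str, text: str, field_type: str = "text_ansj") -> dict:
    """Send the request to a running Solr and return the parsed JSON response."""
    with urllib.request.urlopen(analysis_url(base, core, text, field_type)) as resp:
        return json.load(resp)
```

For example, analyse("http://localhost:8080/solr", "collection1", text) returns the same per-stage token lists that the admin page renders for the text you pass in.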

When reposting this article, please credit the source: http://www.cnblogs.com/likehua/p/4481219.html
