Distributed search with Elasticsearch: integrating Chinese word segmentation

Source: Internet
Author: User

Elasticsearch officially provides only one Chinese analyzer, smartcn, and its results are not very good. Fortunately medcl (one of the earliest people in China to work with ES) has written two Chinese analyzers: IK and mmseg. The following describes how to use each of them. The procedure is essentially the same for both: first install the plugin from the command line.
To install the IK plug-in:

bin/plugin -install medcl/elasticsearch-analysis-ik/1.1.0

Download the IK configuration and dictionary files into the config directory:

cd config
wget http://github.com/downloads/medcl/elasticsearch-analysis-ik/ik.zip --no-check-certificate
unzip ik.zip
rm ik.zip

To install the mmseg plugin:

bin/plugin -install medcl/elasticsearch-analysis-mmseg/1.1.0

Download the mmseg configuration and dictionary files into the config directory:

cd config
wget http://github.com/downloads/medcl/elasticsearch-analysis-mmseg/mmseg.zip --no-check-certificate
unzip mmseg.zip
rm mmseg.zip

Word Segmentation Configuration

To configure the IK analyzer, add the following to the elasticsearch.yml file:

index:
  analysis:
    analyzer:
      ik:
          alias: [ik_analyzer]
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider

Or

index.analysis.analyzer.ik.type: "ik"

These two forms have the same meaning.
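
As a rough illustration of why a nested YAML block and a single dotted key can address the same setting, the sketch below flattens a nested map into dotted keys. It is not Elasticsearch's actual settings loader. The names SettingsFlattenSketch, flatten, nest and nestedIkSettings are made up for this example, and the short type name "ik" is assumed in the nested form.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Elasticsearch.
class SettingsFlattenSketch {

    // Recursively turns nested maps into dotted keys: {a={b=1}} becomes {a.b=1}.
    public static Map<String, Object> flatten(String prefix, Map<String, Object> nested) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : nested.entrySet()) {
            String key = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) entry.getValue();
                flat.putAll(flatten(key, child));
            } else {
                flat.put(key, entry.getValue());
            }
        }
        return flat;
    }

    // Wraps a value in a single-entry map, to build the nested form concisely.
    private static Map<String, Object> nest(String key, Object value) {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put(key, value);
        return map;
    }

    // Nested form: index -> analysis -> analyzer -> ik -> type = "ik".
    public static Map<String, Object> nestedIkSettings() {
        return nest("index", nest("analysis", nest("analyzer", nest("ik", nest("type", "ik")))));
    }

    public static void main(String[] args) {
        // Prints {index.analysis.analyzer.ik.type=ik}
        System.out.println(flatten("", nestedIkSettings()));
    }
}
```

The flattened key matches the one-line form, which is why either style can be used in the configuration file.
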
The mmseg analyzer is likewise configured in the elasticsearch.yml file:

index:
  analysis:
    analyzer:
      mmseg:
          alias: [news_analyzer, mmseg_analyzer]
          type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider

Or

index.analysis.analyzer.default.type: "mmseg"

mmseg also has some more specific parameters, which can be set as follows:

index:
  analysis:
    tokenizer:
      mmseg_maxword:
          type: mmseg
          seg_type: "max_word"
      mmseg_complex:
          type: mmseg
          seg_type: "complex"
      mmseg_simple:
          type: mmseg
          seg_type: "simple"

Once the plugins are installed and this configuration is in place, ES will load the plugins when it starts.

Define Mapping

You can specify the analyzer in the mapping when you create an index:

{
   "page": {
      "properties": {
         "title": {
            "type": "string",
            "indexAnalyzer": "ik",
            "searchAnalyzer": "ik"
         },
         "content": {
            "type": "string",
            "indexAnalyzer": "ik",
            "searchAnalyzer": "ik"
         }
      }
   }
}

indexAnalyzer is the analyzer used when indexing; searchAnalyzer is the one used when searching.

The Java mapping code is as follows:

import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// Builds the "page" mapping; jsonBuilder() can throw IOException.
XContentBuilder content = XContentFactory.jsonBuilder().startObject()
        .startObject("page")
          .startObject("properties")
            .startObject("title")
              .field("type", "string")
              .field("indexAnalyzer", "ik")
              .field("searchAnalyzer", "ik")
            .endObject()
            .startObject("code")
              .field("type", "string")
              .field("indexAnalyzer", "ik")
              .field("searchAnalyzer", "ik")
            .endObject()
          .endObject()
        .endObject()
      .endObject();

Once the mapping is defined, documents written to the index are segmented with the specified analyzer.

To test segmentation, call the following API. Note that indexname is the name of an index; any existing index will do.
http://localhost:9200/indexname/_analyze?analyzer=ik&text=Test Elasticsearch word breaker
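
When the test text contains Chinese characters or spaces, it generally needs to be URL-encoded before it goes into the query string. The following is a minimal sketch of building such a request URL, assuming the _analyze endpoint and the analyzer and text parameters shown above. The class and method names (AnalyzeUrlSketch, buildAnalyzeUrl, encode) are hypothetical and not part of any Elasticsearch client.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only; not part of any Elasticsearch client.
class AnalyzeUrlSketch {

    // URL-encodes a query parameter value as UTF-8.
    public static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Builds <baseUrl>/<index>/_analyze?analyzer=<analyzer>&text=<encoded text>.
    public static String buildAnalyzeUrl(String baseUrl, String index, String analyzer, String text) {
        return baseUrl + "/" + index + "/_analyze"
                + "?analyzer=" + encode(analyzer)
                + "&text=" + encode(text);
    }

    public static void main(String[] args) {
        // The escaped characters spell the Chinese phrase for "Chinese word segmentation".
        String sample = "\u4e2d\u6587\u5206\u8bcd";
        System.out.println(buildAnalyzeUrl("http://localhost:9200", "indexname", "ik", sample));
    }
}
```

The printed URL can be opened in a browser or passed to any HTTP client. URLEncoder encodes a space as "+" and each non-ASCII character as its percent-encoded UTF-8 bytes.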

Appendix:

IK analyzer plugin project: https://github.com/medcl/elasticsearch-analysis-ik

mmseg analyzer plugin project: https://github.com/medcl/elasticsearch-analysis-mmseg

If the configuration seems like too much trouble, you can download a preconfigured ES distribution here: https://github.com/medcl/elasticsearch-rtf

Address of this article: http://blog.csdn.net/laigood12345/article/details/7795115
References: http://www.searchtech.pro/articles/2013/02/18/1361190717673.html
