Elasticsearch officially provides only one Chinese analyzer, smartcn, and its results are not very good. Fortunately, medcl (one of the earliest people in China to work with ES) has written two Chinese analyzers: IK and mmseg. The following describes how to use each of them; the steps are practically the same. First install the plugin from the command line:
To install the IK plug-in:
bin/plugin -install medcl/elasticsearch-analysis-ik/1.1.0
Download the IK configuration and dictionary files to the config directory:
cd config
wget http://github.com/downloads/medcl/elasticsearch-analysis-ik/ik.zip --no-check-certificate
unzip ik.zip
rm ik.zip
To install the MMSEG plugin:
bin/plugin -install medcl/elasticsearch-analysis-mmseg/1.1.0
Download the related configuration and dictionary files to the config directory:
cd config
wget http://github.com/downloads/medcl/elasticsearch-analysis-mmseg/mmseg.zip --no-check-certificate
unzip mmseg.zip
rm mmseg.zip
Word Segmentation Configuration
IK analyzer configuration; add the following to the elasticsearch.yml file:
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
or
index.analysis.analyzer.ik.type: "ik"
These two forms have the same effect.
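The nested form and the dotted form can be interchanged because nested YAML keys are flattened into dotted setting names. The following is only a minimal sketch of that idea in plain Java; it is not Elasticsearch's own settings loader, and the class and method names (SettingsFlattener, flatten, nestedIkSettings) are hypothetical names chosen for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Elasticsearch: flattens nested settings into dotted keys.
class SettingsFlattener {
    static Map<String, String> flatten(Map<String, Object> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> node, Map<String, String> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            // Join the parent path and the current key with a dot.
            String key = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                flattenInto(key, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(key, String.valueOf(e.getValue()));
            }
        }
    }

    // Builds the nested structure index -> analysis -> analyzer -> ik -> type.
    static Map<String, Object> nestedIkSettings() {
        Map<String, Object> ik = new LinkedHashMap<>();
        ik.put("type", "ik");
        Map<String, Object> analyzer = new LinkedHashMap<>();
        analyzer.put("ik", ik);
        Map<String, Object> analysis = new LinkedHashMap<>();
        analysis.put("analyzer", analyzer);
        Map<String, Object> index = new LinkedHashMap<>();
        index.put("analysis", analysis);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("index", root == null ? null : index);
        return root;
    }

    public static void main(String[] args) {
        // Prints the single flattened key and its value.
        System.out.println(flatten(nestedIkSettings()));
    }
}
```

Flattening the nested map yields the single key index.analysis.analyzer.ik.type, which is the same key written out directly in the dotted form.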
mmseg analyzer configuration, also added to the elasticsearch.yml file:
index:
  analysis:
    analyzer:
      mmseg:
        alias: [news_analyzer, mmseg_analyzer]
        type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider
or
index.analysis.analyzer.default.type: "mmseg"
mmseg also has some additional tokenizer parameters, which can be set as follows:
index:
  analysis:
    tokenizer:
      mmseg_maxword:
        type: mmseg
        seg_type: "max_word"
      mmseg_complex:
        type: mmseg
        seg_type: "complex"
      mmseg_simple:
        type: mmseg
        seg_type: "simple"
Once the plugin is installed and this configuration is in place, ES will load the plugins when it starts.
Define Mapping
You can specify the analyzer in the mapping when you create an index:
{
  "page": {
    "properties": {
      "title": {
        "type": "string",
        "indexAnalyzer": "ik",
        "searchAnalyzer": "ik"
      },
      "content": {
        "type": "string",
        "indexAnalyzer": "ik",
        "searchAnalyzer": "ik"
      }
    }
  }
}
indexAnalyzer is the analyzer used at index time; searchAnalyzer is the analyzer used at search time.
The Java mapping code is as follows:
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// Build the mapping for type "page"; both fields use ik for indexing and searching.
XContentBuilder content = XContentFactory.jsonBuilder().startObject()
    .startObject("page")
      .startObject("properties")
        .startObject("title")
          .field("type", "string")
          .field("indexAnalyzer", "ik")
          .field("searchAnalyzer", "ik")
        .endObject()
        .startObject("code")
          .field("type", "string")
          .field("indexAnalyzer", "ik")
          .field("searchAnalyzer", "ik")
        .endObject()
      .endObject()
    .endObject()
  .endObject();
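When several fields share the same analyzer, the repeated per-field block can be generated in a loop. The following is only a rough sketch in plain Java, without the Elasticsearch classes, that produces the same JSON shape as the mapping above. The class name IkMappingJson and its build method are hypothetical, and the sketch assumes that the type, field, and analyzer names contain no characters that need JSON escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; assumes all names are plain ASCII identifiers.
class IkMappingJson {
    static String build(String typeName, String analyzer, List<String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"").append(typeName).append("\":{\"properties\":{");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            // Each field is a string analyzed with the same analyzer at index and search time.
            sb.append('"').append(fields.get(i)).append("\":{")
              .append("\"type\":\"string\",")
              .append("\"indexAnalyzer\":\"").append(analyzer).append("\",")
              .append("\"searchAnalyzer\":\"").append(analyzer).append("\"}");
        }
        // Close properties, the type object, and the root object.
        return sb.append("}}}").toString();
    }

    public static void main(String[] args) {
        System.out.println(build("page", "ik", Arrays.asList("title", "content")));
    }
}
```

The resulting string could then be passed wherever a mapping source is accepted as JSON; for anything beyond a quick test, the XContentBuilder approach is safer because it handles escaping.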
Once the mapping is defined, operations on the index will use the specified analyzer to tokenize the text.
To test the analyzer you can call the following API. Note that indexname is the index name; any existing index will do.
http://localhost:9200/indexname/_analyze?analyzer=ik&text=Test Elasticsearch Word breaker
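Text that contains spaces or Chinese characters has to be percent-encoded before it is placed in a query string. The following is a minimal sketch in plain Java of building such a request URL. The class name AnalyzeUrl and its methods are hypothetical, and the index name is assumed to be plain ASCII so it is inserted without encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of any Elasticsearch client.
class AnalyzeUrl {
    // Percent-encodes a query-string value as UTF-8.
    static String encode(String value) {
        try {
            // URLEncoder writes spaces as '+'; switch to %20 so the value is unambiguous.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Assumes the index name needs no encoding.
    static String build(String baseUrl, String index, String analyzer, String text) {
        return baseUrl + "/" + index + "/_analyze?analyzer=" + encode(analyzer)
                + "&text=" + encode(text);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:9200", "indexname", "ik",
                "Test Elasticsearch Word breaker"));
    }
}
```

The printed URL can be pasted into a browser or passed to any HTTP client.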
Appendix:
IK analyzer plugin project address: https://github.com/medcl/elasticsearch-analysis-ik
mmseg analyzer plugin project address: https://github.com/medcl/elasticsearch-analysis-mmseg
If the configuration seems like too much trouble, you can also download a preconfigured ES distribution from: https://github.com/medcl/elasticsearch-rtf
Address of this article: http://blog.csdn.net/laigood12345/article/details/7795115
References: http://www.searchtech.pro/articles/2013/02/18/1361190717673.html