Elasticsearch officially provides only the SMARTCN Chinese analyzer, and its results are not great. Fortunately, medcl (one of the earliest Elasticsearch researchers in China) has written two Chinese analyzers: one is IK, the other is mmseg. The following introduces how to use IK.
When we create an index (say, db_news), Elasticsearch applies its default analyzer, which splits Chinese text into individual characters rather than the words we actually want. For example:
The code is as follows:
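The original snippet is missing here; the following is a minimal sketch of the kind of request involved. The index name db_news, the local node address, and the sample sentence 我爱北京天安门 are assumptions for illustration.

```shell
# Tokenize a sample Chinese sentence with the default standard analyzer.
# (On Elasticsearch 5+ the text is passed in a JSON body instead of a
#  query parameter; this form matches the older versions this article targets.)
curl 'http://localhost:9200/db_news/_analyze?analyzer=standard&text=我爱北京天安门&pretty'
# The standard analyzer emits one token per Chinese character:
# 我 / 爱 / 北 / 京 / 天 / 安 / 门
```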
Normally this is not the result we want; we would rather get word-level tokens such as "I" (我), "love" (爱), "Beijing" (北京), and "Tiananmen" (天安门). For that we need to install a Chinese analyzer, and IK provides exactly this.
Installing the IK plugin
Download the Elasticsearch Chinese distribution directly from https://github.com/medcl/elasticsearch-rtf, then re-run the installation of the Elasticsearch instance.
Unzip it and copy only the contents of its plugins folder into your Elasticsearch plugins directory.
Restart Elasticsearch
After installation, execute the command:
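The verification command was lost from the original; a hedged example follows, assuming the plugin registers an analyzer named ik (newer versions of the plugin register ik_max_word and ik_smart instead):

```shell
# Verify IK is installed by analyzing the same sentence with it.
# The analyzer name "ik" is what older plugin versions register;
# newer versions use "ik_max_word" / "ik_smart".
curl 'http://localhost:9200/_analyze?analyzer=ik&text=我爱北京天安门&pretty'
# IK should now return word-level tokens such as:
# 我 / 爱 / 北京 / 天安门
```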
Notes on defining the analyzer
If we create the index directly, it will use the default analyzer, which is not the result we want. If we then try to change the analyzer on the existing index, we get the following error:
{"error": "IndexAlreadyExistsException[[db_news] already exists]", "status": 400}
You can inspect the current mapping with:
GET /db_news/_mapping
There is no way to resolve this conflict in place; the only option is to delete the existing index, create a new one, and have its mapping use the new analyzer (note: do this before any data is inserted, otherwise the Elasticsearch default analyzer will already have been applied).
The delete index command is as follows:
DELETE /db_news
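Putting the steps together, a sketch of the full workflow: delete the old index, then recreate it with a mapping that uses IK before inserting any data. The index name (db_news), type name (news), and field name (title) are illustrative, and the mapping syntax matches the pre-5.x Elasticsearch versions this article targets.

```shell
# Step 1: delete the index that was created with the default analyzer.
curl -X DELETE 'http://localhost:9200/db_news'

# Step 2: recreate it with a mapping whose string field uses the IK analyzer.
# Do this BEFORE inserting documents, or the old analysis will already apply.
curl -X PUT 'http://localhost:9200/db_news' -d '{
  "mappings": {
    "news": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "ik"
        }
      }
    }
  }
}'
```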
Source: http://eggtwo.com/news/detail/146
IK: https://github.com/medcl/elasticsearch-analysis-ik