Configuring Chinese Word Segmentation for Elasticsearch on Windows

Source: Internet
Author: User

Officially, Elasticsearch provides only one Chinese analyzer (word segmenter), smartcn, and its results are not very good. Fortunately, medcl in China (one of the earliest people to work on Elasticsearch) has written two Chinese analyzers, IK and mmseg. The following describes how to use IK.

When we create an index (here, db_news), Elasticsearch uses its default analyzer (the standard analyzer), which splits Chinese text into individual characters instead of into the words we want. For example:

The code is as follows:

In most cases this is not the result we want. We would rather get a segmentation such as "I", "Love", "Beijing", "Tiananmen", so we need to install a Chinese analyzer, and IK provides exactly this function.
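To see the difference yourself, the _analyze API can run the same text through different analyzers. The following is a minimal Python sketch using only the standard library. It assumes Elasticsearch 5.x or later on localhost:9200 (versions that accept a JSON body for _analyze), an existing index named db_news, the sentence 我爱北京天安门 as the Chinese behind "I love Beijing Tiananmen", and the analyzer name ik_smart used by current IK releases (older IK builds may register different names). The helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: a local Elasticsearch 5.x+ node; adjust to your setup.
ES_URL = "http://localhost:9200"


def analyze_request(index: str, analyzer: str, text: str) -> tuple[str, dict]:
    """Build the path and JSON body for an _analyze call on one index."""
    return f"/{index}/_analyze", {"analyzer": analyzer, "text": text}


def send(method: str, path: str, body: dict | None = None) -> dict:
    """Send a JSON request to Elasticsearch and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def tokens(response: dict) -> list[str]:
    """Pull the token strings out of an _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]
```

With a node running, tokens(send("POST", *analyze_request("db_news", "standard", "我爱北京天安门"))) should give one token per character, while the same call with "ik_smart" should give whole words such as 北京.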

Installing the IK plugin

Download the Chinese distribution of Elasticsearch directly from: https://github.com/medcl/elasticsearch-rtf. Then reinstall your Elasticsearch instance.

Extract only the contents of its plugins folder into your Elasticsearch plugins directory.

Restart Elasticsearch
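After the restart, it can be useful to confirm that the node actually loaded the plugin. A minimal Python sketch, assuming Elasticsearch 5.x or later on localhost:9200 (where the _cat APIs accept format=json) and assuming the IK plugin reports its component name as analysis-ik; the helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

# Assumptions: local node, and IK registers itself as "analysis-ik".
ES_URL = "http://localhost:9200"
IK_COMPONENT = "analysis-ik"


def fetch_plugins() -> list[dict]:
    """Return GET /_cat/plugins?format=json as a list of dicts."""
    with urllib.request.urlopen(ES_URL + "/_cat/plugins?format=json") as resp:
        return json.loads(resp.read().decode("utf-8"))


def ik_installed(plugins: list[dict]) -> bool:
    """True if any node in the _cat/plugins listing reports the IK plugin."""
    return any(p.get("component") == IK_COMPONENT for p in plugins)
```

If ik_installed(fetch_plugins()) returns False, the plugin files were probably not picked up: check the plugins directory and restart again.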

After installation, execute the command:

Notes on defining the analyzer

If we simply create the index, it uses the default analyzer, which is not what we want. If we then try to change the analyzer by creating the index again, we get the following error:

{"error":"IndexAlreadyExistsException[[db_news] already exists]","status":400}
GET /db_news/_mapping

There is no way to resolve this conflict in place. The only option is to delete the existing index, create a new one, and define a mapping that uses the new analyzer. Note that this must be done before any data is inserted; otherwise Elasticsearch will use its default analyzer.
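The new mapping has to name the IK analyzer explicitly when the index is created. The sketch below builds the body for PUT /db_news in Python. It assumes Elasticsearch 7.x or later (mappings without a type name), the ik_max_word and ik_smart analyzer names provided by current IK releases, and hypothetical text fields called title and content; none of these come from the original article.

```python
from __future__ import annotations

import json

# Assumption: analyzer names registered by current IK plugin releases.
INDEX_ANALYZER = "ik_max_word"  # finest-grained segmentation, used when indexing
SEARCH_ANALYZER = "ik_smart"    # coarser segmentation, used for query text


def ik_index_body(text_fields: list[str]) -> dict:
    """Build a PUT /<index> body that maps each field to the IK analyzers."""
    properties = {
        field: {
            "type": "text",
            "analyzer": INDEX_ANALYZER,
            "search_analyzer": SEARCH_ANALYZER,
        }
        for field in text_fields
    }
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    # "title" and "content" are hypothetical field names for db_news.
    print(json.dumps(ik_index_body(["title", "content"]), indent=2))
```

Indexing with ik_max_word and searching with ik_smart is one common pairing: the index stores many overlapping terms, while query text is split into fewer, larger words. Sending this body with PUT /db_news before any document is inserted keeps the default analyzer out of the mapping.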

The delete index command is as follows:

DELETE /db_news
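Rebuilding the index is therefore a fixed sequence: delete, recreate with the new mapping, and only then insert data. A minimal Python sketch, assuming a local node on localhost:9200; rebuild_plan and rebuild are hypothetical helper names, and a 404 on the DELETE step is treated as "the index did not exist yet".

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local single node


def rebuild_plan(index: str, new_body: dict) -> list[tuple[str, str, dict | None]]:
    """Ordered requests: drop the old index, then recreate it with new_body.

    Documents must be inserted only after the PUT; if they arrive earlier,
    the index is auto-created with the default analyzer again.
    """
    return [("DELETE", f"/{index}", None), ("PUT", f"/{index}", new_body)]


def send(method: str, path: str, body: dict | None = None) -> dict:
    """Send a JSON request to Elasticsearch and decode the JSON reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def rebuild(index: str, new_body: dict) -> None:
    """Run the plan in order against the live cluster."""
    for method, path, body in rebuild_plan(index, new_body):
        try:
            send(method, path, body)
        except urllib.error.HTTPError as err:
            # A 404 on DELETE only means the index was not there yet.
            if not (method == "DELETE" and err.code == 404):
                raise
```

For example, on Elasticsearch 7.x or later, rebuild("db_news", {"mappings": {"properties": {"title": {"type": "text", "analyzer": "ik_max_word"}}}}) drops and recreates db_news. All existing documents are lost, so re-import them afterwards.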

Reprinted from: http://eggtwo.com/news/detail/146

IK: https://github.com/medcl/elasticsearch-analysis-ik
